Missing Dates, Time Series, Linear Interpolation

In a 13 year series of data there are several gaps in monthly data. With help, I was able to use Linear Interpolation in the Missing Value node for numeric data like “sold price”, etc., and with some recent help (Date&Time to SerialNumber – KNIME Community Hub and Sporadic Missing Dates In Time Series Data - #2 by takbb ) was able to solve the missing dates via Linear Interpolation.

I’m just trying to fix a bit of a glitch where the LI produces the same date several times in a row. Admittedly, LI repeats the numeric values such as close price when there’s no solid beginning or end known data in the series, but the price repetition is okay for a time series presentation. Whereas, repeating dates does not work out as well when presenting time series data.

Here’s some snap shots of the output. Perhaps you have a know method for dealing with time series glitches and the shortcoming of Linear Interpolation.

The missing date LI system linked above works great where I have know dates at the beginning and end of the data gaps (below):


Where date interpolation repeats is when I’m missing a known beginning or end value (below). And don’t worry about two different neighborhoods named in this table. All the calculations occur in a Group Loop series per each neighborhood. This table is combined after the Linear Interpolation.


Any suggestions on correcting these repeating dates is very appreciated.

So, this is just a bit of a thought…You’ll see in the columns “Year”, and “Month”. I forced those via a dictionary system to expose the various missing data (13 years, 100 different neighborhoods, and all the months). I then ran a bunch of processes to fill in a lot of missing numeric data with various Rule Engine and Missing Value calculations. Maybe there’s a way to use my “forced” year and month to create standard required beginning and end dates–like the 15th of any missing month? I’d need to automate that somehow, as the full series is updated with new data each month, and I don’t want to screw up the current LI system that solves the problem well 90% of the time. Just a thought.

Hi @creedssmith, I can see that you need to set a missing first and last row date to some arbitrary value into for the linear interpolation not to simply repeat the first or last available date since it has no guide to how far backwards or forwards to go. So yes, a process for setting the date on first and last row (if missing) would be needed.

Similar to your idea of “15th of any missing month”, I have a component that can provide the first or last day of previous or next month that may be of use as a starting point.

My proposal then is to find the first date in the table, and then if there is a missing date on row 1, populate it with the first date of the month prior to the current “first date” found.

Likewise if the date is missing on the final row, populate it with the last date of the month following the current “last date” found.

This additional component can be found here:

This would then be included in some pre-processing prior to the existing interpolation:

The pitfall I see would be where you had an exceptionally large number of missing dates at the beginning or end of your table, such that the number of rows to be filled in by interpolation exceeds the number of days between, say, the first of the previous month and the current “first date”. I’ll assume that the gaps are not that vast!

If you wanted to adjust this to being, for example, middle of the month, you could always use Date&Time Shift node to add (or subtract) 14 or 15 days from these new dates.

Here is an update to the workflow from your previous thread, with this additional processing.

Fix missing dates - with missing first and last date.knwf (300.5 KB)

1 Like

Hi @takbb It seems I need all of your tools slight_smile: . I’m glad you created them.

I’m right now cautiously integrating and testing your latest nodes and workflow into my data. My stuff is a bit more complex that two columns, so I’ll let you know when I get it all up and running.

Thanks so much for your help.

1 Like

Hi @creedssmith,

The components are all using core nodes, and in this case, you could perform the same operations with String Manipulation, using a “java hack” trick. This is actually what the component uses, and I packaged it up as a convenience rather than having to remember (or copy/paste/edit) the code each time:

The code in this case is as follows:
first of previous month

string(
java.time.LocalDate.parse(string($Min*(column1)$), 
	java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd")).minusMonths(1).withDayOfMonth(1)
	)

last of next month

string(
java.time.LocalDate.parse($Max*(column1)$, 
	java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd")).plusMonths(1).withDayOfMonth(
		java.time.LocalDate.parse($Max*(column1)$, 
	java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd")).plusMonths(1).lengthOfMonth())
	)

But that java code is a bit of an ask to remember, so… component :wink:

There are of course other ways in KNIME to achieve the result, by calculating timestamps, or other manipulations, and this is just my preferred option.

btw, taking these String Manipulations, they could easily be modified to return the 15th of both months, by replacing the

.withDayOfMonth(1)

and

.withDayOfMonth(
		java.time.LocalDate.parse($Max*(column1)$, 
	java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd")).plusMonths(1).lengthOfMonth())

with

.withDayOfMonth(15)

1 Like

Hi @takbb Oh no, I am perfectly fine using your nodes :smile:

Looking at this tiny portion of my WF (it’s a mess as I jammed your WF into my own and have not cleaned it up visually yet), I am getting a notice (pasted below) that seems to be stopping my Group Loop function (by each of 100 neighborhoods). Any guess what I need to fix here? The whole loop seems to run fine until this error hits.

Error:
$$ROWINDEX$$ = 0 AND MISSING $column1$ => $${SMin*(column1) (first_of_prev)}$$

Hi @creedssmith , is that the entire error message? It appears to be simply displaying one of the rules from the Rule Engine, but it doesn’t give any clues about what the issue is.

Is there actually a node that is visually showing an error indicator (e.g. a red circle with a cross) ?

Can you give a wider view of your workflow showing the Loop Start and the Rule Engine?

Better still, would you be able to upload the workflow with a small (and sanitized) data set?