Extrapolate missing values smartly - question for help

Dear Forum!

Thanks in advance for reading and maybe even solving my question that I could not resolve on my own and did not find a (similar) solution in the forum (yet…).

Here we go: I do have a data set with a data quality issue. I do have items (A, B, C, …) over time (every month) with some attributes (a,b,c, d, …). Correctly this attribute exists for every month the item exists. In reality people enter the attributes in the first month and forget sometimes in following ones. In addition, the item may start to exist in one month, keeps existing for some months and may disappear and potentially even reappear in months further down the line, often then with different attributes.

I tried to provide a simplified version of the real data in the attached workflow. I tried “guessing” the missing attributes from previous months with lag columns within two loops (one on the item, the inner one on the year-month) but failed…

Anyone wanna give it a try and set me on the right track? Thanks a lot! :slight_smile:
Knime - smart extrapolation of missing data.knwf (5.3 KB)

One idea could be to employ a program to impute the missing values. Some time ago I set up an example using the R package Amelia.

Please also note the additional links in the example. It is possible that in the meantime there are more advanced methods available. Also success might very much depend on how regular your missing numbers are missing and if a pattern can be established.

Thanks, @mlauber71, for that approach. However, there is a certainly logic to fill the missings in my case. As the attributes are always available when an item appears the first time, I can base the algorithm on that fact and make it more a deterministic thing rather than guessing.

Something like this (in a kind of pseud-code):
for each item
for each year-month
if missing(attribute(item, current year-month)) AND not missing (item (current year-month)
then attribute (item, current year-month) = attribute (item, previous year-month)
end
end

While typing I guess what I am missing is how to set this up in KNIME and esp. how to get the attribute-value of the previous month.

Thanks everyone!

You can use the

To shift the date back a month a year or a certain time. I would suggest you make yourself a table with the values from the time before and join that back to your current values.

Then you can use the Rule Engine to replace the missing values

1 Like

Was not aware of the date&time shift node - I’ll give it a try and report back here :slight_smile: Thx.

1 Like

Hello @jsbitz,

and what should be output from your input? Possibly a Missing Value node inside Group Loop Start could do the trick.

Br,
Ivan

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.