Missing Value Node interpolation -- Suggestion

Hi,
It would be great if there were a “group by” option in the Missing Value node. When replacing missing values with a method such as moving average, the node looks up and down the entire column to compute the new value. That is fine if the missing values are in the middle of a group, but if they are at the beginning or end of a group’s rows, the moving average is computed using the previous (or next) group’s values.

Example:
[screenshot of the example data]

As you can see, the missing value in ID A is filled appropriately, using one value back and one value forward. For ID B, however, the first value is missing, so the one value back is actually a value from A, which is not appropriate. Instead, there should be logic to look only within the group and use its first non-missing values (which might mean two forward values, but that is better than mixing two data sets that should not touch). The same issue occurs when the last value of a group is missing: one backward value and one forward value means B’s and C’s data are being combined.
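To illustrate what I mean outside of KNIME, here is a rough pandas sketch (the “ID” and “Sales” column names are made up, and simple linear interpolation stands in for the moving-average strategy):

```python
import numpy as np
import pandas as pd

# Made-up store-sales data: three stores (A, B, C), some values missing.
df = pd.DataFrame({
    "ID":    ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Sales": [10, np.nan, 12, np.nan, 20, np.nan, 30, 31, 32],
})

# Column-wide imputation: B's leading and trailing gaps are filled
# using A's and C's neighbouring rows, i.e. the groups get mixed.
df["naive"] = df["Sales"].interpolate(limit_direction="both")

# Group-aware imputation: each ID is filled only from its own rows,
# so B's edge gaps fall back to B's nearest non-missing values.
df["grouped"] = df.groupby("ID")["Sales"].transform(
    lambda s: s.interpolate(limit_direction="both")
)
print(df)
```

The “naive” column shows the current behaviour, where B’s edge values are computed from A’s and C’s rows; the “grouped” column only ever uses values from the same ID, which is the behaviour I am asking for.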

Of course, I can get around this by using a group loop with ID as the grouping column. However, it just takes so long! Looping over a data set of store sales with thousands of stores takes a considerable amount of time.

I hope some additional logic can be placed into the Missing Value node to make it more robust and avoid the need for a loop.

Thanks,
Snowy


Hi @snowy, for a completely different reason I recently wrote a component that appends two boolean columns marking the first and last row of each group (where the group is defined by the value in another column). Once the missing values have been found in a separate column, as in your screenshot, you could determine which (if any) of the newly derived values sit in the first or last row of their group, reset those to missing, and handle them separately, or filter them out in some other way.

This might be quicker than iterating over all rows from the outset just to handle the few cases that may or may not exist.

It’s here, should you wish to take a look
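Just to picture the idea, here is a rough pandas analogue of that approach (not the component itself, and the “ID”/“Sales” names are placeholders):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID":    ["A", "A", "B", "B", "C", "C"],
    "Sales": [10, np.nan, np.nan, 20, 30, np.nan],
})

# Append two boolean columns marking the first and last row of each group.
grp = df.groupby("ID")
df["is_first"] = grp.cumcount() == 0
df["is_last"] = grp.cumcount(ascending=False) == 0

# After a column-wide imputation, any value that was originally missing
# and sits on a group edge can be reset to missing and handled separately.
was_missing = df["Sales"].isna()
df["imputed"] = df["Sales"].interpolate(limit_direction="both")
df.loc[was_missing & (df["is_first"] | df["is_last"]), "imputed"] = np.nan
print(df)
```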

Hello @Snowy,

how about simulating a parallel Group Loop to speed things up?
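Outside of KNIME, the rough equivalent is splitting the table by group and filling the chunks concurrently. A sketch, again with made-up “ID”/“Sales” columns, and only worthwhile when the per-group work is heavy enough to outweigh the overhead:

```python
import pandas as pd
from concurrent.futures import ProcessPoolExecutor

def fill_group(group: pd.DataFrame) -> pd.DataFrame:
    # Impute using only this group's own rows.
    out = group.copy()
    out["Sales"] = out["Sales"].interpolate(limit_direction="both")
    return out

def fill_groups_in_parallel(df: pd.DataFrame, workers: int = 4) -> pd.DataFrame:
    # One chunk per ID, processed concurrently (roughly what a Group Loop
    # combined with parallel chunks does in a workflow).
    groups = [g for _, g in df.groupby("ID")]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return pd.concat(pool.map(fill_group, groups), ignore_index=True)
```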

Note that there is now a new forum category, Feedback & Ideas, where feature requests can be posted.

Br,
Ivan

Thanks for the suggestion; I will keep the simulated parallel chunk in mind for future workflows.
Didn’t realize there was a new forum category!

