Hi,
It would be great if there was a “group by” feature within the Missing Value node. It appears if you are replacing missing values through a method such as moving average, the moving average looks up and down the column to create new new value… This is fine, if the missing values are in the middle of the group. However, if the missing values are at the beginning or end of the group’s data set, the moving average will create a value with the previous groups information.
Example:
As you can see, the missing value in ID A is created appropriately, taking 1 value back, and 1 value forward. However, ID B, since the first value is missing, takes 1 value back, which is actually a value for A. It is inappropriate to do this. Instead, I think it would be better if there was logic in place to start over and find the first non-missing value (which may actually be 2 forward values, but it would be better than combining two data sets that should not touch…). The same issue is present if the last value is missing… 1 backward value and 1 forward value means B & C’s data is being combined.
Of course, I can get around this by using a group loop with ID’s being the grouping principal… However, it just takes so long! Looping over a dataset of store sales with 1000’s of stores takes considerable amount of time.
I hope some additional logic can be placed into the Missing Value node to make it more robust and avoid the need for a loop.
Hi @snowy, for a totally different reason I recently wrote a component that marks via two new appended columns containing booleans the first and last row in a group (specified by a value on another single column). I wonder if having found the missing values in the separate column as in your screenshot, by determining which (if any) of those newly derived values are in first or last rows for the group, they could maybe be reset to missing and acted upon separately, or in some other way filtered.