Replace missing values based on another column

Is there a way, using the Missing Value node, to replace a missing value with an entry imputed from another column?

I have a data set with vendors, categories, and products. Some category values are missing, and I’d like to replace each missing value with the most common (typically only) category for that vendor. The only way I can think of to do it is to group by vendor and take the mode of category, join that result back to my table, and then use a Rule Engine to replace the missing values with the new imputed value.
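For readers who think in code, the group, join, and replace steps described above can be sketched in pandas (for example inside a Python Script node). This is only an illustration under assumptions: the column names `vendor`, `category`, and `product`, the sample rows, and the helper `most_common` are all made up for the example and are not from the original data set.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "vendor": ["A", "A", "A", "B", "B", "C"],
    "category": ["tools", None, "tools", "paint", None, None],
    "product": ["p1", "p2", "p3", "p4", "p5", "p6"],
})

def most_common(values: pd.Series):
    """Return the most frequent non-missing value, or None if there is none."""
    modes = values.mode()  # mode() skips missing values by default
    return modes.iloc[0] if not modes.empty else None

# Step 1: group by vendor and take the mode of category.
lookup = (
    df.groupby("vendor")["category"]
    .agg(most_common)
    .rename("category_imputed")
    .reset_index()
)

# Step 2: join the per-vendor mode back onto the full table.
filled = df.merge(lookup, on="vendor", how="left")

# Step 3: replace missing categories with the imputed value, then drop the helper column.
filled["category"] = filled["category"].fillna(filled["category_imputed"])
filled = filled.drop(columns="category_imputed")
```

Vendor `C` has no known category at all in this sample, so its row stays missing; nothing can be imputed for it from the vendor alone.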

That will work but it seems clunky and not super scalable. The data set has a lot more than the three columns of data and I might need to do this for other columns as well.

It seems like there should be a way to create a loop for this, but I do not know how in KNIME.

Thanks in advance,
Eric

Hi @ewhulbert,

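Use a Group Loop Start node with Vendor as the grouping column. Inside the loop, handle the missing values in the category column by replacing them with the most frequent value (for example, with a Missing Value node), then close the loop with a Loop End node. Each iteration only sees one vendor's rows, so the most frequent value is computed per vendor.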
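As a rough analogy for what the group loop does, here is a minimal pandas sketch that processes one vendor at a time and then stacks the results back together. It is not how KNIME implements the loop; the function name `fill_per_group`, the column names, and the sample rows are hypothetical. It assumes the grouping column has no missing values and that the row index is unique.

```python
import pandas as pd

def fill_per_group(df: pd.DataFrame, group_col: str, target_cols: list) -> pd.DataFrame:
    """Fill missing values in target_cols with the most frequent value per group."""
    parts = []
    # Each iteration sees only one group's rows, like one pass of the loop body.
    for _, chunk in df.groupby(group_col, sort=False):
        chunk = chunk.copy()
        for col in target_cols:
            modes = chunk[col].mode()  # missing values are ignored
            if not modes.empty:
                chunk[col] = chunk[col].fillna(modes.iloc[0])
        parts.append(chunk)
    # Stack the per-group results and restore the original row order.
    return pd.concat(parts).sort_index()

# Hypothetical sample data; names and values are assumptions.
data = pd.DataFrame({
    "vendor": ["A", "B", "A", "B", "C"],
    "category": ["tools", "paint", None, None, None],
    "unit": [None, "can", "each", "can", None],
})
result = fill_per_group(data, "vendor", ["category", "unit"])
```

Passing several names in `target_cols` covers the case where more than one column needs the same treatment. Groups with no known value for a column are left untouched.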

:blush:

Woohoo, thanks Armin! I think this is the second or third time you’ve helped me, much appreciated.
