I have housing data that includes neighborhood and lot area frontage. I want to replace missing values of lot area frontage with the median value for that neighborhood. I can use a GroupBy node to group by neighborhood and aggregate by median of lot area frontage. That produces a table of neighborhoods and median lot area frontage for each neighborhood.
I could do what I want to do with an R script (but, then, what’s the point of messing around with Knime to begin with?); I could also use a Rule Engine node but I would have to manually code the median value for each neighborhood (rather than just “feed in” my data table) - and then it would no longer work if I got data that contained information on a different neighborhood.
I want to “feed” the GroupBy data table to one or more Knime nodes to check if the particular row has a value of neighborhood that is also in the GroupBy data table. If it finds that neighborhood then it replaces the missing lot frontage area with the median value for that neighborhood.
I can come close to what I want using the Cell Replacer node. The problem there is that it assumes you want to replace the “looked up value” (in this case, the neighborhood). I don’t want to replace that value. I want to find that value, and then replace a different column value (the missing lot frontage area). How can I do that using one or more Knime nodes designed for the purpose?
Any guidance is appreciated.