consolidating tables by calculating the mean of similar columns of a large data set

Hi,

I have a wide CSV data set (1034 columns) with missing values. I want to combine multiple imputation methods and consolidate the resultant tables into one final table. I used the Missing Value node multiple times each using a different method and now want to join the three output tables into a single one by taking the mean value of the same cell across the three tables. Can anyone advise which node to use to achieve that sort of joining the three tables by taking the average? The process has to be automatic as any manual method will be laborious in such a large data set.

Hi Sammour,

You could use a Rule Engine node to generate new missing value columns for each of the three tables, then join the tables using Joiner node twice based on the unique row identifier to make a single table and then lastly use a Math Formula node to get the mean value from the three different missing value columns.
Screenshot 2022-03-28 at 17.28.07

Hope this helps, if this is not your use case then you can share more information perhaps a screenshot so i can understand better what you are trying to achieve.

Best Regards
Hannan

1 Like

Hi Hannan,

Thanks for the reply. We are talking about potentially hundreds of columns with missing values. I tried to select multiple columns inside the Rule Engine to apply the rule on multiple columns but that didn’t work. Moreover, I didn’t quite understand your suggestion to generate new missing value columns. Can you please elaborate?

Hi Sammour,

Thank you for your question , you could generate separate columns for different missing value methods by using the Rule Engine node(in my case i am taking the median value for first imputation on the column “Current” and then making a separate column for it)


) and then later use the Joiner node once or multiple times if you have more than two tables and then finally use the Math Formula node to take the mean of all the column values.

Best Regards
Hannan