Missing value handling based on training set

Hi all,

I’m trying to apply different data mining techniques to my dataset. However, there are missing values in my training, validation and test sets. Is it possible in KNIME to compute the mean/mode on the training set only, and then use those values to replace the missing values in the training, validation and test sets (in order to avoid overfitting)?

Kind regards,

Hi @jeandony -

Yes, you can impute missing values using the aptly-named Missing Value node. There are a number of strategies for imputation available, including both mean and mode. You can impute for individual columns, or across types of columns if necessary.

There is also the Missing Value (Apply) node, which ensures that you can apply consistent imputation strategies across test/training/validation datasets. A short example workflow featuring both is here:

Give them a try!
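
For anyone who wants to see the idea outside of KNIME, here is a minimal Python sketch of the same pattern: learn the fill values from the training set only, then reuse them on every split. This is not how the KNIME nodes are implemented; the function names `fit_imputer` and `apply_imputer` and the sample data are made up for illustration. Roughly speaking, the "fit" step plays the role of the Missing Value node on the training data, and the "apply" step plays the role of Missing Value (Apply).

```python
from statistics import mean, mode

# Hypothetical helpers for illustration only; these are not KNIME APIs.

def fit_imputer(train_rows, numeric_cols, categorical_cols):
    """Learn one replacement value per column from the training rows only."""
    fill = {}
    for col in numeric_cols:
        observed = [r[col] for r in train_rows if r[col] is not None]
        fill[col] = mean(observed)   # mean of the non-missing training values
    for col in categorical_cols:
        observed = [r[col] for r in train_rows if r[col] is not None]
        fill[col] = mode(observed)   # most frequent non-missing training value
    return fill

def apply_imputer(rows, fill):
    """Replace missing values (None) using previously learned fill values."""
    return [
        {c: (fill[c] if v is None and c in fill else v) for c, v in r.items()}
        for r in rows
    ]

# Made-up sample data; None marks a missing value.
train = [
    {"age": 20, "city": "Ghent"},
    {"age": None, "city": "Ghent"},
    {"age": 40, "city": None},
    {"age": 30, "city": "Leuven"},
]
test = [
    {"age": None, "city": None},
    {"age": 55, "city": "Leuven"},
]

fill = fit_imputer(train, ["age"], ["city"])   # statistics come from train only
train_imputed = apply_imputer(train, fill)
test_imputed = apply_imputer(test, fill)       # same values reused on test
```

The point of the split into two steps is that the test and validation sets never contribute to the statistics, so nothing about them leaks into the preprocessing.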

Thanks @ScottF, that really helped me!

Hi Scott, could you kindly share a link to the Missing Value Handling .knwf file so I can download it?
Dragging and dropping it generates an error in my KNIME client (4.0); I can’t drop it there.
Thank you, I appreciate your help. W.

Hi @whatsoever333 and welcome to the forum.

If you click on the cloud icon on the upper right, you should see an option to open the workflow in KNIME, or separately download it:

[Screenshot: Missing Value Handling workflow page on KNIME Hub]
