Dealing with missing values using a predictive model

Northern · June 3, 2018, 7:53pm

Hi there,

I want to use a predictive model to predict missing values and replace them, so that I end up with a complete dataset. Does anyone have an idea how I can implement the following process in KNIME?

1. Divide the original data into two groups:

   group 1: all objects that are complete
   group 2: all objects that have a missing value in at least one attribute

2. Choose one attribute that contains missing values as the target, and use the data from group 1 to train a predictive model.
3. Use data from group 2 for testing, predict and fill the missing values with the predicted value in the attribute.
4. Repeat steps 2-3 until no attribute contains missing values.
5. Combine group 1 with group 2 into a final dataset in which all missing values are replaced with predicted values.
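
Outside KNIME, the steps above can be sketched in plain Python. This is only a minimal illustration under several assumptions: all attributes are numeric, missing values are stored as `None`, and a simple k-nearest-neighbour average stands in for whatever predictive model is actually chosen. The names `knn_predict` and `impute_with_model` are made up for this sketch, and the attributes are not rescaled before distances are computed.

```python
from math import dist

def knn_predict(train_rows, features, target, query, k=3):
    """Mean of `target` over the k training rows closest to `query`."""
    if not features:
        # Nothing to measure distance on: fall back to the overall mean.
        nearest = train_rows
    else:
        point = [query[c] for c in features]
        nearest = sorted(
            train_rows, key=lambda r: dist([r[c] for c in features], point)
        )[:k]
    return sum(r[target] for r in nearest) / len(nearest)

def impute_with_model(rows, columns, k=3):
    """Fill None values attribute by attribute, training only on complete rows."""
    # Step 1: group 1 = complete rows, group 2 = rows with at least one gap.
    group1 = [dict(r) for r in rows if all(r[c] is not None for c in columns)]
    group2 = [dict(r) for r in rows if any(r[c] is None for c in columns)]
    if not group1:
        raise ValueError("need at least one complete row to train on")
    # Steps 2-4: take each attribute in turn as the prediction target.
    for target in columns:
        for row in group2:
            if row[target] is not None:
                continue
            # Only attributes this row actually has can serve as features;
            # values filled in for earlier targets count as available.
            features = [c for c in columns if c != target and row[c] is not None]
            row[target] = knn_predict(group1, features, target, row, k)
    # Step 5: recombine (note: row order differs from the input).
    return group1 + group2

data = [
    {"a": 1.0, "b": 2.0},
    {"a": 2.0, "b": 4.0},
    {"a": 3.0, "b": 6.0},
    {"a": 2.1, "b": None},
]
completed = impute_with_model(data, ["a", "b"], k=1)
```

In this toy data the incomplete row is closest to the row with `a = 2.0`, so with `k=1` its missing `b` is filled with that row's value.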

Thanks.

Best regards,
N.

mauuuuu5 · June 3, 2018, 11:25pm

I think I understand what you are trying to achieve, but what about step “3. use data from group 2 for testing, predict and fill the missing values with the predicted value in the attribute”? How are you going to test the model if you do not have the actual (observed) labels, given that group 2 is exactly where the values are missing?

Am I right?

Cheers

Northern · June 4, 2018, 7:27am

Hi,

Here I only want to use the predictive model to fill in the missing values.
The final dataset will then be passed on to further analysis and tested outside the missing value treatment process. Is it reasonable to do so?

Cheers.

Martin_K · June 4, 2018, 8:48am

Hi Northern,

IMHO, the correct steps would be:

1. Divide group 1 into 2 groups:

   1.1 training set
   1.2 test set

2. Use 1.1 to build a predictive model and test it on 1.2. Tune the predictive model, or try multiple predictive methods, to get predictions as close as possible to the known outputs in the test set.

3. Apply the final predictive model to group 2 to replace the missing values with predicted ones.
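
As a rough illustration of the hold-out idea (not KNIME-specific), here is a small Python sketch. It assumes numeric attributes, uses a k-nearest-neighbour average as a stand-in model, and treats the choice of `k` as the thing being tuned. The function names, the 30% test fraction and the mean absolute error metric are assumptions made for the example.

```python
import random
from math import dist

def knn_predict(train, features, target, query, k):
    """Mean of `target` over the k training rows closest to `query`."""
    point = [query[c] for c in features]
    nearest = sorted(train, key=lambda r: dist([r[c] for c in features], point))[:k]
    return sum(r[target] for r in nearest) / len(nearest)

def holdout_mae(complete_rows, features, target, k, test_fraction=0.3, seed=0):
    """Mean absolute error on a held-out part of the complete rows (group 1)."""
    rows = list(complete_rows)
    if len(rows) < 2:
        raise ValueError("need at least two complete rows")
    random.Random(seed).shuffle(rows)
    # Keep at least one row on each side of the split.
    n_test = min(max(1, int(len(rows) * test_fraction)), len(rows) - 1)
    test, train = rows[:n_test], rows[n_test:]  # 1.2 test set, 1.1 training set
    errors = [abs(knn_predict(train, features, target, r, k) - r[target]) for r in test]
    return sum(errors) / len(errors)

def pick_k(complete_rows, features, target, candidates=(1, 3, 5)):
    """Return the candidate k with the lowest hold-out error."""
    return min(candidates, key=lambda k: holdout_mae(complete_rows, features, target, k))

group1 = [{"a": float(i), "b": 2.0 * i} for i in range(20)]
best_k = pick_k(group1, ["a"], "b")
```

The selected setting would then be used to train on all of group 1 before predicting the missing values in group 2.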

Martin K.

Northern · June 4, 2018, 10:25am

Hello Martin,

Thanks for the reply. I think you’re right…
I’m wondering: if I use cross-validation to evaluate on the original dataset, should all the steps above first be carried out as a black box within the training set, and then be applied to the test set (or validation set) of each cross-validation fold? Is that right?
Do you know how to implement this process in KNIME?
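
To make the question concrete, here is how I picture it as a Python sketch rather than a KNIME workflow. The imputation is learned on the training fold only and then applied unchanged to both folds. To keep it short, a per-column mean stands in for the predictive imputation model; `evaluate` is a hypothetical callback for the downstream analysis, and every column is assumed to have at least one observed value in each training fold.

```python
from statistics import mean

def fit_mean_imputer(train_rows, columns):
    """Learn one fill value per column from the training fold only."""
    return {c: mean(r[c] for r in train_rows if r[c] is not None) for c in columns}

def apply_imputer(rows, fill_values):
    """Return copies of the rows with None replaced by the learned fill values."""
    return [
        {c: (fill_values[c] if v is None and c in fill_values else v) for c, v in r.items()}
        for r in rows
    ]

def kfold_indices(n, folds):
    """Yield (train_idx, test_idx) pairs for contiguous folds."""
    for f in range(folds):
        test = list(range(f * n // folds, (f + 1) * n // folds))
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test

def cross_validate(rows, columns, evaluate, folds=5):
    """Run `evaluate(train, test)` per fold, imputing inside each fold."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(rows), folds):
        train = [rows[i] for i in train_idx]
        test = [rows[i] for i in test_idx]
        fill = fit_mean_imputer(train, columns)  # never sees the test fold
        scores.append(evaluate(apply_imputer(train, fill), apply_imputer(test, fill)))
    return scores
```

The point of the structure is that nothing computed from the test fold leaks into the values used to fill it.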

Thanks.
N.

Martin_K · June 4, 2018, 12:42pm

Hi Northern,

I recommend looking at some of the examples on the KNIME server related to predictive modelling;
check the 04_Classification_and_Predictive_Modelling branch within KNIME.
You may also find more useful information on the web: Chapter 6. Predictive Analytics.

Best regards !

Martin K.

beginner · June 5, 2018, 5:42am

Missing values are usually filled with the mean or median before machine learning. Other options are to use an algorithm that can handle missing values directly, or to remove the affected columns (rarely an option).
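
A minimal Python sketch of the mean/median option, assuming a numeric column with `None` marking the missing entries (the function name `fill_column` is just for illustration):

```python
from statistics import mean, median

def fill_column(values, strategy=median):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]

incomes = [1.0, 2.0, 3.0, 100.0, None]
with_median = fill_column(incomes, median)  # last entry becomes 2.5
with_mean = fill_column(incomes, mean)      # last entry becomes 26.5
```

The single outlier (100.0) pulls the mean far away from the typical values, which is one reason the median is often preferred for skewed columns.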