generate random missing values?

Northern · June 4, 2018, 3:43pm

Hi,

Does anyone have an idea how to generate missing values that are randomly distributed in a dataset?

Thanks.

N.

stelfrich · June 5, 2018, 11:25am

Could you please describe the process of dropping values in greater detail? Do you want to randomly drop values from one column or from multiple ones? You could, for example, use the Java Snippet node with something like the following, which replaces roughly half of the values with missing values:

// system imports
import java.util.Random;

// expression start
if (new Random().nextBoolean()) {
	out_stringColumn = null; // assign null for missing value
} else {
	out_stringColumn = c_stringColumn; // keep the input value
}

If you go into a bit more detail about why you are trying to do that, we might be able to come up with a better-suited solution.

Best,
Stefan

Iris · June 5, 2018, 3:49pm

Hi @Northern

In my paper, the New Iris Data, I also described an approach for randomly inserting missing values.

Cheers, Iris

Iris · June 5, 2018, 3:53pm

Ah the example workflow is here:

Northern · June 5, 2018, 7:26pm

Hi Stefan,

Thanks for the reply.
I'm trying to generate missing values in multiple columns of a dataset that has both categorical and numerical attributes. The dataset will be used to test whether the model copes well with missing values. Do you have any ideas about that?

Best Regards,
N.

Northern · June 5, 2018, 7:30pm

Hi Iris,

But in this example I can only generate the missing values in predefined attributes, is that right? How can I do it for a large dataset, so that some attributes are chosen at random and some values are dropped from them?

best,
N.

Iris · June 6, 2018, 7:02am

Hi Northern,
Yes, this is correct. You would need a loop to do this for all columns.
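
To make the loop idea concrete, here is a rough sketch in plain Java, outside of KNIME. It picks a few columns at random and then drops values in each of them with a given probability. Everything in it is an assumption made for illustration: the class name `RandomColumnMasker`, the method `maskRandomColumns`, and the representation of the table as an `Object[][]` (rows by columns, with `null` standing for a missing value) are not part of any KNIME API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not a KNIME API: the table is a plain Object[][] (rows x columns).
public class RandomColumnMasker {

    /**
     * Picks numColumns columns at random and, within each chosen column,
     * replaces every cell with null (missing) with probability missingProbability.
     * Returns the indices of the chosen columns.
     */
    public static List<Integer> maskRandomColumns(Object[][] table, int numColumns,
                                                  double missingProbability, long seed) {
        Random random = new Random(seed); // fixed seed keeps the result reproducible
        int columnCount = table.length == 0 ? 0 : table[0].length;

        // Shuffle all column indices and keep the first numColumns of them.
        List<Integer> columns = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            columns.add(c);
        }
        Collections.shuffle(columns, random);
        List<Integer> chosen =
                new ArrayList<>(columns.subList(0, Math.min(numColumns, columnCount)));

        // Loop over the chosen columns and over all rows in each of them.
        for (int column : chosen) {
            for (Object[] row : table) {
                if (random.nextDouble() < missingProbability) {
                    row[column] = null; // null stands for a missing value
                }
            }
        }
        return chosen;
    }
}
```

Because the cells are plain objects, the same loop works for categorical (string) and numerical columns. Inside KNIME, the equivalent would be a column loop around a node that drops values, as described above.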

Alternatively, our testing extension has a node for this. It outputs a table in which all columns have additional missing values. Here it is:

https://hub.knime.com/knime/nodes/Disturber_Node*l-_jLanQCcCnK7nH

Cheers, Iris

Northern · June 10, 2018, 8:49pm

Thanks Iris, that helps me a lot :)
I think it would be even better if it also allowed users to define the percentage of missing values themselves.

best regards,
N.
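
For reference, the idea of a user-defined percentage could look roughly like the following in plain Java. This is only a sketch under assumptions: the class `FractionMasker` and the method `maskFraction` are invented names, the table is modelled as an `Object[][]` with `null` as the missing value, and it says nothing about how the Disturber node works internally. Instead of flipping a coin per cell, it shuffles all cell positions and blanks exactly the requested share of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not a KNIME API: the table is a plain Object[][] (rows x columns).
public class FractionMasker {

    /**
     * Sets a fixed share of all cells to null (missing), chosen uniformly at random.
     * fraction is between 0.0 and 1.0; returns the number of cells that were blanked.
     */
    public static int maskFraction(Object[][] table, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        int rowCount = table.length;
        int columnCount = rowCount == 0 ? 0 : table[0].length;
        int cellCount = rowCount * columnCount;

        // Number every cell (row * columnCount + column) and shuffle the numbers.
        List<Integer> cells = new ArrayList<>(cellCount);
        for (int i = 0; i < cellCount; i++) {
            cells.add(i);
        }
        Collections.shuffle(cells, new Random(seed)); // seed makes the choice reproducible

        // Blank the first toMask cells of the shuffled order.
        int toMask = (int) Math.round(fraction * cellCount);
        for (int i = 0; i < toMask; i++) {
            int cell = cells.get(i);
            table[cell / columnCount][cell % columnCount] = null;
        }
        return toMask;
    }
}
```

Calling `FractionMasker.maskFraction(table, 0.2, 42L)` would blank 20% of the cells. The count of missing values afterwards is exact only if the table had no missing values to begin with.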