Generate random missing values?

Hi,

Does anyone have an idea how to generate missing values that are randomly distributed across a dataset?

Thanks.

N.

Hi @Northern,

Could you please describe the process of dropping values in greater detail? Do you want to randomly drop values from one column or from several? You could, for example, use the Java Snippet node with something like the following:

// system imports
import java.util.Random;

// expression start
// nextBoolean() is true about half the time, so roughly 50% of the rows become missing
if (new Random().nextBoolean()) {
	out_stringColumn = null; // assign null for missing value
} else {
	out_stringColumn = c_stringColumn; // keep the input value
}
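
The snippet above drops about half of the values, because `nextBoolean()` returns true with probability 0.5. As a rough sketch of the same per-value decision with an adjustable rate, here is a standalone class (plain Java, not Java Snippet code). The names `RandomMissing`, `dropRandomly` and `missingRate` are made up for illustration and are not part of KNIME.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of KNIME: applies the per-value decision from
// the snippet above, but with an adjustable missing rate.
class RandomMissing {

    // Returns a copy of the column in which each value is independently
    // replaced by null with probability missingRate (0.0 to 1.0).
    static <T> List<T> dropRandomly(List<T> column, double missingRate, Random rng) {
        if (missingRate < 0.0 || missingRate > 1.0) {
            throw new IllegalArgumentException("missingRate must be in [0, 1]");
        }
        List<T> result = new ArrayList<>(column.size());
        for (T value : column) {
            // nextDouble() is uniform in [0, 1), so this test is true
            // with probability missingRate
            result.add(rng.nextDouble() < missingRate ? null : value);
        }
        return result;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed so the run is reproducible
        List<String> species = List.of("setosa", "versicolor", "virginica", "setosa");
        System.out.println(dropRandomly(species, 0.25, rng));
    }
}
```

Inside a Java Snippet node the same idea would presumably reduce to replacing `nextBoolean()` with `nextDouble() < rate`. Note that the share of missing values is then only approximately equal to the rate, since every value is decided independently.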

If you go into a bit more detail about why you are trying to do that, we might be able to come up with a better-suited solution.

Best,
Stefan

Hi @Northern,

In my paper, the New Iris Data, I also described an approach for randomly inserting missing values :)

Cheers, Iris

Ah, the example workflow is here:

Hi Stefan,

Thanks for the reply.
I'm trying to generate missing values for multiple columns in a dataset with both categorical and numerical attributes. The dataset will be used to test whether the model copes well with missing values. Do you have any ideas for that?

Best Regards,
N.

Hi Iris,

But in this example I can only generate missing values for predefined attributes, is that right? How can I do it for a large dataset, so that some attributes are chosen at random and some of their values are dropped?

Best,
N.

Hi Northern,
Yes, this is correct. You would need a loop to do this for all columns.

Alternatively, we have a node for this in our testing extension. It outputs a table in which all columns have additional missing values. Here it is:

https://hub.knime.com/knime/nodes/Disturber_Node*l-_jLanQCcCnK7nH

Cheers, Iris

Thanks Iris, that helps me a lot :)
I think it would be even better if the node also allowed users to define the percentage of missing values themselves.

Best regards,
N.
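
For readers who need the user-defined percentage asked for above, one possible approach outside KNIME is to shuffle the indices of all cells and blank out exactly the requested share. The sketch below is plain Java and makes several assumptions: the table is rectangular, it contains no missing values beforehand, and the names `MissingFraction` and `injectMissing` are hypothetical rather than part of any KNIME API. An `Object[][]` is used so that categorical and numerical columns can sit in the same table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not a KNIME API: blanks out an exact share of all cells.
class MissingFraction {

    // Sets round(fraction * rows * columns) randomly chosen cells to null, in place,
    // and returns how many cells were blanked. Assumes a rectangular table.
    static int injectMissing(Object[][] table, double fraction, Random rng) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        int rows = table.length;
        int cols = rows == 0 ? 0 : table[0].length;

        // Number every cell from 0 to rows * cols - 1, then shuffle the numbers
        List<Integer> cellIds = new ArrayList<>(rows * cols);
        for (int i = 0; i < rows * cols; i++) {
            cellIds.add(i);
        }
        Collections.shuffle(cellIds, rng);

        // The first `target` shuffled ids are the cells that become missing
        int target = (int) Math.round(fraction * rows * cols);
        for (int k = 0; k < target; k++) {
            int id = cellIds.get(k);
            table[id / cols][id % cols] = null;
        }
        return target;
    }

    public static void main(String[] args) {
        Object[][] table = {
            {5.1, "setosa"}, {7.0, "versicolor"}, {6.3, "virginica"}, {4.9, "setosa"}
        };
        int blanked = injectMissing(table, 0.25, new Random(7));
        System.out.println(blanked + " cells set to missing");
    }
}
```

Because the cells are drawn across the whole table, the columns that lose values are chosen at random as well. If every column should lose the same share, the same shuffle could be run once per column instead.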