Hi,
Does anyone have an idea how to generate missing values that are randomly distributed in a dataset?
Thanks.
N.
Hi @Northern,
Could you please describe the process of dropping values in greater detail? Do you want to randomly drop values from one column or from multiple ones? You could, for example, use the Java Snippet node with something like the following, which replaces roughly half of the values with missing values:
// system imports
import java.util.Random;
// expression start
if (new Random().nextBoolean()) {
    out_stringColumn = null; // assign null for missing value
} else {
    out_stringColumn = c_stringColumn; // keep the input value
}
If you go into a bit more detail about why you are trying to do that, we might be able to come up with a better-suited solution.
Best,
Stefan
Hi @Northern
In my paper, the New Iris Data, I also described an approach for randomly inserting missing values.
Cheers, Iris
Ah the example workflow is here:
Hi Stefan,
Thanks for the reply.
I'm trying to generate missing values for multiple columns in a dataset that has both categorical and numerical attributes. The dataset will be used to test whether the model copes well with missing values. Do you have any ideas about that?
Best Regards,
N.
Hi iris,
But in this example I can only generate missing values for predefined attributes, is that right? How can I do it for a large dataset, that is, randomly choose some attributes and drop some values from them?
best,
N.
Hi Northern,
Yes, this is correct. You would need a loop to do this for all columns.
Alternatively, we have a node for this in our testing extension. It outputs a table in which all columns have additional missing values. Here it is:
https://hub.knime.com/knime/nodes/Disturber_Node*l-_jLanQCcCnK7nH
Cheers, Iris
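For anyone who wants to see the looping idea outside of a workflow, here is a minimal plain-Java sketch. It is only an illustration and not how the Disturber node or the example workflow is implemented. The class name `RandomColumnDisturber`, the method `disturb`, and the representation of the table as an `Object[][]` (rows by columns, rectangular) are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of KNIME: the table is assumed to be a
// rectangular Object[][] where null stands for a missing value.
class RandomColumnDisturber {

    /**
     * Picks numColumns distinct columns at random, then sets each cell of
     * those columns to null with probability missingProb. The table is
     * modified in place; the indices of the chosen columns are returned.
     */
    static List<Integer> disturb(Object[][] table, int numColumns,
                                 double missingProb, Random rng) {
        int columnCount = table.length == 0 ? 0 : table[0].length;

        // Shuffle all column indices and keep the first numColumns of them.
        List<Integer> columns = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            columns.add(c);
        }
        Collections.shuffle(columns, rng);
        List<Integer> chosen =
            new ArrayList<>(columns.subList(0, Math.min(numColumns, columnCount)));

        // Loop over every row and every chosen column.
        for (Object[] row : table) {
            for (int c : chosen) {
                if (rng.nextDouble() < missingProb) {
                    row[c] = null; // works for categorical and numerical cells alike
                }
            }
        }
        return chosen;
    }
}
```

A call such as `RandomColumnDisturber.disturb(table, 3, 0.1, new Random())` would pick three columns and blank out roughly 10% of the cells in each. Passing a seeded `Random` makes the result reproducible.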
Thanks Iris, it helps me a lot :)
I think it would be even better if it also allowed users to define the percentage of missing values themselves.
best regards,
N.
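As a rough illustration of the user-defined percentage idea, the plain-Java sketch below sets an exact fraction of all cells to missing. It is not a feature of the Disturber node. The class name `MissingFraction` and the method `insertMissing` are made up for this example, and it assumes a rectangular `Object[][]` table with no missing values to begin with.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of KNIME: null stands for a missing value.
class MissingFraction {

    /**
     * Sets round(fraction * numberOfCells) randomly chosen cells to null,
     * in place, and returns how many cells were set.
     */
    static int insertMissing(Object[][] table, double fraction, Random rng) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        int rows = table.length;
        int cols = rows == 0 ? 0 : table[0].length;

        // Number every cell, shuffle the numbers, and take the first toDrop.
        List<Integer> cells = new ArrayList<>();
        for (int i = 0; i < rows * cols; i++) {
            cells.add(i);
        }
        Collections.shuffle(cells, rng);

        int toDrop = (int) Math.round(fraction * rows * cols);
        for (int k = 0; k < toDrop; k++) {
            int cell = cells.get(k);
            table[cell / cols][cell % cols] = null; // map cell number to row and column
        }
        return toDrop;
    }
}
```

Unlike a per-cell coin flip, this hits the requested share exactly, because the cells are drawn without replacement from a shuffled list. The trade-off is that the list of cell numbers has to fit in memory, which may not suit very large tables.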