Hello to all,
I am new to this community and to KNIME. Thank you in advance for your help and patience.
I want to randomly delete cells from a table across columns and rows.
Put another way, I want to copy random cells from a table into a new table while keeping the same table structure.
Does anyone have an idea how to do that simply and in a resource-efficient way?
My table has over 30 million rows and 10 columns.
Thanks!
That is a one-node solution to my problem. Very impressive!
Thanks @ScottF
I can get a table with 10% missing values, but it seems that repeating the Disturber node in the workflow, on the 3rd output port, has no effect.
Do you know how I could obtain, for example, 30% missing values?
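For reference, outside of the Disturber node the target itself can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how any KNIME node works: `mask_cells`, its parameters, and the small demo table are made-up names, and each cell is blanked independently with the given probability, so the result is close to 30% rather than exactly 30%. Because it is a generator, it handles one row at a time and does not need the whole table in memory.

```python
import random

def mask_cells(rows, fraction, seed=None):
    """Yield a copy of each row in which every cell is independently
    replaced by None with probability `fraction` (hypothetical helper)."""
    rng = random.Random(seed)
    for row in rows:
        yield [None if rng.random() < fraction else cell for cell in row]

# Small stand-in for the real table: 10,000 rows x 10 columns.
table = [[r * 10 + c for c in range(10)] for r in range(10_000)]

masked = list(mask_cells(table, fraction=0.3, seed=42))

missing = sum(cell is None for row in masked for cell in row)
share = missing / (len(masked) * 10)
print(f"missing share: {share:.3f}")
```

The table keeps its shape (same number of rows and columns), and the cells that survive keep their original values.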
In response to @smontigaud's question, I tried connecting multiple Disturber nodes one after another to create more missing values.
But it looks like the missing values are always assigned to the same cells. So the Disturber node does not seem to work completely randomly?!
How about using a Partitioning node to remove 30% of the data randomly and joining this output to the main table? Joining will take a while in your case, but this way you will get exactly what you want.
You are right, @HansS. The Disturber node always assigns missing values to the same cells if the table is the same. As a workaround, one can use a Shuffle node in between, or use @armingrudd's solution.
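To show why shuffling in between helps, here is a rough Python sketch. It assumes the observed behaviour can be modelled as a mask whose positions depend only on a fixed seed. That is an assumption made for illustration, not a statement about the Disturber node's internals, and `fixed_mask` and `missing_share` are hypothetical helpers. Chaining the same mask three times leaves the missing share unchanged, while shuffling the rows between passes lets each pass hit different rows.

```python
import random

def fixed_mask(rows, fraction, seed=0):
    """Blank cells at positions determined only by `seed`
    (assumed model: the same seed always hits the same positions)."""
    rng = random.Random(seed)
    return [[None if rng.random() < fraction else cell for cell in row]
            for row in rows]

def missing_share(rows):
    cells = [cell for row in rows for cell in row]
    return sum(cell is None for cell in cells) / len(cells)

table = [[r * 10 + c for c in range(10)] for r in range(10_000)]

# Three chained passes on an unchanged row order: same cells every time.
chained = table
for _ in range(3):
    chained = fixed_mask(chained, 0.1)

# Three passes with a row shuffle after each one.
shuffler = random.Random(1)
shuffled = table
for _ in range(3):
    shuffled = fixed_mask(shuffled, 0.1)  # returns a new list
    shuffler.shuffle(shuffled)            # reorder rows before the next pass

print(f"chained:  {missing_share(chained):.3f}")
print(f"shuffled: {missing_share(shuffled):.3f}")
```

If the passes are roughly independent, three passes at 10% should leave about 1 - 0.9^3 = 27.1% of cells missing, so chaining 10% steps does not land exactly on 30%. Note that the row order is also changed by the shuffling.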