Table random manipulations

Hello to all,
I am new to this community and to KNIME; thank you in advance for your help and patience.
I want to randomly delete cells from a table, across both columns and rows.
In other words, I want to copy random cells from a table into a new table that keeps the same table structure.

Does anyone have an idea how to do that simply and in a resource-efficient way?
My table has over 30 million rows and 10 columns.
Thanks!
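For reference, the same per-cell idea can be sketched outside KNIME in Python. This is a minimal sketch, assuming the table can be streamed row by row (for example from a CSV reader) so that the 30 million rows never have to be held in memory at once. The names `blank_random_cells` and `p_missing` are hypothetical and not part of KNIME. Every cell is independently kept or replaced by a missing value (`None`), so the output has the same rows and columns as the input.

```python
import random
from typing import Iterable, Iterator, List, Optional

Row = List[Optional[object]]


def blank_random_cells(rows: Iterable[Row], p_missing: float,
                       seed: Optional[int] = None) -> Iterator[Row]:
    """Yield a copy of each row in which every cell is independently
    replaced by None (a missing value) with probability p_missing."""
    if not 0.0 <= p_missing <= 1.0:
        raise ValueError("p_missing must be between 0 and 1")
    rng = random.Random(seed)
    for row in rows:
        # Same number of cells per row, so the table structure is preserved.
        yield [None if rng.random() < p_missing else cell for cell in row]


if __name__ == "__main__":
    table = [[r * 10 + c for c in range(10)] for r in range(3)]
    for out_row in blank_random_cells(table, 0.3, seed=42):
        print(out_row)
```

Because each cell is decided on its own, the share of missing cells is only approximately `p_missing`; getting an exact count per column requires drawing the cells to blank without replacement instead.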

Hi @smontigaud and welcome to the forum!

You could try the Disturber node for this purpose.


That is a one-node solution to my problem. Very impressive!
Thanks @ScottF

I can get a table with 10% missing values, but repeating the Disturber in the workflow on the output port of the 3rd node seems to have no effect.
Do you know how I could obtain, for example, 30% missing values?
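A side note on the arithmetic: assuming each pass blanks every cell independently of the earlier passes (an assumption; a pass that always hits the same cells would add nothing), chained passes do not simply add up. After k passes with rate q, the expected missing fraction is 1 - (1 - q)^k, so three 10% passes would give about 27.1%, not 30%. The helper names below are hypothetical.

```python
def missing_after_passes(q: float, k: int) -> float:
    """Expected missing fraction after k independent passes,
    each blanking any given cell with probability q."""
    return 1.0 - (1.0 - q) ** k


def per_pass_rate(target: float, k: int) -> float:
    """Per-pass rate needed so that k independent passes
    reach the target missing fraction."""
    return 1.0 - (1.0 - target) ** (1.0 / k)


if __name__ == "__main__":
    print(round(missing_after_passes(0.10, 3), 3))  # 0.271
    print(round(per_pass_rate(0.30, 3), 4))
```

In practice a single pass at the target rate is simpler than chaining several smaller ones.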

Hi @ScottF

As an answer to the question of @smontigaud, I tried to chain multiple Disturber nodes one after another to create more missing values.
But it looks like the missing values are always assigned to the same cells. So the Disturber node does not work completely randomly?!

gr. Hans
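One plausible explanation, which is an assumption and not something confirmed from the Disturber node's documentation, is that the node draws its random numbers from a fixed seed. The k-th cell of a given table then always receives the same random draw, so a second pass can only blank cells that are already missing. This Python sketch imitates that behaviour with a hypothetical `disturb` function built on `random.Random(seed)`.

```python
import random
from typing import List, Optional

Row = List[Optional[object]]


def disturb(rows: List[Row], p_missing: float, seed: int = 12345) -> List[Row]:
    """Hypothetical stand-in for a node that blanks cells using a fixed seed.

    The random stream depends only on the seed, so the k-th cell in table
    order always gets the same draw, however often this is applied.
    """
    rng = random.Random(seed)
    return [[None if rng.random() < p_missing else cell for cell in row]
            for row in rows]


def missing_fraction(rows: List[Row]) -> float:
    cells = [cell for row in rows for cell in row]
    return sum(cell is None for cell in cells) / len(cells)


if __name__ == "__main__":
    table = [[r * 10 + c for c in range(10)] for r in range(1000)]
    once = disturb(table, 0.10)
    twice = disturb(once, 0.10)
    print(missing_fraction(once), missing_fraction(twice))  # identical values
    print(once == twice)  # True
```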


Hi @smontigaud,

How about using a Partitioning node to randomly remove 30% of the data and then joining this output back to the main table? Joining will take a while with a table of your size, but this way you will get exactly what you want.

Here is an example:
create_missing.knwf (147.8 KB)

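The partition-and-join idea can be sketched in Python as follows. Since the contents of create_missing.knwf are not shown here, this assumes that the partitioning is done separately for every column, so that the missing values are spread across both rows and columns; `partition_and_join` is a hypothetical name. Drawing an exact number of row IDs per column is what makes the missing share exactly the requested fraction, instead of only approximately, as with an independent per-cell coin flip.

```python
import random
from typing import Dict, List, Optional


def partition_and_join(table: Dict[str, List[object]], fraction: float,
                       seed: Optional[int] = None) -> Dict[str, List[Optional[object]]]:
    """Blank exactly round(fraction * n) randomly chosen cells in every column.

    Per column: draw the row IDs to drop (the "partition"), keep the rest,
    then left-join the kept values back onto the full list of row IDs so
    that the dropped rows come back as None (missing).
    Assumes all columns have the same length.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_rows = len(next(iter(table.values()), []))
    row_ids = range(n_rows)
    n_drop = round(fraction * n_rows)
    result: Dict[str, List[Optional[object]]] = {}
    for name, values in table.items():
        dropped = set(rng.sample(row_ids, n_drop))
        # The "kept" partition, keyed by row ID.
        kept = {i: values[i] for i in row_ids if i not in dropped}
        # Left outer join on row ID: IDs absent from `kept` become None.
        result[name] = [kept.get(i) for i in row_ids]
    return result


if __name__ == "__main__":
    table = {f"col{c}": list(range(10)) for c in range(3)}
    for name, column in partition_and_join(table, 0.3, seed=7).items():
        print(name, column)
```

The dictionary used for the join mirrors the node logic for clarity; for 30 million rows, a boolean mask per column would be much lighter on memory.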


Hi there,

You are right, @HansS. The Disturber node always assigns missing values to the same cells if the table is the same. As a workaround, one can put a Shuffle node in between. Or use @armingrudd's solution.

Br,
Ivan
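Why the Shuffle workaround helps can be sketched under the assumption that the Disturber uses a fixed seed (not confirmed here): the seed fixes which positions in the table get blanked, so reordering the rows between two passes makes the second pass hit different cells. The functions are hypothetical Python stand-ins, with row IDs carried along so the original order can be restored afterwards. With two 10% passes, the missing share ends up around 19%, roughly 1 - 0.9^2.

```python
import random
from typing import List, Optional, Tuple

Row = Tuple[int, List[Optional[object]]]  # (row ID, cells)


def disturb(rows: List[Row], p_missing: float, seed: int = 12345) -> List[Row]:
    """Hypothetical fixed-seed disturber: the k-th cell in the current
    table order always gets the same random draw."""
    rng = random.Random(seed)
    return [(rid, [None if rng.random() < p_missing else cell for cell in cells])
            for rid, cells in rows]


def shuffled(rows: List[Row], rng: random.Random) -> List[Row]:
    """Return the rows in random order; row IDs travel with their cells."""
    out = list(rows)
    rng.shuffle(out)
    return out


def missing_fraction(rows: List[Row]) -> float:
    cells = [cell for _, row in rows for cell in row]
    return sum(cell is None for cell in cells) / len(cells)


if __name__ == "__main__":
    table = [(r, [r * 10 + c for c in range(10)]) for r in range(1000)]
    once = disturb(table, 0.10)
    twice = disturb(shuffled(once, random.Random(2024)), 0.10)
    twice.sort(key=lambda row: row[0])  # restore the original row order
    print(missing_fraction(once), missing_fraction(twice))
```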


@armingrudd @ipazin Thank you both; these additional solutions work fine for my purpose.

