Random data filter/generation based on count criteria

Hi Team,

How do I generate or filter random data based on the criteria below?

For example, I have data with around 200 records/rows. If there are fewer than 100 records, I should consider all of them; otherwise I should consider 5% of the records (here, 5% of 200) or 100 records/rows, whichever is higher.

I have used the Column Expressions node to get the count value.

Which node should I use to get random data based on the count that I get from the Column Expressions node?

Regards,
Pavan.

Hi Pavan,
You can use the Partitioning node to filter the data table such that it includes rows drawn randomly.
You can specify either the absolute number or the percentage of rows to keep.
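As a rough illustration of what "absolute number or percentage" means for a random draw, here is a minimal Python sketch. It is only an analogue and not KNIME code; the function name `draw_random_rows` and its parameters are made up for this example, and rounding the percentage to the nearest whole row is an assumption.

```python
import random


def draw_random_rows(rows, count=None, fraction=None, seed=None):
    """Split rows into a random selection and the remainder.

    Hypothetical helper: pass either an absolute count or a fraction
    (0.05 means 5%), not both. Original row order is preserved.
    """
    if (count is None) == (fraction is None):
        raise ValueError("give exactly one of count or fraction")
    if fraction is not None:
        # Assumption: round the percentage to the nearest whole row.
        count = round(len(rows) * fraction)
    # Never ask for more rows than exist (or fewer than zero).
    count = max(0, min(count, len(rows)))

    rng = random.Random(seed)
    picked = set(rng.sample(range(len(rows)), count))
    selected = [row for i, row in enumerate(rows) if i in picked]
    rest = [row for i, row in enumerate(rows) if i not in picked]
    return selected, rest


rows = list(range(200))
selected, rest = draw_random_rows(rows, count=100, seed=42)
print(len(selected), len(rest))
```

Passing `fraction=0.05` instead of `count=100` would keep 10 of the 200 rows in this sketch.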
-Don

Hi there!

As @dnaki suggested, you should use the Partitioning node for random sampling, controlled by a flow variable that defines the number of rows to keep. You can get this flow variable in multiple ways. I used the Extract Table Dimensions node followed by the Table Column to Variable and Math Formula (Variable) nodes.

Check it out, and if you have any questions, feel free to ask. The workflow is attached.
2019_04_11_Partitioning_With_Criteria.knwf (17.3 KB)

Br,
Ivan

Hi Don and Ivan,

Thank you for your help, it worked. I used the Column Expressions node, as there is no option to use multiple if/else in the Math Formula node. Below is a screenshot of the workflow, just for reference.

Regards,
Pavan.

Hi Pavan!

Glad you found a solution.

The Math Formula node indeed does not have a multiple if/else construct, but if you nest if() functions you effectively get if/else-if behavior :wink:

Also, you need only one if() function: only when 5% is bigger than 100 do you need the 5% of rows; otherwise you take 100 rows either way :wink:

Br,
Ivan

Hi Ivan,

I was not aware that we can do nested if in the Math Formula node; I will try.

Also, there are 3 conditions here: first, if the number of records is less than 100, it should take all the records; otherwise, second, 5% of the number of records, or third, 100, whichever is greater. Based on your suggestion, I think we can do it with nested if. Thanks for that.
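For reference, the three conditions could be written as a nested if roughly like this. This is a Python sketch, not Math Formula syntax; `if_` is a hypothetical stand-in for a function-style if(condition, a, b), and rounding 5% up to a whole row is an assumption, since the thread does not say how to round.

```python
def if_(condition, when_true, when_false):
    """Hypothetical stand-in for a function-style if(condition, a, b)."""
    return when_true if condition else when_false


def rows_to_keep(n):
    # 5% of n rounded up to a whole row, using integer arithmetic (assumption).
    five_percent = (n * 5 + 99) // 100
    # 1st: fewer than 100 records -> take all of them.
    # 2nd: 5% is greater than 100  -> take 5%.
    # 3rd: otherwise               -> take 100.
    return if_(n < 100, n, if_(five_percent > 100, five_percent, 100))


for n in (58, 200, 3000):
    print(n, rows_to_keep(n))
```

With these inputs the sketch keeps 58, 100 and 150 rows respectively.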

I would like to know how you connect nodes using the curved arrows. I only have straight ones.

Regards,
Pavan.

Hi Pavan!

To get curved arrows (and more grid options), see the attached picture. The setting is also accessible from File --> Preferences --> KNIME --> KNIME GUI --> KNIME Editor

Just an explanation of the logic behind collapsing the 3 conditions into 2. If 5% of the number of records is greater than 100, take 5%; otherwise take 100. This else branch also covers the case with fewer than 100 records, because the node cannot take 100 rows if there are only 58 records in the data set :wink: But I agree that having 3 conditions is clearer, and I would write it with 3 conditions as well :smiley:
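A quick way to convince yourself that the two versions agree is to compare them over a range of row counts. The sketch below is plain Python, not KNIME; the function names are made up for illustration, 5% is rounded up as an assumption, and the `min(...)` models the sampling step returning at most as many rows as the table has.

```python
def five_percent_of(n):
    # 5% of n rounded up to a whole row (assumption, integer arithmetic).
    return (n * 5 + 99) // 100


def three_conditions(n):
    if n < 100:
        return n
    elif five_percent_of(n) > 100:
        return five_percent_of(n)
    else:
        return 100


def two_conditions_then_cap(n):
    # Single if: take 5% only when it exceeds 100, otherwise ask for 100.
    requested = five_percent_of(n) if five_percent_of(n) > 100 else 100
    # Sampling cannot return more rows than the table contains.
    return min(requested, n)


mismatches = [n for n in range(0, 10001)
              if three_conditions(n) != two_conditions_then_cap(n)]
print("mismatches:", mismatches)
```

An empty mismatch list means both formulations pick the same number of rows for every table size from 0 to 10000.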

Br,
Ivan
