Selecting the top 50 rows of a table by a control column

I’m sure there is a simple way to do this in KNIME, but I can’t seem to find it. I have a large table (tens of millions of rows) and want to reduce it by keeping only the first 50 rows for every value in a control column. I have a workaround that calls a Python script, but this is inefficient when I run the overall process in a loop multiple times…
Many thanks

Hi @sys4381 and welcome to KNIME Community,

You can use the Row Sampling node to pick rows from each class at random. Just select the “Stratified sampling” option in the configuration window of the node and select the class column. As far as I know, stratified sampling keeps the class proportions of the input table rather than forcing an equal count per class. You can choose how many rows you want in the output (absolute or relative).

But if you need to keep the first 50 rows in each class, then I think you have to use the Group Loop Start node (include the class column) and the Row Filter node (include rows by number) to achieve that.
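For anyone comparing this with the Python workaround, here is a minimal sketch of the same "first N rows per group" logic in plain Python. It is not what the KNIME nodes do internally, only an illustration of the idea; the function name `first_n_per_group`, the `key_index` parameter and the sample rows are made up for the example.

```python
from collections import defaultdict


def first_n_per_group(rows, key_index, n=50):
    """Keep the first n rows for each distinct value in column key_index.

    Rows are scanned once and the input order is preserved, so this
    assumes the table is already sorted the way you want it.
    """
    seen = defaultdict(int)  # rows kept so far for each control value
    kept = []
    for row in rows:
        key = row[key_index]
        if seen[key] < n:
            seen[key] += 1
            kept.append(row)
    return kept


# Hypothetical sample data: column 0 is the control column.
rows = [("a", 1), ("a", 2), ("b", 3), ("a", 4), ("b", 5), ("b", 6)]
result = first_n_per_group(rows, key_index=0, n=2)
print(result)
```

This makes a single pass over the table and keeps the rows in their original order, so the output order may differ from a loop that processes one group at a time.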



Many thanks for your rapid response. Yes, I do need the first 50 rows (the table is already sorted), so your Group Loop Start suggestion worked a treat and was super fast.
Very impressed with KNIME - the visual metaphor is amazing and matches the way I work really well.

Thanks again

