Partitioning absolute number of rows from top slow

evert.homan_scilifelab.se · April 12, 2021, 4:54pm

Hi,

I have a sorted table with 1M rows. In the next step I want to separate out the top 1000 rows with the Partitioning node. This takes a long time, and it seems the node goes through all the rows. Why would this be necessary if you choose an absolute number of rows to be taken from the top? The Row Splitter node seems to behave in a similar fashion; it goes through all rows as well.

Just curious/Evert

izaychik63 · April 12, 2021, 5:25pm

Look if

works better.

evert.homan_scilifelab.se · April 12, 2021, 5:47pm

Brilliant…takes the blink of an eye.

Thanks!

Best/Evert

ipazin · April 12, 2021, 9:31pm

Hello @evert.homan_scilifelab.se,

Both the Partitioning and Row Splitter nodes also create a second data set (around 999,000 rows in this case), while Row Sampling and Row Filter don’t. I would say that is the reason behind the difference in execution time.
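
To illustrate the idea, here is a rough Python sketch (not KNIME code, and not how the nodes are actually implemented; the function names `take_top`, `partition` and `counting_rows` are made up for this example). A step with a single output can stop reading after the first 1000 rows, while a step with two outputs has to read every row to fill the second one:

```python
from itertools import islice


def counting_rows(n, counter):
    """Yield row indices 0..n-1 and record how many were actually read."""
    for i in range(n):
        counter["read"] += 1
        yield i


def take_top(rows, k):
    # Single output: reading can stop as soon as k rows are collected.
    return list(islice(rows, k))


def partition(rows, k):
    # Two outputs: every remaining row still has to be read and stored.
    top, rest = [], []
    for i, row in enumerate(rows):
        (top if i < k else rest).append(row)
    return top, rest


filter_reads = {"read": 0}
top_only = take_top(counting_rows(1_000_000, filter_reads), 1000)

split_reads = {"read": 0}
top, rest = partition(counting_rows(1_000_000, split_reads), 1000)

print(filter_reads["read"], split_reads["read"])
```

In this sketch the first counter ends up at the number of rows requested, while the second one covers the whole table.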

Br,
Ivan

evert.homan_scilifelab.se · April 13, 2021, 7:21am

The Sorter node is also quite slow, presumably because it needs to compare all rows. Is there an alternative node that you know of?

Thank you for your explanations/Evert

ipazin · April 13, 2021, 11:55am

Hello @evert.homan_scilifelab.se,

Check out the Top k Selector node. Its node description states that the implementation is more efficient than the “Sorter + Row Filter” combination.
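
As a general illustration of why selecting the top k can beat a full sort, here is a small Python sketch. It is an assumption that the node works along these lines; the node description does not say which algorithm it uses, and the function names below are only for this example. A full sort orders all n rows, while a heap-based selection only keeps k candidates around during a single pass:

```python
import heapq
import random


def top_k_by_sort(values, k):
    # Sort all n values, then keep the first k: O(n log n).
    return sorted(values, reverse=True)[:k]


def top_k_by_heap(values, k):
    # One pass over the data with a heap of size k: O(n log k).
    return heapq.nlargest(k, values)


random.seed(0)
values = [random.random() for _ in range(1_000_000)]

# Both approaches return the same rows, largest first.
assert top_k_by_sort(values, 1000) == top_k_by_heap(values, 1000)
```

With k = 1000 and n = 1M, the heap version does far less ordering work, since most values are discarded after a single comparison against the smallest candidate kept so far.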

Br,
Ivan

evert.homan_scilifelab.se · April 13, 2021, 12:08pm

Thanks, will try!

Cheers/Evert
