I have a sorted table with 1M rows. In the next step I want to separate out the top 1000 rows with the Partitioning node. This takes a long time, and it seems the node goes through all the rows. Why would this be necessary if you choose an absolute number of rows to be taken from the top? The Row Splitter node seems to behave similarly: it goes through all rows as well.
Both the Partitioning and Row Splitter nodes also create a second data set (around 999,000 rows in this case), while Row Sampling and Row Filter don’t. I would say that is the reason behind the difference in execution time.
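To illustrate the idea, here is a rough Python sketch. It is not KNIME's actual implementation, and `take_top`, `split_top` and `counting` are hypothetical names made up for this example. It only shows that producing a second output forces a full pass over the input, while taking just the top rows can stop early.

```python
from itertools import islice

def take_top(rows, k):
    """Return only the first k rows; stops reading once k rows are taken."""
    return list(islice(rows, k))

def split_top(rows, k):
    """Return (first k rows, remaining rows); has to read every row."""
    top, rest = [], []
    for i, row in enumerate(rows):
        (top if i < k else rest).append(row)
    return top, rest

def counting(n, counter):
    """Yield 0..n-1 and record how many rows were actually read."""
    for i in range(n):
        counter["read"] += 1
        yield i

c1 = {"read": 0}
top_only = take_top(counting(1_000_000, c1), 1000)

c2 = {"read": 0}
top, rest = split_top(counting(1_000_000, c2), 1000)

# Number of rows each approach had to read from the input
print(c1["read"], c2["read"])
```

The first counter stops at 1000 rows, while the second reaches the full 1,000,000 because the remaining rows have to be written somewhere too.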
Check out the Top k Selector node. The node description states that its implementation is more efficient than the “Sorter + Row Filter” combination.
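I don't know how the node is implemented internally, but the general reason a dedicated top-k selection can beat sorting plus filtering can be sketched in Python. The function names below are hypothetical, and the heap-based approach is just one common technique, assumed here for illustration.

```python
import heapq
import random

def top_k_sort_then_filter(values, k):
    """Sort the whole input, then keep the first k rows (O(n log n))."""
    return sorted(values, reverse=True)[:k]

def top_k_heap(values, k):
    """Scan once, keeping only k candidates in a heap (O(n log k))."""
    return heapq.nlargest(k, values)

random.seed(0)
data = [random.random() for _ in range(100_000)]

a = top_k_sort_then_filter(data, 1000)
b = top_k_heap(data, 1000)
print(a == b)
```

Both functions return the same rows in the same order; the second one never has to fully sort the rows that do not make it into the top k. This mainly helps when the sort is only needed to get the top rows, not when the sorted table is needed anyway.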