I have to search 1.9 million rows for the top 200 values. Does anyone know how to do this efficiently?
Right now I am using the “Sorter” node (sorts all 1.9 million rows in descending order) combined with the “Sampling” node (takes the top 200 rows).
I need to reduce the calculation time drastically.

Hi @FabianK

One possible solution is to split your dataset into two branches: one with the column(s) to sort on and one with the remaining columns. Sort only the narrow branch and take its top rows, then use a Reference Row Filter node to filter the other branch by Row ID against those top rows.
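To illustrate the idea outside of KNIME, here is a rough Java sketch. It assumes the table can be modelled as two maps keyed by Row ID; the names `SplitAndFilter`, `topRowIds` and `filterByRowId` are made up for this example and are not KNIME APIs. Only the narrow (Row ID, sort value) pairs get sorted, and the wide rows are filtered by ID afterwards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SplitAndFilter {
    /** Sorts only the narrow branch (rowId -> sort value) and returns the top-n row IDs. */
    static List<String> topRowIds(Map<String, Integer> sortColumn, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(sortColumn.entrySet());
        // Descending by value; the wide columns are never touched here.
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()));
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            ids.add(e.getKey());
        }
        return ids;
    }

    /** Keeps only the wide rows whose ID appears in the reference list (table order is kept). */
    static Map<String, String> filterByRowId(Map<String, String> wideRows, List<String> ids) {
        Set<String> wanted = new HashSet<>(ids);
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> row : wideRows.entrySet()) {
            if (wanted.contains(row.getKey())) {
                out.put(row.getKey(), row.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> sortColumn = new LinkedHashMap<>();
        Map<String, String> wideRows = new LinkedHashMap<>();
        int[] values = {10, 50, 30, 40};
        for (int i = 0; i < values.length; i++) {
            sortColumn.put("Row" + i, values[i]);
            wideRows.put("Row" + i, "other columns of row " + i);
        }
        List<String> ids = topRowIds(sortColumn, 2);
        System.out.println(ids);
        System.out.println(filterByRowId(wideRows, ids).keySet());
    }
}
```

Note that this still sorts all 1.9 million keys; it only avoids dragging the remaining columns through the sort.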

Although “cheating”, I think this could best be done with some code to avoid sorting a huge table. Here’s an example using a Java Snippet node:

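// custom imports
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// system variables
static final int numValuesToKeep = 50; // use 200 for the top 200
List<Integer> values = new ArrayList<>();

// expression start
// Assumption: c_value is the Integer input column mapped in the node dialog.
// Insert the current value at its position in the descending-sorted list.
int pos = Collections.binarySearch(values, c_value, Collections.reverseOrder());
values.add(pos < 0 ? -pos - 1 : pos, c_value);

// Drop the smallest value once the list exceeds n (avoids nesting subList views).
if (values.size() > numValuesToKeep) {
  values.remove(values.size() - 1);
}

out_values = values.toArray(new Integer[0]);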

It iterates row-wise and keeps a sorted collection of the top-n (numValuesToKeep) values, which is appended as a collection cell to each row. (The Java code is admittedly quite blunt and could itself be optimized further … a great task for a programming interview 🙂)
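For comparison, here is a minimal standalone sketch of the same single-pass idea using a bounded min-heap instead of a sorted list. This is a common top-n technique and not the code from the Java Snippet node; the class name `TopN` and the method `topN` are made up for illustration. Each row costs at most O(log n) for n kept values, so the full table never has to be sorted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopN {
    /** Returns the n largest values in descending order, using one pass over the input. */
    static List<Integer> topN(Iterable<Integer> rows, int n) {
        List<Integer> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap: the smallest of the values kept so far sits at the head.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);
        for (Integer v : rows) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (v > heap.peek()) {
                // The new value beats the current minimum, so it replaces it.
                heap.poll();
                heap.add(v);
            }
        }
        result.addAll(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(5, 1, 9, 3, 7, 9, 2);
        System.out.println(topN(rows, 3)); // prints [9, 9, 7]
    }
}
```

With n = 200 and 1.9 million rows, the heap never holds more than 200 values, and only the final 200 are sorted.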

Find the sample workflow on my NodePit Space:


Here’s a bigger algorithms source.


There is a node for it. It is called Element Selector.

You can find it here, with a lot of examples, on the Hub:


By the way, the node was so popular that we moved it into the base build. It has also been fine-tuned. You can now find it here: