Finding the Top 100 (n) values

Dear all,

I have to search 1.9 million rows for the top 200 values. Does anyone know how to do this efficiently?
Right now I am using the "Sorter" node (sorting all 1.9 million rows in descending order) combined with the "Sampling" node (taking the top 200 rows).
I need to reduce the calculation time drastically.

Best regards,
Fabian


Hi @FabianK

Maybe a solution is to split your dataset into two branches: one with only the column(s) to sort on and one with the remaining columns. Sort the narrow branch, take its top rows, and then use a Reference Row Filter node (filtering on RowID) to select the matching rows from the full table.
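A minimal sketch of the narrow-branch idea, written as standalone Java rather than as KNIME nodes: only a (RowID, value) projection is sorted, and the full table is then filtered by the selected RowIDs, which is the role the Reference Row Filter plays. The `Row` and `Key` records, the `payload` column and the class/method names are hypothetical stand-ins for KNIME tables, not a KNIME API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class TopNByRowId {
    // Hypothetical full row: RowID, the column to sort on, and a "wide" payload column.
    record Row(String rowId, double value, String payload) {}

    // Narrow projection: only the RowID and the sort column.
    record Key(String rowId, double value) {}

    static List<Row> topNRows(List<Row> table, int n) {
        // Narrow branch: project, sort descending by value, keep the first n RowIDs.
        Set<String> topIds = table.stream()
                .map(r -> new Key(r.rowId(), r.value()))
                .sorted(Comparator.comparingDouble(Key::value).reversed())
                .limit(n)
                .map(Key::rowId)
                .collect(Collectors.toSet());
        // Full branch: keep only rows whose RowID is in the reference set,
        // preserving the original row order (like a reference-based row filter).
        return table.stream()
                .filter(r -> topIds.contains(r.rowId()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> table = List.of(
                new Row("Row0", 3.0, "a"),
                new Row("Row1", 9.0, "b"),
                new Row("Row2", 1.0, "c"),
                new Row("Row3", 7.0, "d"));
        List<Row> top = topNRows(table, 2);
        System.out.println(top.stream().map(Row::rowId).collect(Collectors.toList())); // [Row1, Row3]
    }
}
```

The point of the split is that the expensive sort only has to move two small fields per row instead of every column.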


Although “cheating”, I think this could best be done with some code to avoid sorting a huge table. Here’s an example using a Java Snippet node:

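// custom imports
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// system variables
static final int numValuesToKeep = 50;
// top-n values seen so far, kept sorted in descending order (persists across rows)
List<Integer> values = new ArrayList<>();

// expression start
if (c_value != null) {                        // skip missing values
  values.add(c_value);
  values.sort(Collections.reverseOrder());    // largest values first
  if (values.size() > numValuesToKeep) {
    values.remove(values.size() - 1);         // drop the smallest value
  }
}

out_values = values.toArray(new Integer[0]);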

It iterates row by row and keeps a sorted collection of the top-n (numValuesToKeep) values, which is appended to each row as a collection cell; the cell in the last row holds the top-n values of the whole table. (That Java code is admittedly quite blunt and could itself be optimized further, a great task for a programming interview 🙂)
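One common way to optimize the snippet above, which re-sorts its list on every row, is a bounded min-heap: keep at most n candidates, and only replace the smallest one when a larger value arrives. That costs O(N log n) time and O(n) memory instead of sorting all N rows. This is a standalone sketch using `java.util.PriorityQueue`; the class and method names are hypothetical, and it is not the code from the workflow mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopNHeap {
    // Returns the n largest non-null values, sorted in descending order.
    static List<Integer> topN(Iterable<Integer> values, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the root is the smallest of the current top-n candidates.
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (Integer v : values) {
            if (v == null) {
                continue; // skip missing values
            }
            if (heap.size() < n) {
                heap.add(v);
            } else if (v > heap.peek()) {
                heap.poll(); // evict the smallest candidate
                heap.add(v);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Integer> column = List.of(5, 1, 9, 3, 7, 9, 2);
        System.out.println(topN(column, 3)); // [9, 9, 7]
    }
}
```

Duplicates are kept (both 9s appear), matching what sorting and taking the first n rows would return.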

Find the sample workflow on my NodePit Space:


Here's a more extensive source on such algorithms.


There is a node for this: the Element Selector.

You can find it, along with many examples, on the KNIME Hub:

https://kni.me/n/PjtMg0VEXswXR5Uk



By the way, the node was so popular that we added it to the base build, and it was also fine-tuned along the way. You can now find it here:
