Any plans to speed up the Column Expression node?

Hi! I measured the speed of different nodes and found that the wonderful “Column Expression” node is about one order of magnitude slower than the other nodes. That is a pity, as “Column Expression” is a very good alternative for users with medium experience who need to do something a bit more complicated, but not so complicated that it requires Java expertise. Is improving the speed of “Column Expression” in the development plans? Or is it a heavy challenge to improve it?

These are my tests for the previous message, using KNIME 4.1.2 and applied to a 30,000-row table:

Configuration for Tests

Test A) New constant column

  • Column Expression Formula: =1
  • Java Snippet formula: out_test = 1.0;
  • Java Snippet (Simple) formula: return 1.0;
  • Specific Knime Node used: Node "Constant Value Column"

Test B) New random column

  • Column Expression Formula: =rand
  • Java Snippet formula: out_test = Math.random();
  • Java Snippet (Simple) formula: return Math.random();
  • Specific Knime Node used: Node "Random Number Assigner"
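
For readers less familiar with the Java Snippet node, the two Java formulas above amount to roughly the following per-row work. This is only a minimal plain-Java sketch, not KNIME API code: the class `ColumnFillSketch` and its method names are hypothetical, and a plain array stands in for the new column.

```java
// Hypothetical plain-Java stand-in for Tests A and B; not KNIME API code.
class ColumnFillSketch {

    // Test A: every row gets the same constant, like "out_test = 1.0;".
    static double[] constantColumn(int rowCount) {
        double[] column = new double[rowCount];
        for (int row = 0; row < rowCount; row++) {
            column[row] = 1.0;
        }
        return column;
    }

    // Test B: every row gets a fresh value in [0, 1), like "out_test = Math.random();".
    static double[] randomColumn(int rowCount) {
        double[] column = new double[rowCount];
        for (int row = 0; row < rowCount; row++) {
            column[row] = Math.random();
        }
        return column;
    }

    public static void main(String[] args) {
        int rowCount = 30_000; // table size used in the tests above
        double[] a = constantColumn(rowCount);
        double[] b = randomColumn(rowCount);
        System.out.println("Test A, first value: " + a[0]);
        System.out.println("Test B, first value: " + b[0]);
    }
}
```

Both tests do very little work per row, so any timing difference between nodes mostly reflects per-row overhead of the node itself rather than the formula.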

These are the results:

[Image: Test A results]

[Image: Test B results]

Wow, thank you very much for the detailed explanation. We’re actually working on further improving performance at this very moment. Any chance you can quickly upload those workflows to KNIME Hub? I’ll make sure we use them in our benchmarks.

Regarding the ‘Column Expression’ node: we’ll take a look!

Interesting!
Actually, I programmed the Random Number Assigner myself, and it does nothing more than call a rand and check some bounds…
Interesting that it would be slower than the others for Random.

Especially as the numbers are quite small, I would run it for multiple iterations. @Vernalis has super cool nodes for this!

They can also measure how much memory you use and they even have a node for doing garbage collection.

We used them in our blog post: https://www.knime.com/blog/tuning-the-performance-and-scalability-of-knime-workflows

Okay, I needed to test this, made me too curious.

I looped this for 100 iterations and increased the table to 100K rows, which gave me the following:

That the Java nodes are slower than the Random nodes makes sense.

In case someone wants to try, I uploaded the workflow to my KNIME Hub.
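
The idea of repeating the measurement (100 iterations over 100K rows) can also be sketched outside KNIME. The following is a rough plain-Java illustration, not the uploaded workflow and not KNIME API code: the class `RepeatedTimingSketch` and its helpers are hypothetical names, and it only times filling an array, so its numbers are not comparable to node execution times.

```java
import java.util.Arrays;
import java.util.function.IntToDoubleFunction;

// Hypothetical timing harness; illustrates repeated measurement only.
class RepeatedTimingSketch {

    // Fills a column of rowCount values using the given per-row formula.
    static double[] fillColumn(int rowCount, IntToDoubleFunction formula) {
        double[] column = new double[rowCount];
        for (int row = 0; row < rowCount; row++) {
            column[row] = formula.applyAsDouble(row);
        }
        return column;
    }

    // Runs the fill `iterations` times; returns per-iteration nanoseconds, sorted ascending.
    static long[] timeIterations(int iterations, int rowCount, IntToDoubleFunction formula) {
        long[] nanos = new long[iterations];
        double sink = 0; // consume results so the work is not optimized away
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            double[] column = fillColumn(rowCount, formula);
            nanos[i] = System.nanoTime() - start;
            sink += column[rowCount - 1];
        }
        if (Double.isNaN(sink)) {
            System.out.println("unexpected NaN in results");
        }
        Arrays.sort(nanos);
        return nanos;
    }

    // Median of an already sorted array; less sensitive to outliers than the mean.
    static long median(long[] sortedNanos) {
        return sortedNanos[sortedNanos.length / 2];
    }

    public static void main(String[] args) {
        int iterations = 100;   // as in the looped test above
        int rowCount = 100_000; // 100K rows
        long[] constant = timeIterations(iterations, rowCount, row -> 1.0);
        long[] random = timeIterations(iterations, rowCount, row -> Math.random());
        System.out.println("constant, median ms: " + median(constant) / 1_000_000.0);
        System.out.println("random,   median ms: " + median(random) / 1_000_000.0);
    }
}
```

Reporting the median over many iterations reduces the influence of warm-up and garbage-collection pauses, which matter a lot when a single run is very short.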

Thank you @christian.dietz and @Iris. I am looking forward to seeing whether “Column Expression” can be sped up in a future KNIME version (fingers crossed!). Regarding @christian.dietz’s request to upload the KNIME tests: I took Iris’s better design for test B (random generator) and also built test A (constant generator) in the same way. Here are the links to them on the KNIME Hub:


I will take a look at your blog post https://www.knime.com/blog/tuning-the-performance-and-scalability-of-knime-workflows. I may come back with some questions about it. I appreciate that you have spent your time teaching us how to improve our KNIME models!

Hi @andres_sommerh

We just released KNIME 4.2.1, which includes a major speedup for the Column Expressions node.

It is now around a factor of 4 faster!
