Performance of Math Formula (Multi Column) node is poor for wide tables

Dear Knimers,

As a follow-up to an earlier benchmarking post of mine, I would like to draw attention to the relatively poor performance of the Math Formula (Multi Column) node on wide tables (KNIME 4.0). In the table below I benchmarked multiplying every element of a table of Doubles by 2, using a Python Script node, an R Snippet node, and the Math Formula (Multi Column) node. The number of elements is kept constant (1,048,576) while the table geometry is varied:

| Rows in input table | Columns in input table | Python Script (1 -> 1) run time (s) | R Snippet run time (s) | Math Formula (Multi Column) run time (s) |
|---|---|---|---|---|
| 1048576 | 1 | 7.6 | 5.2 | 1.3 |
| 524288 | 2 | 5.9 | 3.8 | 0.8 |
| 262144 | 4 | 5.0 | 3.1 | 0.8 |
| 131072 | 8 | 4.8 | 2.5 | 0.6 |
| 65536 | 16 | 4.7 | 2.4 | 0.8 |
| 32768 | 32 | 4.7 | 2.4 | 1.1 |
| 16384 | 64 | 4.5 | 2.4 | 2.0 |
| 8192 | 128 | 4.8 | 2.4 | 3.8 |
| 4096 | 256 | 5.2 | 2.4 | 6.9 |
| 2048 | 512 | 5.9 | 2.2 | 13.2 |
| 1024 | 1024 | 7.3 | 2.6 | 25.6 |
| 512 | 2048 | 10.6 | 2.7 | 53.1 |
| 256 | 4096 | 20.4 | 3.9 | 110.1 |

My conclusion is that for mathematical operations on a wide numerical table, one is better off with an R Snippet than with a Math Formula (Multi Column) node: at 256 rows by 4096 columns the R Snippet took 3.9 s versus 110.1 s for the Math Formula node. I could imagine, though, that the situation is different when the Math Formula node is part of a series of streaming nodes.
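
For anyone who wants to try the same kind of sweep outside KNIME, here is a minimal sketch in plain Python. It is not the workflow used for the numbers above: it uses nested lists rather than KNIME tables or pandas, the function names (`geometries`, `make_table`, `double_table`, `benchmark`) are made up for illustration, and its timings are not comparable to the node run times in the table. It only shows the idea of keeping the element count fixed while halving the rows and doubling the columns.

```python
import random
import time

TOTAL_ELEMENTS = 2 ** 20  # 1048576 elements, as in the first row of the table


def geometries(total=TOTAL_ELEMENTS, max_cols=4096):
    """Yield (rows, cols) pairs with rows * cols == total, doubling cols each step."""
    cols = 1
    while cols <= max_cols:
        yield total // cols, cols
        cols *= 2


def make_table(rows, cols):
    """Build a rows x cols table of random doubles as a list of row lists."""
    return [[random.random() for _ in range(cols)] for _ in range(rows)]


def double_table(table):
    """Multiply every element by 2 and return the result as a new table."""
    return [[2.0 * x for x in row] for row in table]


def benchmark(total=TOTAL_ELEMENTS, max_cols=4096):
    """Time double_table for each geometry; return a list of (rows, cols, seconds)."""
    results = []
    for rows, cols in geometries(total, max_cols):
        table = make_table(rows, cols)
        start = time.perf_counter()
        double_table(table)
        results.append((rows, cols, time.perf_counter() - start))
    return results
```

Calling `benchmark()` returns one `(rows, cols, seconds)` tuple per geometry, from 1048576 x 1 down to 256 x 4096; a smaller `total` makes for a quick trial run.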

Best,
Aswin

Hi @Aswin -

Thanks very much for the detailed benchmarking and feedback. Our devs are looking into the cause.

@Aswin,

awesome post :+1: - thank you very much!!!

We already looked into the code, found the reason why it is so slow, and are currently fixing it.

If you have discovered other weird performance issues, it’d be awesome if you could share them with us!

Best
Mark

Hi @Mark_Ortmann,

that’s awesome, thank you! :smiley:

> If you have discovered other weird performance issues, it’d be awesome if you could share them with us!

Well, there’s still this one… :sweat_smile:

https://forum.knime.com/t/script-chaining-slow/13262

Best,
Aswin

@Mark_Ortmann

Can you give some details on what you found? I have been experiencing really slow performance on KNIME 4.0.x when using wide tables (200+ columns) with joins/grouping. If there is a known problem and a fix is coming, it would be nice to know. If the fix doesn’t appear relevant, I might need to investigate further.

@DiaAzul,

the fix is solely related to the Math Formula (Multi Column) node.

Regarding your problem: did joins/grouping use to be faster with 3.7.x?

If so, does adding
-Dknime.async.io.cachesize=10
to your knime.ini improve your performance? The value could also be set to 25.
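
As a rough sketch of where such a line would go, assuming the usual knime.ini layout in which JVM options follow the `-vmargs` line (any other lines already in the file stay as they are):

```
-vmargs
-Dknime.async.io.cachesize=10
```

KNIME needs to be restarted for a change to knime.ini to take effect.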

@Mark_Ortmann, thank you for the response. I’ll do some testing and post back results in a different thread.
