As a follow-up to an earlier benchmarking post of mine, I would like to draw attention to the relatively poor performance of the Math Formula (Multi Column) node on wide tables (KNIME 4.0). For the table below I benchmarked multiplying every element of a table of Doubles by 2, using Python, R, and the Math Formula (Multi Column) node. The total number of elements is kept constant at 1048576, while the table geometry is varied:
| Rows in input table | Columns in input table | Python Script (1 -> 1) run time (s) | R Snippet run time (s) | Math Formula (Multi Column) run time (s) |
|---|---|---|---|---|
| 1048576 | 1 | 7.6 | 5.2 | 1.3 |
| 524288 | 2 | 5.9 | 3.8 | 0.8 |
| 262144 | 4 | 5.0 | 3.1 | 0.8 |
| 131072 | 8 | 4.8 | 2.5 | 0.6 |
| 65536 | 16 | 4.7 | 2.4 | 0.8 |
| 32768 | 32 | 4.7 | 2.4 | 1.1 |
| 16384 | 64 | 4.5 | 2.4 | 2.0 |
| 8192 | 128 | 4.8 | 2.4 | 3.8 |
| 4096 | 256 | 5.2 | 2.4 | 6.9 |
| 2048 | 512 | 5.9 | 2.2 | 13.2 |
| 1024 | 1024 | 7.3 | 2.6 | 25.6 |
| 512 | 2048 | 10.6 | 2.7 | 53.1 |
| 256 | 4096 | 20.4 | 3.9 | 110.1 |
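For anyone who wants to try a similar sweep outside KNIME, here is a minimal sketch of the table geometries used above, assuming plain Python lists rather than KNIME tables. The function names `geometries` and `time_doubling` are made up for this illustration, and this is not the workflow that produced the numbers in the table: it measures only the multiplication itself, not the data transfer between KNIME and the scripting node.

```python
import time


def geometries(total_elements=2**20, max_columns=4096):
    """Yield (rows, columns) pairs with rows * columns == total_elements.

    Columns double from 1 up to max_columns, as in the benchmark table.
    """
    columns = 1
    while columns <= max_columns:
        yield total_elements // columns, columns
        columns *= 2


def time_doubling(rows, columns):
    """Build a rows x columns table of floats and time multiplying each cell by 2."""
    table = [[1.0] * columns for _ in range(rows)]
    start = time.perf_counter()
    doubled = [[value * 2 for value in row] for row in table]
    elapsed = time.perf_counter() - start
    return doubled, elapsed


if __name__ == "__main__":
    # A smaller total than in the post (2**16 instead of 2**20) keeps this quick.
    for rows, columns in geometries(total_elements=2**16, max_columns=256):
        _, elapsed = time_doubling(rows, columns)
        print(f"{rows:>8} rows x {columns:>4} columns: {elapsed:.4f} s")
```

With the default arguments, `geometries()` yields the same 13 row/column combinations as the table, from 1048576 x 1 down to 256 x 4096.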
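My conclusion is that for mathematical operations on a wide numerical table one is better off with an R Snippet than with a Math Formula (Multi Column) node. In these runs the Math Formula node is the fastest option up to 64 columns, but from 128 columns onwards the R Snippet wins, and at 4096 columns it is roughly 28 times faster (3.9 s vs. 110.1 s). I could imagine that the situation is different when the Math Formula node is part of a series of streaming nodes.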
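To make the scaling of the Math Formula (Multi Column) node a bit more concrete, the following small sketch divides the measured run times by the column count, using only the five widest rows of the benchmark. The helper names are hypothetical, and the interpretation is a guess from these few data points, not a statement about how the node is implemented.

```python
# (columns, seconds) for the Math Formula (Multi Column) node,
# copied from the five widest geometries in the benchmark.
MATH_FORMULA_RUNS = [
    (256, 6.9),
    (512, 13.2),
    (1024, 25.6),
    (2048, 53.1),
    (4096, 110.1),
]


def seconds_per_column(measurements):
    """Return (columns, seconds / columns) for each measurement."""
    return [(columns, seconds / columns) for columns, seconds in measurements]


def doubling_ratios(measurements):
    """Return the run time ratio between each pair of consecutive measurements."""
    times = [seconds for _, seconds in measurements]
    return [later / earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for columns, per_column in seconds_per_column(MATH_FORMULA_RUNS):
        print(f"{columns:>5} columns: {per_column * 1000:.1f} ms per column")
    print("ratios when doubling the columns:",
          [round(ratio, 2) for ratio in doubling_ratios(MATH_FORMULA_RUNS)])
```

The per-column time stays at roughly 25 to 27 ms although the number of rows halves at every step, and the run time roughly doubles each time the column count doubles. That suggests the cost in this range depends mainly on the number of columns rather than on the number of cells.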
Can you give some details on what you found? I have been experiencing really slow performance on KNIME 4.0.x when using wide tables (200+ columns) with joins and grouping. If there is a known problem and you have a fix coming, it would be nice to know. If the fix doesn't appear relevant to my case, I might need to investigate further.