Possibility to combine multiple Rule engine nodes (like Math multi-column function)

Sivasanmugam · May 29, 2020, 2:04pm

Hi All,

I have to join two datasets on a few parameters (almost like a cross join), compare some additional parameters between the two datasets after the join, and allocate a “score” for each comparison in a Rule Engine node. Then I add up the individual scores to arrive at my final score. Finally, I group by “key” to get the max. score per key and join again to keep only the best possible match per key.
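To make the steps concrete, here is a minimal sketch of that logic in plain Python rather than KNIME nodes. The sample rows, the column names (`key`, `id`, `city`, `type`) and the weights 2 and 1 are all made up for illustration. Note that it keeps a running best match per key while scoring, which is a swapped-in shortcut and not the group-by plus second join described above.

```python
from itertools import product

# Hypothetical sample data; column names are illustrative only.
left = [
    {"key": "A", "city": "Berlin", "type": "retail"},
    {"key": "B", "city": "Paris", "type": "wholesale"},
]
right = [
    {"id": 1, "city": "Berlin", "type": "retail"},
    {"id": 2, "city": "Berlin", "type": "wholesale"},
    {"id": 3, "city": "Paris", "type": "wholesale"},
]

def score(l, r):
    """Sum the per-column scores for one joined row (weights are assumed)."""
    total = 0
    total += 2 if l["city"] == r["city"] else 0
    total += 1 if l["type"] == r["type"] else 0
    return total

def best_matches(left, right):
    """Keep only the highest-scoring right-hand row for each left-hand key."""
    best = {}
    for l, r in product(left, right):  # cross join of both datasets
        s = score(l, r)
        # Replace the stored match only when the new score is strictly higher,
        # so the first row wins on ties.
        if l["key"] not in best or s > best[l["key"]][1]:
            best[l["key"]] = (r["id"], s)
    return best

result = best_matches(left, right)
```

Here `result` maps each key to a tuple of the matched `id` and its total score.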

The problem for me is that with the join I am creating over 10 million rows and the workflow becomes much slower when running through all the individual Rule engine nodes. So is there a better and faster way of doing it? Thanks.

Best regards,
Siva

armingrudd · May 29, 2020, 2:17pm

Hi @Sivasanmugam and welcome to the KNIME forum,

May I ask what rules you are applying and what column types you are using please?
Are you aware of the if() function in the Math Formula node? It lets you define rules on numeric columns and calculate the final score in a single node.
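The idea, sketched in Python rather than in the node's own expression syntax: each rule becomes one conditional term, and the terms are summed in a single expression. The helper `if_`, the column names and the weights 5 and 3 are assumptions for illustration only.

```python
def if_(condition, when_true, when_false):
    """Stand-in for a three-argument if(): pick one of two values."""
    return when_true if condition else when_false

def total_score(amount_a, amount_b, year_a, year_b):
    """Add up all per-column scores in one expression (weights are assumed)."""
    return (
        if_(amount_a == amount_b, 5, 0)
        + if_(abs(year_a - year_b) <= 1, 3, 0)
    )
```

Calling `total_score` once per row would then stand in for several separate rule nodes followed by a sum.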

Sivasanmugam · May 29, 2020, 3:05pm

Thanks for your quick response, @armingrudd. The columns I compare are mostly of string type. The rules are mostly “equal or not” comparisons (with a score of ‘x’ or 0), but I also have some cases where the compared columns have different values and still get a non-zero score.
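As a rough illustration of these two kinds of rules, here is a Python sketch. The table `PARTIAL_SCORES`, its entries and the score values are hypothetical and not taken from my actual data.

```python
# Hypothetical table of value pairs that differ but still earn a score.
PARTIAL_SCORES = {
    ("Ltd", "Limited"): 2,
    ("Inc", "Incorporated"): 2,
}

def column_score(a, b, full_score):
    """Return full_score for equal strings, a partial score for known pairs, else 0."""
    if a == b:
        return full_score
    # Look the pair up in both orders so the table needs only one entry per pair.
    return PARTIAL_SCORES.get((a, b), PARTIAL_SCORES.get((b, a), 0))
```

For example, `column_score("Limited", "Ltd", 3)` falls through the equality check and picks up the partial score from the table.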

PS: I was not aware of the if() function in the Math Formula node. Is its syntax documented somewhere? Thanks.

armingrudd · May 29, 2020, 3:23pm

So maybe you want to give the Column Expressions node a try to do all the scoring steps in a single node. You can use if-else statements and define temporary variables. Read more about this node here.

The Math Formula node works only with numeric columns.

Sivasanmugam · May 29, 2020, 4:18pm

Hi @armingrudd,

Thanks for your suggestion on Column Expressions. Attaching an example workflow of the problem I am trying to solve. Is it something I can solve by using Column Expressions? Thanks.

Multiple rule engine.knwf (86.7 KB)

armingrudd · May 29, 2020, 4:38pm

Here you are:

Multiple rule engine V2.knwf (92.7 KB)

Sivasanmugam · May 29, 2020, 5:11pm

Thanks a lot, @armingrudd.

This looks much more elegant than my multiple rule engines. However, I still have an issue with workflow performance.

With my multiple Rule Engine nodes and the Math Formula node, the total execution time for those nodes was 46 ms, but the Column Expressions node takes 109 ms. Is there a way to improve the execution speed of the Column Expressions node?

armingrudd · May 29, 2020, 5:29pm

Yes, the Column Expressions node is slow.
I always try to stay with pure KNIME nodes, but in your case I think the best option is the Java Snippet (Simple) node, which is fast and replaces all those Rule Engine nodes and the Math Formula node:

Multiple rule engine V3.knwf (74.9 KB)

Sivasanmugam · May 29, 2020, 6:54pm

Great, thanks a lot, @armingrudd, for your incredible help and for introducing me to the Column Expressions and Java Snippet nodes.
