Feature request: integration of "Rule Engine" with "Math Formula" and "String Manipulation" nodes.

Dear all,

It would be nice to have a single node that integrates Rule Engine with Math Formula and String Manipulation.

In this way it would be easier to create functions such as:

if(substr($B1$, 0, 2)="ab", sqrt($A1$), abs($A1$))

OR

if(AND($A1$>1, $A1$<8),"YES","NO")

Thanks in advance.

Then you end up with the Java Snippet node...

I know, but it would be nice if, in the future, it were possible to do all data manipulation (Math, String and Rule) in a single node without needing to know Java.

Anyway, thanks!

Do you want a Prolog node? ;-) (I was tempted to create one, or to provide an alternative syntax for rules similar to Prolog.)

Hi aborg,

It would be enough to have a single node that integrates the "Rule Engine", "Math Formula" and "String Manipulation" features, in which we can use the same language that is currently used in these nodes, without needing Java or a lot of nodes to write formulas such as:

if(substr($B1$, 0, 2)="ab", sqrt($A1$), abs($A1$))

OR

if(AND($A1$>1, $A1$<8),"YES","NO")

Thanks!

Well, I think the latter is already possible with the Rule Engine nodes. (And it would probably not be too hard to support a fixed set of additional operators, but one of the design goals for the Rule Engine nodes was to keep them simple and provide a nice user experience. They already allow users to write more complex expressions than is possible with PMML RuleSets, except for a few hard-to-express conditions with weights.)

If you create two additional columns with the abs and sqrt values, you can also do the first. (I know it is not efficient and quite a laborious workaround, but at least it is easier to follow what happened, as you can inspect the intermediate steps.)
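To make the workaround concrete, here is a minimal Python sketch of the same two-step idea. It is not KNIME code: the table is a plain list of dicts, the helper names `add_intermediate_columns` and `apply_rule` are hypothetical, and `substr($B1$, 0, 2)` is assumed to mean "the first two characters of B1".

```python
import math


def add_intermediate_columns(rows):
    """Step 1 (stand-in for two Math Formula nodes): add sqrt and abs columns."""
    out = []
    for row in rows:
        new_row = dict(row)
        # sqrt is undefined for negative input; None plays the role of a missing value
        new_row["sqrt_A1"] = math.sqrt(row["A1"]) if row["A1"] >= 0 else None
        new_row["abs_A1"] = abs(row["A1"])
        out.append(new_row)
    return out


def apply_rule(rows):
    """Step 2 (stand-in for a Rule Engine node): choose a column by the prefix of B1."""
    out = []
    for row in rows:
        new_row = dict(row)
        if row["B1"][:2] == "ab":
            new_row["result"] = row["sqrt_A1"]
        else:
            new_row["result"] = row["abs_A1"]
        out.append(new_row)
    return out


# Hypothetical sample data
table = [{"A1": 16.0, "B1": "abc"}, {"A1": -3.0, "B1": "xyz"}]
result = apply_rule(add_intermediate_columns(table))
```

The intermediate columns stay in the output, which is what makes each step inspectable, at the cost of computing both values for every row.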

Aborg, first of all, thanks for your answer! You're right regarding the feasibility of the last formula. Sorry, I meant to write:

if(AND(sqrt($A1$)>1, sqrt($A1$)<8),"YES","NO" )

I know that we can do it with various nodes/steps, but as you said, creating additional columns is not efficient or fast, especially with complex formulas, and other data analysis software usually allows the user to do all data manipulation tasks (on strings and numeric data) in the same environment.

Anyway, if one day these features are implemented in one node, I will be grateful; otherwise, never mind. :)

Thank you very much! 

So you want a kind of programming without Java. I can understand both parts. So, well... why not use the existing environment instead... One could let oneself be inspired by what the distance Nodes do, and add an extra port type that transports "rules" or "functions". Nodes using these could then be combined to create more complex computations. Most importantly, the rules would not be really evaluated until a final "application" Node, which could be something like a "Rule-Engine (Apply)", "Rule-based Row Splitter (Apply)", and so on, maybe even a "GroupBy (Apply)" or a "Column aggregator (Apply)".

Does that sound good or helpful in any way, to anyone?

Hi Marlin,

I'm not sure I have understood what you mean. Would you like a node that extracts rules from distance Nodes?

What I would like is this: imagine you have a single node called, for example, “DataFunctions” that allows you to use all the “String Manipulation” Node functions (e.g. substr, lowerCase, capitalize, etc.) and all the “Math Formula” Node functions (e.g. sqrt, median, log, etc.) in the same node.

I think it would be interesting if there was this node (“DataFunctions”).

It would be even better if you could also add rules (if you prefer “IF”) in the same way it is done in the “Rule Engine” node.

Iiiaaa,

no, I was just inspired by them. And what I have in mind is rather the opposite of what you're asking for. I don't want to give all the power to one Node. Instead, I am thinking about how one could turn all the formula, rule, and String manipulation subsystems into a Node system. The intermediate formulas would be passed through ports, so instead of typing formulas, you would connect Nodes.

So there might be Nodes like

  • Columns to Function Variables (Table in, Function port out)
  • Unary Function (Function port in, Function port out)
  • Binary Function (two Function ports in, one out)
  • IF Switch (Function Variable) (one Function port in, two out)
  • ...

and the <x> (Apply) Nodes I mentioned in the other post would be equivalent to their current relatives, but with an extra Function port as input. That would create bigger workflows than your DataFunctionBuffet, but would solve the same problems (having to iterate over tables several times, multiple Nodes that are almost the same) while not breaking the system of connecting Nodes. And if you use Metanodes, it even looks the same.
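For anyone who finds code easier to read than a description, here is a rough Python sketch of the principle: small building blocks that only describe a computation, plus one "Apply" step that evaluates it in a single pass over the table. Nothing here is an existing KNIME API. The `Function` class stands in for the proposed "Function port", every name is hypothetical, and the IF switch is simplified to one block with three inputs instead of one port in and two out.

```python
import math
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class Function:
    """What would travel along a "Function port": a deferred computation on a row."""
    evaluate: Callable[[Row], Any]


def column(name: str) -> Function:
    """'Columns to Function Variables': refer to a column without reading it yet."""
    return Function(lambda row: row[name])


def constant(value: Any) -> Function:
    """A fixed value, wrapped so that it can be combined like any other Function."""
    return Function(lambda row: value)


def unary(op: Callable[[Any], Any], arg: Function) -> Function:
    """'Unary Function': one Function in, one Function out."""
    return Function(lambda row: op(arg.evaluate(row)))


def binary(op: Callable[[Any, Any], Any], left: Function, right: Function) -> Function:
    """'Binary Function': two Functions in, one Function out."""
    return Function(lambda row: op(left.evaluate(row), right.evaluate(row)))


def if_switch(cond: Function, then: Function, otherwise: Function) -> Function:
    """Simplified 'IF Switch': only the selected branch is evaluated for a row."""
    return Function(
        lambda row: then.evaluate(row) if cond.evaluate(row) else otherwise.evaluate(row)
    )


def apply_function(func: Function, rows: List[Row], new_column: str) -> List[Row]:
    """'<x> (Apply)': the only place where the table is actually iterated."""
    return [{**row, new_column: func.evaluate(row)} for row in rows]


# if(AND(sqrt($A1$)>1, sqrt($A1$)<8), "YES", "NO") built by connecting blocks
root = unary(math.sqrt, column("A1"))
in_range = binary(
    operator.and_,
    binary(operator.gt, root, constant(1)),
    binary(operator.lt, root, constant(8)),
)
label = if_switch(in_range, constant("YES"), constant("NO"))

table = [{"A1": 4.0}, {"A1": 100.0}, {"A1": 0.25}]  # hypothetical sample data
labelled = apply_function(label, table, "label")
```

Building `label` touches no data at all; the table is read once, inside `apply_function`, which is the point of deferring evaluation to a final application Node.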

Was that any clearer?

 

Marlin, aren't these ideas like the PMML transformations? They look quite similar, and some of them seem to be already available in KNIME. (Though things can certainly be improved.)

Mh... interesting suggestion. I hadn't really looked into PMML yet.

I just glanced at it, but my first reaction is that PMML might be suitable, but a bit of overkill for the simple things we are talking about here, unless you hide a lot of the complexity. Also, the existing Nodes that work on PMML models are quite specialised. Of course you could always work on it as XML, but... yeah. Also, the basic idea was to try to use Nodes instead of a second language, so using PMML (or XML) alone defeats the purpose, unless, again, it is hidden.

So let's update my idea: Same principle, same basic Node set, but let's use PMML ports instead of what I called "Function ports". Now we are basically talking about Nodes for fine-grained manipulations of PMML models. That way beginners are not thrown at the whole language directly, but experts can still use all its power.

Basically, you just put my idea on steroids. ;)

Woohoo, the KNIME team has done it again: there are new PMML Nodes in 2.11! That's great news for this idea! There's of course still something missing, but the goal is getting closer...