Create new variables based on the end nodes of a Decision Tree Learner: How to get the node ID into the data set

shimshock · September 2, 2017, 1:26am

I am fairly new to KNIME and I am trying to do something that a colleague has done in SPSS Modeler. I have a Decision Tree Learner and I can explore the tree just fine. One thing that Modeler does is provide a node number for each node of the tree (branch or leaf; I am not sure of the terminology). When I look at the Decision Tree Model (under the PMML tab) I can see that each node of the decision tree has an ID and the "rule" for that specific node. Is there an easy way to create a column that provides the end node ID for each row in the data set? For example, if 25 rows ended in Node ID 60, then 60 would appear in the "End Node" column for those 25 rows.

Thanks, any help is appreciated.

Stephen
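
To make the request concrete, here is a minimal sketch in plain Python (not KNIME) of what an "End Node" column means: each row is walked from the root of the tree to a leaf, and the leaf's ID is written back onto the row. The tree structure, node IDs, field names, and thresholds are all hypothetical and chosen only for illustration; they do not come from the model discussed in this thread.

```python
from collections import Counter

# Hypothetical toy tree. Internal nodes test one numeric field against a
# threshold; leaves carry only an id. All ids and field names are made up.
TREE = {
    "id": 0, "field": "age", "threshold": 30,
    "left": {"id": 1},                      # age <= 30
    "right": {                              # age > 30
        "id": 2, "field": "income", "threshold": 50000,
        "left": {"id": 3},                  # income <= 50000
        "right": {"id": 4},                 # income > 50000
    },
}


def end_node_id(node, row):
    """Walk from the given node down to a leaf and return the leaf's id."""
    while "field" in node:  # only internal nodes have a test
        branch = "left" if row[node["field"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["id"]


rows = [
    {"age": 25, "income": 40000},
    {"age": 45, "income": 30000},
    {"age": 52, "income": 80000},
    {"age": 19, "income": 90000},
]

# Add the "End Node" column at the individual row level.
for row in rows:
    row["End Node"] = end_node_id(TREE, row)

# Aggregating the new column gives the per-node record counts.
counts = Counter(row["End Node"] for row in rows)
```

The per-node counts fall out of the row-level column by aggregation, but not the other way round, which is why the row-level assignment has to be computed against the data.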

RolandBurger · September 5, 2017, 11:25am

Hi Stephen,

There is no straightforward way to do this in KNIME due to how the information is stored in the PMML model. The PMML does not contain any actual data, just the rules to build the model. Therefore, it is not possible to infer from the PMML alone which concrete data points belong to which node.

One thing you could do is to use the Decision Tree to Ruleset node and then use the results to assign your data points to one of the rules.

Cheers,

Roland

shimshock · September 5, 2017, 6:45pm

Roland, this is helpful and that is the direction I was going. However, I can't quite seem to do the last thing you mentioned. In the Decision Tree to Ruleset node I end up with a table that shows the 20 rules that were created and the record count belonging to each rule, along with probability scores. This is exactly the data I need, but I need it at the individual row level, not at the aggregated rule level. For example, let's say 100 out of 500 records fell into Rule 1. I now need to create a column titled "Rule" in which 1 (or something like "Rule 1") appears for those 100 records.

Any thoughts on how to do that? Thank you so much.

shimshock · September 5, 2017, 10:57pm

Hi Roland, thanks again for your reply. It did stir my thinking and I do have a workable (though not ideal) solution. I did what you said and used the Decision Tree to Ruleset node, then I took each rule and put it into the Ruleset Editor node. I say it's not ideal because I had to copy and paste the rules (the Condition column) from the ruleset into an Excel spreadsheet in order to paste them into the Ruleset Editor, so it's not completely automated (though I am sure with some digging I can find a way to do it). Thanks again for your reply; it pointed me in the right direction.

Stephen

RolandBurger · September 6, 2017, 11:13am

Hi Stephen,

There is no need to do this manually: you can use the output of the Decision Tree to Ruleset node as the dictionary in the Rule Engine (Dictionary) node. For that to work, you should select "Split rules to condition and outcome columns" in the Decision Tree to Ruleset node.

Cheers,

Roland
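
The idea behind a dictionary-driven rule engine can be sketched in plain Python as follows. This is an illustration under stated assumptions, not KNIME's implementation: the rule table, the field names, the tuple-based condition format (a stand-in for KNIME's own rule syntax), and the helper `assign_rule` are all hypothetical. It assumes rules are tried in table order and the first matching rule supplies the outcome.

```python
import operator

# Comparison operators supported by this simplified condition format.
OPS = {"<=": operator.le, ">": operator.gt}

# Hypothetical dictionary table: one entry per rule, with the condition and
# the outcome kept in separate "columns". A condition is a list of
# (field, operator, value) tests that must all hold.
RULES = [
    {"condition": [("age", "<=", 30)], "outcome": "Rule 1"},
    {"condition": [("age", ">", 30), ("income", "<=", 50000)], "outcome": "Rule 2"},
    {"condition": [("age", ">", 30), ("income", ">", 50000)], "outcome": "Rule 3"},
]


def assign_rule(row, rules, default=None):
    """Return the outcome of the first rule whose tests all match the row."""
    for rule in rules:
        if all(OPS[op](row[field], value) for field, op, value in rule["condition"]):
            return rule["outcome"]
    return default  # no rule matched


records = [
    {"age": 25, "income": 40000},
    {"age": 45, "income": 30000},
    {"age": 52, "income": 80000},
]

# Each record gets a new "Rule" column holding the matching rule's outcome.
labelled = [dict(record, Rule=assign_rule(record, RULES)) for record in records]
```

Because the rules live in a table rather than in hand-typed node settings, retraining the tree only changes the table that is fed in; nothing has to be copied and pasted.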

shimshock · September 6, 2017, 5:24pm

That works! Thanks so much for taking the time to help. I appreciate it.

Stephen