Just learning KNIME - really like it. Making good progress but have a question.
I would like to create new variables based on the rules that define the end nodes created by a decision tree. I used the sample workflow "RuleSetFromPMMLv3" to identify the end nodes, wrote the model out with the PMML Writer and can read it back in with the PMML Reader, but I don't know how to execute the PMML so that it creates the new variables automatically. Is that possible?
Any suggestions greatly appreciated.
My first post wasn't very clear. Basically, I want to assign to each record in the underlying table, in a separate column, the appropriate end node.
Found the node "Decision Tree to Ruleset" that creates the decision tree ruleset for the end nodes - that was easy (see attachment). So the question now has become: how can I apply those decision rules to actually score a file? Do I need to write a bunch of case statements that apply those rules, or can one somehow magically apply the underlying PMML?
I would just use the Decision Tree Predictor node on the original PMML (though I might misunderstand your original problem). In case you have the rules in a table (like in the png you attached, showing the table output of the Decision Tree to Ruleset node), you can feed that node's PMML output to the Ruleset Predictor, or use the Rule Engine (Dictionary) if you prefer the table input (because you probably want to adjust the rules).
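To illustrate what applying a rule table to records amounts to, here is a minimal Python sketch. It is not KNIME code and not how the nodes are implemented. The column names (`age`, `income`), the thresholds, and the node labels are all made up, and the "first matching rule wins" behaviour is an assumption of this sketch.

```python
# Hypothetical rules standing in for the leaves of a decision tree.
# Each entry is (condition, end-node label); columns and labels are invented.
RULES = [
    (lambda r: r["age"] <= 30 and r["income"] > 50000, "node_3"),
    (lambda r: r["age"] <= 30, "node_4"),
    (lambda r: r["age"] > 30, "node_5"),
]


def assign_end_node(record, rules=RULES, default=None):
    """Return the label of the first rule whose condition holds."""
    for condition, label in rules:
        if condition(record):
            return label
    return default


def score(records, rules=RULES):
    """Return copies of the records with an added 'end_node' column."""
    return [{**r, "end_node": assign_end_node(r, rules)} for r in records]


records = [
    {"age": 25, "income": 60000},
    {"age": 25, "income": 20000},
    {"age": 45, "income": 80000},
]
scored = score(records)
```

Keeping the rules in a list (rather than hard-coded if/else branches) mirrors the table-input idea: the rules can be edited without touching the scoring logic.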
Hi gabor - thank you so much for responding. I am getting closer - maybe solved the issue, but below is what I want to accomplish - maybe there is an easier way:
I want to create new variables based on the decision tree rules that I get from the "Decision Tree to Ruleset" node. That allows me to capture underlying interactions in the data. And, of course, I want to do this as efficiently as possible.
So I copied the rules from the "Rules Table" node into a spreadsheet - developed a penetration index based on the target categories and then assigned a number based on that ranking. So "1" describes the records in the node with the highest penetration, 2 the next highest, etc - one ends up with an ordinal ranking of records based on the decision tree rules.
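The spreadsheet step could also be scripted. Below is a rough Python sketch, assuming the penetration index is defined as 100 times the node's target rate divided by the overall target rate (the post does not spell out the formula, so that definition is a guess). The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def rank_nodes_by_penetration(rows):
    """rows: iterable of (end_node, is_target) pairs.

    Returns (ranks, index), where index[node] is the assumed penetration
    index and ranks[node] is 1 for the highest index, 2 for the next, etc.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for node, is_target in rows:
        totals[node] += 1
        hits[node] += int(is_target)

    overall_rate = sum(hits.values()) / sum(totals.values())
    if overall_rate == 0:
        raise ValueError("no target records; the index is undefined")

    # Assumed definition: node target rate relative to the overall rate.
    index = {n: 100 * (hits[n] / totals[n]) / overall_rate for n in totals}

    # Highest index first; ties are broken by node name for stability.
    ordered = sorted(index, key=lambda n: (-index[n], n))
    ranks = {n: i for i, n in enumerate(ordered, start=1)}
    return ranks, index


# Made-up example: 4 records per end node, 1 marks a target record.
rows = (
    [("node_3", t) for t in (1, 1, 1, 0)]
    + [("node_4", t) for t in (1, 0, 0, 0)]
    + [("node_5", t) for t in (1, 1, 0, 0)]
)
ranks, index = rank_nodes_by_penetration(rows)
```

The resulting `ranks` mapping plays the role of the number assigned in the spreadsheet: each end node gets its ordinal position by penetration.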
Then I copied that syntax into the "Ruleset Editor", executed it, and now I have a column that represents the underlying interactions in an ordinal fashion. I might still create binary variables from this column.
So this works pretty well - but any ideas about doing this better are greatly appreciated.
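For the binary variables, one option is one indicator column per rank value. A small Python sketch of that idea follows; the function name and the `rank_` prefix are made up, and this is only an illustration, not a KNIME node.

```python
def to_binary_columns(ranks, prefix="rank_"):
    """Turn a list of ordinal ranks into one 0/1 indicator dict per record."""
    levels = sorted(set(ranks))
    return [
        {f"{prefix}{level}": int(rank == level) for level in levels}
        for rank in ranks
    ]


# Hypothetical ordinal column produced by the ruleset.
node_rank = [1, 3, 2, 1]
binary = to_binary_columns(node_rank)
```

Each record ends up with exactly one indicator set to 1, so for modelling one of the columns is usually dropped as the reference level.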