I would like to design (and work with) my own decision tree model (not computed by any algorithms).
I assume that the Ruleset Editor Node is what I need here.
(Just to perform a little check I used the Decision Tree to Ruleset Node on a Decision Tree Learner, copied the rules table into the Ruleset Editor, connected it to a Ruleset Predictor Node and got the exact same results from the Scorer Node as I would when using the Decision Tree Predictor together with the Decision Tree Learner.)
So let's say I make up my own rules and write them into the Ruleset Editor Node. Is the Ruleset Editor the right way to go or am I missing something? Is it possible to somehow translate those rules into a decision tree image? Or is there another way to directly change the outcome of the Decision Tree Learner model?
Help is much appreciated :-)
In theory you are right: the outcome of a decision tree is like a ruleset. However, if you compare the PMML of a decision tree with the PMML of a ruleset, the decision tree PMML contains much more information. The "decisions" in the PMML of a ruleset look like this:
<RuleSetModel modelName="RuleSet" functionName="classification">
<MiningField name="sex" invalidValueTreatment="asIs"/>
<MiningField name="prediction" invalidValueTreatment="asIs" usageType="target"/>
<SimpleRule score="true" weight="1.0">
<SimplePredicate field="sex" operator="isMissing"/>
<SimpleRule score="false" weight="1.0">
In contrast, the "decisions" in the PMML of a decision tree look like this:
<Node id="0" score="<=50K" recordCount="32561.0">
<ScoreDistribution value="<=50K" recordCount="24720.0"/>
<ScoreDistribution value=">50K" recordCount="7841.0"/>
<Node id="1" score="<=50K" recordCount="8305.0">
<SimplePredicate field="relationship" operator="equal" value="Not-in-family"/>
<ScoreDistribution value="<=50K" recordCount="7449.0"/>
<ScoreDistribution value=">50K" recordCount="856.0"/>
<Node id="15" score="<=50K" recordCount="8078.0">
<SimplePredicate field="capital-gain" operator="lessOrEqual" value="8296.0"/>
<ScoreDistribution value="<=50K" recordCount="7448.0"/>
<ScoreDistribution value=">50K" recordCount="630.0"/>
You can see that the decision tree PMML gives the class proportions of the learning data for every node, and these are used in the plot as well. In addition, the model meta-information differs (it states either a ruleset model or a decision tree model). So if you want to use the decision tree visualization for a ruleset, it should work by manipulating the PMML: change the model meta-information, translate your rules into nodes, and enrich them with the proportions. AFAIK there is no standard node that does the job for you.
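To make the "translate your rules into nodes and enrich them with the proportions" step more concrete, here is a minimal sketch in Python using only the standard library. It is not a KNIME feature. The `build_node` helper and the `tree_spec` layout are made up for illustration, and the counts are the ones from the excerpt above. The sketch also assumes the root node carries a `<True/>` predicate, as the PMML TreeModel schema expects, although the excerpt does not show it. It only produces the nested `Node` fragment; the surrounding `TreeModel`, `MiningSchema` and data dictionary would still have to be written by hand, and I have not verified that the result loads into the decision tree view.

```python
import xml.etree.ElementTree as ET


def build_node(spec):
    """Turn one hand-written rule (a dict) into a PMML <Node> element.

    Hypothetical spec layout (not a KNIME or PMML API):
      id        - node id
      predicate - (field, operator, value) tuple, or absent for the root
      counts    - class label -> number of learning rows reaching the node
      children  - optional list of nested specs
    """
    counts = spec["counts"]
    total = sum(counts.values())
    majority = max(counts, key=counts.get)  # class with the most rows

    node = ET.Element("Node", {
        "id": str(spec["id"]),
        "score": spec.get("score", majority),
        "recordCount": str(float(total)),
    })

    predicate = spec.get("predicate")
    if predicate is None:
        # Assumption: the root node matches every row.
        ET.SubElement(node, "True")
    else:
        field, operator, value = predicate
        ET.SubElement(node, "SimplePredicate", {
            "field": field, "operator": operator, "value": str(value),
        })

    # The per-class proportions that the tree view displays.
    for label, count in counts.items():
        ET.SubElement(node, "ScoreDistribution", {
            "value": label, "recordCount": str(float(count)),
        })

    for child in spec.get("children", []):
        node.append(build_node(child))
    return node


# Counts taken from the decision tree excerpt above.
tree_spec = {
    "id": 0,
    "counts": {"<=50K": 24720, ">50K": 7841},
    "children": [{
        "id": 1,
        "predicate": ("relationship", "equal", "Not-in-family"),
        "counts": {"<=50K": 7449, ">50K": 856},
        "children": [{
            "id": 15,
            "predicate": ("capital-gain", "lessOrEqual", 8296.0),
            "counts": {"<=50K": 7448, ">50K": 630},
        }],
    }],
}

root = build_node(tree_spec)
pmml_fragment = ET.tostring(root, encoding="unicode")
print(pmml_fragment)
```

The serializer escapes the `<` in values such as `<=50K`, so the fragment stays valid XML. For your own rules you would have to supply the counts yourself, for example by applying the rules to your data and counting the rows per class.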
Hope that helps.
This was really helpful. I think I know how to work around this now - thanks a lot!