and I’m a bit confused about the solution you implemented in the latest KNIME version (4.4.1) to fix this error. The problem encountered by @Hsbcoeeer was due to a wrong upstream threshold setting used to generate the class label, which led to his training set being labeled with only one class. At the time I found it normal, even meaningful, that the Decision Tree Ruleset failed, because it doesn’t make sense to me to generate a Decision Tree from a training set labeled with only one class.
Does the -Decision Tree Rule- node now at least show a warning in this case, so that people are aware that there may be a class threshold problem upstream?
I understand your concern about training a decision tree on a training set labeled with only one class. But the Decision Tree Learner node works for any number of target classes [1, n]. You only get a warning when there are too many target classes. The Decision Tree Learner node will generate a single-node (trivial) tree when the training data contains only a single class. That has always been the case.
The fix @nemad introduced is about converting an existing tree to a ruleset using the Decision Tree to Ruleset node. As he explained, the node was failing when the tree was a single-node tree. Now it works by converting it into a simple rule (TRUE=>SingleClass).
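For readers following along, here is a minimal Python sketch of the idea described above: walking a tree and emitting one rule per leaf, with a single-node tree collapsing to `TRUE => SingleClass`. This is not KNIME's actual implementation; `Node`, `tree_to_rules`, and the example conditions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical tree node: a leaf carries a prediction,
    # an inner node carries a condition and two children.
    prediction: Optional[str] = None
    condition: Optional[str] = None  # e.g. "x <= 3"
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.if_true is None and self.if_false is None


def tree_to_rules(node: Node, path: Optional[List[str]] = None) -> List[str]:
    """Return one 'antecedent => class' rule per leaf."""
    path = path or []
    if node.is_leaf():
        # A root that is also a leaf has no conditions on its path,
        # so the antecedent falls back to the constant TRUE.
        antecedent = " AND ".join(path) if path else "TRUE"
        return [f"{antecedent} => {node.prediction}"]
    return (
        tree_to_rules(node.if_true, path + [node.condition])
        + tree_to_rules(node.if_false, path + [f"NOT ({node.condition})"])
    )


# A trivial tree (training data had only one class) and a one-split tree.
single = Node(prediction="SingleClass")
split = Node(condition="x <= 3",
             if_true=Node(prediction="A"),
             if_false=Node(prediction="B"))

print(tree_to_rules(single))
print(tree_to_rules(split))
```

In this sketch the trivial tree needs no special case beyond the empty-path fallback, which is one plausible way a converter can avoid failing on a root-only tree.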
I thought about your comment a bit and want to make sure I understand you correctly:
Your concern is that an error in the workflow that was previously detected because of the bug would now go unnoticed, right?
That’s a very valid point but I’d argue the node that should show a warning is the Decision Tree Learner, not the Decision Tree to Ruleset node because the problem is really located in the former (and the latter is far less prevalent).
What do you think of this solution?
Indeed, I definitely agree that the Decision Tree Learner (DTL) should at least show a warning stating that the training set has only one defined class. I would even say that it should fail and return an error, because a DT such as “TRUE=>SingleClass” is not really a DT and hence may not be informative at all.
My favorite solution would be that the DTL node fails and the Decision Tree to Ruleset (DTTR) node at least warns that it received a weird DT.
My concern is that, if the DTL does not warn or fail in this particular case, people may not know or simply not notice that something was wrong upstream (here, an incorrectly set threshold to define classes), as was the case in the initial thread (Execute failed: ("NullPointerException"): null Decision Tree Ruleset).
I, personally, am against making the Decision Tree Learner node fail. A warning is enough in my opinion. Think of automation, loops, creating an ensemble of trees … And as for the Decision Tree to Ruleset node, I don’t see a problem with it working on a single-node tree. There are legitimate single-node trees out there.
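The two positions in this thread (warn vs. fail) can be sketched as a small pre-training check. This is only an illustration in Python, not anything the KNIME nodes expose; `check_class_labels` and its `strict` flag are hypothetical.

```python
import warnings
from collections import Counter
from typing import Iterable


def check_class_labels(labels: Iterable[str], strict: bool = False) -> int:
    """Count distinct class labels; warn (or fail if strict) when at most one exists."""
    counts = Counter(labels)
    n_classes = len(counts)
    if n_classes <= 1:
        message = (
            f"Training set contains {n_classes} class(es): {sorted(counts)}. "
            "Check the upstream class threshold."
        )
        if strict:
            # "Fail" policy: stop the run so the upstream problem gets fixed.
            raise ValueError(message)
        # "Warn" policy: keep loops and ensembles running, but leave a trace.
        warnings.warn(message, UserWarning)
    return n_classes


# Default policy: a warning is issued and execution continues.
check_class_labels(["high", "high", "high"])
```

With `strict=False` a loop that occasionally produces a single-class partition keeps running; with `strict=True` the same situation stops the run, which is the behavior argued for by those who want upstream threshold errors to surface immediately.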
Although I can understand your point of view, personally I would prefer the Tree Learner node to fail, especially in the case of automation: it shows that something was done wrong and not fixed upstream. But I can understand that this is a programming choice.
Indeed, a DT with a single node, when appropriate, is the equivalent of a linear separator in the space of descriptors (or a convex hull), so that is absolutely fine with me. There are DTs with just one node, but they contain two decision rules for two different classes.
What is the meaning of a single-node DT with only the decision rule “TRUE=>SingleClass”, apart from showing that there is a class threshold error upstream?