Minimum number of records in a decision tree not working

Dear All,

I have tried setting the minimum number of records per node (in the decision tree settings) while building a decision tree, but unfortunately it’s not working. Can anyone please help me with this?

Hi @ChetanP,
the issue here is that a tree node is not split again once it is down to the set number of records. However, a split is still performed even if a resulting leaf contains fewer than the set number of records. The tree just stops growing there after the split.
Kind regards
Alexander
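
The stopping behaviour described above can be illustrated with a minimal sketch. This is not KNIME's implementation: the `grow` function and its uneven 10% cut are invented for illustration, and the only rule taken from the explanation is the assumption that a node holding `min_records` records or fewer is not split again.

```python
def grow(records, min_records):
    """Return the leaf sizes of a toy tree.

    Assumed stopping rule: a node with min_records records or fewer
    is not split any further. The split itself is NOT checked, so the
    children may end up smaller than min_records.
    """
    if len(records) <= min_records or len(records) < 2:
        return [len(records)]
    # Hypothetical uneven split: a real learner would choose the cut
    # by a quality measure; here we simply cut off the first 10%.
    cut = max(1, len(records) // 10)
    return grow(records[:cut], min_records) + grow(records[cut:], min_records)


leaf_sizes = grow(list(range(120)), min_records=50)
print(leaf_sizes)
```

Under this rule, leaves with far fewer than 50 records appear even though the minimum is set to 50, because the threshold only decides whether a node may be split, not how small its children may be.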

Hello @AlexanderFillbrunn,

Agreed. But I have more than 11K records in total, and the tree has only two nodes when I set the minimum number of records to 50. When I don’t set any constraint on the minimum number of records, it gives more than 300 nodes. That is clearly overfitting, which is why I wanted to set the minimum number of records.

Hi,
I understand your problem. I think there are also some extra rules in the algorithm, e.g. that a certain number of newly split nodes needs to be above the threshold concerning the number of records. Unfortunately the behavior cannot be tuned so finely, so you might have to try out different values between 1 and 50 to achieve your goal.
Kind regards
Alexander
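
To make the kind of extra rule mentioned above concrete, here is a purely hypothetical sketch. The function `split_is_accepted`, its parameter `required_children`, and the example partition sizes are all assumptions for illustration; the exact rule the learner applies is not stated in this thread.

```python
def split_is_accepted(child_sizes, min_records, required_children=2):
    """Hypothetical validity check for a candidate split.

    Accept the split only if at least `required_children` of the
    resulting partitions contain min_records records or more.
    """
    big_enough = sum(1 for size in child_sizes if size >= min_records)
    return big_enough >= required_children


# Invented sizes of the two partitions one candidate split would create.
candidate = [30, 70]

# Try several thresholds to see where the split stops being accepted.
accepted = {m: split_is_accepted(candidate, m) for m in (1, 10, 30, 31, 50)}
print(accepted)
```

With a rule of this kind, a threshold slightly too high rejects every candidate split near the root, which would be one way to end up with a very small tree. That is why trying several values between 1 and 50 can help.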

You might try to optimise the parameters with a loop or grid search.

Obviously you could also use other algorithms, but if you specifically want a decision tree (e.g. to derive readable rules), parameter optimization might be a way.
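
As a rough idea of what such a grid search does, here is a minimal sketch in plain Python. It is not the KNIME loop nodes: `grid_search` is a generic helper, and `toy_validation_accuracy` is an invented stand-in for "train the tree with this setting and score it on held-out data", with a made-up curve that peaks at 20.

```python
def grid_search(candidates, score_fn):
    """Score every candidate value and return the best one.

    Returns (best_value, best_score, all_scores).
    """
    scores = {value: score_fn(value) for value in candidates}
    best_value = max(scores, key=scores.get)
    return best_value, scores[best_value], scores


def toy_validation_accuracy(min_records):
    # Invented stand-in for training and scoring a tree. In a real
    # workflow this would run the learner, predictor and scorer.
    return 0.90 - abs(min_records - 20) / 200


# Candidate values for the minimum number of records: 5, 10, ..., 50.
best, best_score, scores = grid_search(range(5, 55, 5), toy_validation_accuracy)
print(best, round(best_score, 3))
```

The important design choice is that the score comes from data not used for training, so that the chosen value is not simply the one that lets the tree memorise the training set.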

Thanks @mlauber71.

I cannot find the Parameter Optimization Loop in my KNIME installation, and I am not sure why. Can you please help me with this?

@mlauber71

What I have is “Model Loop”, but I don’t see the “Parameter Loop Optimization” in KNIME.

Hi,
you can find the nodes in the Parameter Optimization extension. You can also just drag and drop the node from the KNIME Hub, and KNIME will install the extension for you!
Kind regards
Alexander

Hi there @ChetanP,

every node in KNIME is part of an extension. When you install KNIME, not all extensions are installed right away, so some nodes are not available in the Node Repository. To add capabilities to KNIME, you can install a variety of extensions; once an extension is installed, its nodes appear in the Node Repository. To search for nodes and extensions (and install them!), you can use the KNIME Hub and the drag-and-drop option mentioned above. You can also install extensions (and check what you have already installed) from within KNIME itself. See here: https://www.knime.com/downloads/update

Hope this clarifies the missing nodes issue :)

Br,
Ivan

Hello @ipazin & @AlexanderFillbrunn,

Thank you very much for your help. It’s working.

I have run the decision tree. How can I check the accuracy on the training data set? When I use the Scorer, I only get the accuracy for the testing data set, not for the training data set.

Would you please help me here?

Thanks once again.

Hi,
you just have to use the Decision Tree Predictor on the training dataset and append a Scorer to that as well.
Kind regards
Alexander

@AlexanderFillbrunn,

I have used the Decision Tree Predictor and it only gives the accuracy for the testing data set. I am not sure what you mean by "append a Scorer to that as well"?

Hi,
you have a predictor that gets the test data. Just add another predictor, connect its PMML port to the Decision Tree Learner and its data port to the top output port of the Partitioning node. Then add a Scorer behind the new predictor.
Kind regards
Alexander
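
The same idea in a minimal plain-Python sketch: apply the one trained model to both partitions and score each separately. Everything here is invented for illustration (`toy_model` stands in for the trained tree, and the tiny `train` and `test` lists stand in for the two outputs of the Partitioning node).

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot score an empty data set")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def toy_model(x):
    # Stand-in for the trained decision tree: a single threshold rule.
    return "yes" if x >= 5 else "no"


# Invented (feature, label) rows standing in for the two partitions.
train = [(1, "no"), (3, "no"), (6, "yes"), (8, "yes"), (9, "yes")]
test = [(2, "no"), (4, "yes"), (7, "yes"), (5, "no")]

# One "predictor + scorer" pass per partition, using the same model.
train_accuracy = accuracy([y for _, y in train], [toy_model(x) for x, _ in train])
test_accuracy = accuracy([y for _, y in test], [toy_model(x) for x, _ in test])
print(train_accuracy, test_accuracy)
```

A large gap between the two numbers, with the training accuracy much higher, is the usual sign of overfitting.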

@AlexanderFillbrunn,

I am trying hard to understand, but I could not.

Please help me with the proper flow; screenshots would help me here.

Regards,
Chetan Patil

Hi,
this is what I mean.
Kind regards
Alexander

@AlexanderFillbrunn,

Thanks a lot. I am new to KNIME, but thanks to you it seems very easy.

It is :) If you want to see what else you could do with decision trees, you could have a look at this example:

2 posts were split to a new topic: Flow variable connections for nodes without explicit flow variable ports
