Decision Tree

Hi All,
@AlexanderFillbrunn

I have run the Decision Tree and exported the rules. While looking at the rules, I noticed that the total record count in a particular node has decimal places. How is this possible? Also, how can I get rid of this? It makes the Decision Tree unusable in production.

Hi,
Can you share the relevant part of your PMML file? I am not sure I understand your problem. Is it that the record count is a double number? It could be, but it should always end in “.0”.
Kind regards
Alexander

Hi @AlexanderFillbrunn,

My problem is that at the terminal nodes the number of records is fractional. That is the problem.

To give you an example, I have 220 observations in total at the start. Later they are split into Single, Married & Unknown, but the number of records that goes into each of these has decimal places. I want to get rid of this.

Here is a screenshot of it. [Screenshot: Capture1]

Hi,
That looks strange. I cannot reproduce this here, so if you could send me the workflow that would be helpful.
Kind regards
Alexander

@AlexanderFillbrunn Sure.

I have done some analysis on this. It seems that the variable Marital Status has some missing values (blanks), and those blanks somehow (I am not sure how) got merged into Single, Married and Unknown.

Hi @ChetanP,
that makes sense. If you have missing values and the missing value strategy set under PMML settings in the Decision Tree Learner is “lastPrediction”, a record with a missing value in the tested column is predicted using the last valid inner node in the decision tree. In terms of the number of records in the leaves, a fraction of that record is counted towards each child of that inner node, which is why you get fractions for record counts.
Kind regards
Alexander
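
To make the fractional counts concrete, here is a minimal Python sketch of the idea described above. It is not KNIME's actual implementation: the counts are made-up numbers, and the weighting rule (each missing record is spread over the children in proportion to the records with a known value) is an assumption chosen for illustration.

```python
# Hypothetical numbers, not taken from the workflow in this thread:
# 220 records in total, 7 of them with a blank Marital Status.
known_counts = {"Single": 100, "Married": 80, "Unknown": 33}
n_missing = 7


def distribute_missing(known_counts, n_missing):
    """Add a share of every missing record to each child node.

    Assumption: the share is proportional to the number of records
    with a known value that went into that child.
    """
    total_known = sum(known_counts.values())
    return {
        child: count + n_missing * count / total_known
        for child, count in known_counts.items()
    }


leaf_counts = distribute_missing(known_counts, n_missing)
for child, count in leaf_counts.items():
    print(f"{child}: {count:.3f}")
print(f"total: {sum(leaf_counts.values()):.1f}")
```

Under this assumption each child ends up with a non-integer record count, while the counts across the children still add up to the original number of records.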

@AlexanderFillbrunn,

Thanks, understood now. Is there any way to get rid of this? Also, if I want to create a separate category, let’s say “missing”, for all categorical variables, how can I do it?

Hi,
you can remove missing values using the Missing Value node. Here you can just enter the value you want to fill in for every missing value in every string column, or, in the second tab, set replacement strategies for individual columns. Once there are no missing values anymore, you should not get any fractions in your decision tree.
Kind regards
Alexander
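
For readers who prepare their data outside KNIME, the same idea (an explicit category for missing values in categorical columns) can be sketched in plain Python. This is only an illustration and not what the Missing Value node does internally; the function name, the "missing" label and the sample rows are all hypothetical.

```python
MISSING_LABEL = "missing"  # hypothetical name for the new category


def fill_missing_categories(rows, categorical_columns, label=MISSING_LABEL):
    """Return copies of the rows with blank or None categorical values replaced by `label`."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        for col in categorical_columns:
            value = new_row.get(col)
            if value is None or str(value).strip() == "":
                new_row[col] = label
        filled.append(new_row)
    return filled


# Made-up sample rows for illustration only.
rows = [
    {"Marital Status": "Single", "Age": 34},
    {"Marital Status": "", "Age": 51},
    {"Marital Status": None, "Age": 27},
]
cleaned = fill_missing_categories(rows, ["Marital Status"])
for row in cleaned:
    print(row)
```

With every blank turned into its own category, no record has a missing value in the tested column, so there is nothing left to split fractionally between the children.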
