Hi all, I am new to this forum. I have the following problem: I want to classify about 92,000 examples with 15 attributes, and I chose the Decision Tree. I would like to know whether there is a method, like Cross Validation etc., to choose the best value of 'Min number records per node'.
Thanks again. An update: I created this workflow and trained the Decision Tree with 80% of the data set (Training Set) and tested it with the Test Set (the remaining 20%). I have some doubts I hope you'll help me clear up: is the use of the Partitioning node also called Hold Out? In which cases can I use Cross Validation?
In the second image there's the Confusion Matrix from the Scorer, including the column for the reject option. Is it fair to say that I should calculate the error without considering the 'rejected' column?
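Not a KNIME answer, but in case it helps to see the same setup outside the tool: below is a minimal sketch of the 80/20 hold-out procedure described above, written with scikit-learn on synthetic data. Everything in it is an assumption for illustration (the synthetic data, the value 25, and min_samples_leaf standing in for the Decision Tree Learner's 'Min number records per node').

```python
# Minimal sketch of an 80/20 hold-out evaluation (not the actual KNIME workflow).
# All data and parameter values are made up for illustration; min_samples_leaf is
# scikit-learn's rough analogue of "Min number records per node".
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in for the real data: ~92,000 rows, 15 attributes.
X, y = make_classification(n_samples=92000, n_features=15, n_informative=10,
                           n_classes=3, random_state=0)

# Hold-out: 80% training, 20% test, like the Partitioning node.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

tree = DecisionTreeClassifier(min_samples_leaf=25)  # 25 is an arbitrary example value
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
```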
Hold out is one type of cross-validation. You can read more here: http://www.cs.cmu.edu/~schneide/tut5/node42.html
We also have dedicated CrossValidation nodes. However, cross-validation is a model validation technique; it is not used for parameter optimization.
Hm, the answer to your second question mainly depends on whether this value (rejected) is of any interest to you. Basically, the Scorer takes all values into account for its accuracy.
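To make the "model validation, not parameter optimization" point concrete, here is a hedged sketch continuing the synthetic scikit-learn example from above (all values are assumptions): k-fold cross-validation estimates how a model with a fixed configuration generalizes, by averaging the test performance over the k left-out folds.

```python
# Sketch of k-fold cross-validation as a model validation technique:
# the parameter is held fixed and CV estimates the generalization accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=92000, n_features=15, n_informative=10,
                           n_classes=3, random_state=0)

tree = DecisionTreeClassifier(min_samples_leaf=25)  # fixed, arbitrary example value

# 10-fold CV: each fold is left out once for testing, the model is trained on the rest.
scores = cross_val_score(tree, X, y, cv=10, scoring="accuracy")
print("Accuracy per fold:", scores)
print("Mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```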
If I don't want to use the Parameter Optimization nodes, is it correct to use Cross Validation, increasing the 'Min number records per node' each time and then drawing conclusions from the results?
No, this is methodologically not correct. If you do cross-validation, you leave one subset out, train on the remaining subsets, and then test on the left-out subset. If you additionally change a parameter between runs, you don't know whether the improved quality comes from the parameter or is just a random effect of that specific training/test set combination.
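For completeness, here is a hedged sketch (in Python/scikit-learn, not KNIME; all names and values are assumptions) of one common way to keep parameter selection and the final quality estimate separate: search over candidate 'Min number records per node' values with cross-validation on the training data only, so the fold averaging smooths out the split effect, and keep an untouched test set for a single final check. Whether this matches what the Parameter Optimization nodes do in your workflow is an assumption on my part.

```python
# Sketch: separate the parameter search (CV on training data) from the final evaluation
# (one untouched test set). All parameter values below are arbitrary examples.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=92000, n_features=15, n_informative=10,
                           n_classes=3, random_state=0)

# Hold out a test set that is never touched during the parameter search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate values for the scikit-learn analogue of "Min number records per node".
param_grid = {"min_samples_leaf": [5, 10, 25, 50, 100, 250]}
search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print("Best min_samples_leaf:", search.best_params_)

# One final estimate of the chosen setting's quality, on data the search never saw.
y_pred = search.best_estimator_.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
```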