When should I use stratified sampling with the Partitioning node?

Hi!

While reviewing the KNIME examples, I saw that the “example for learning a decision tree” uses the stratified sampling option, but I don’t know why.

If somebody knows when it is recommended to use this option, that would be really great!

This is a statistics question, not a KNIME-specific question.

You’d use stratified sampling if your original dataset can be divided into subpopulations and you want each subpopulation to be appropriately represented in your final partitioned dataset.
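To make that concrete, here is a minimal sketch in plain Python of what a stratified split does. This is only an illustration, not how the KNIME Partitioning node is implemented; the function `stratified_split`, its parameters, and the toy data are all made up for the example. In a decision tree setting the subpopulations would presumably be the values of the class column.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, label_of, train_fraction=0.8, seed=0):
    """Split rows into (train, test), sampling each label group separately.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group the rows by their label (the "subpopulations").
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)

    train, test = [], []
    for members in groups.values():
        members = members[:]  # copy so the caller's data is not reordered
        rng.shuffle(members)
        # Take the same fraction from every group.
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Made-up imbalanced data: 90 "no" rows and 10 "yes" rows.
rows = [(i, "no") for i in range(90)] + [(i, "yes") for i in range(90, 100)]
train, test = stratified_split(rows, label_of=lambda r: r[1])

train_counts = Counter(label for _, label in train)
test_counts = Counter(label for _, label in test)
print(train_counts)  # 72 "no" and 8 "yes"
print(test_counts)   # 18 "no" and 2 "yes"
```

Both partitions keep the original 90/10 ratio. With a plain random draw on data this imbalanced, the test partition could easily end up with very few "yes" rows, or none at all, which makes the evaluation of the tree unreliable.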

Thank you so much for your help.

Hello!

True, but considering this is related to a KNIME workflow example, I find this question OK. Also, we have solved and addressed a bunch of DB and other non-KNIME-related problems, and this is far more related to KNIME's purpose IMHO :)

Br,
Ivan