number of features in model

Hi, I am wondering if there is a rule of thumb for dataset size versus the number of features used. For example, with a dataset of about 6000 observations, is there a minimum or maximum number of features I should aim for?

I don't think there is such a rule of thumb. 6000 is the number of observations, and a dataset that size can have any number of features. What matters more is feature importance: identify which features carry the most weight and keep those for building your model.
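As a rough sketch of that idea, here is how you might inspect feature importances with scikit-learn. The dataset is synthetic (a stand-in for your 6000-row data), and the model settings are illustrative, not a recommendation:

```python
# Sketch: ranking features by impurity-based importance with a random forest.
# The synthetic data (6000 rows, 20 features, 5 informative) is a stand-in
# for a real dataset of the size mentioned in the question.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=6000, n_features=20,
                           n_informative=5, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Sort feature indices by importance, highest first.
ranked = sorted(enumerate(model.feature_importances_),
                key=lambda t: t[1], reverse=True)
for idx, imp in ranked[:5]:
    print(f"feature {idx}: importance {imp:.3f}")
```

You could then drop the lowest-ranked features and check whether validation performance holds up.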


In the book Elements of Statistical Learning by Hastie, Tibshirani, and Friedman, the recommended starting values for m, the number of features considered at each split in a Random Forest, are m = p/3 for regression problems and m = √p for classification problems, where p is the total number of features.

However, the best value for the number of features depends on the problem, so it needs to be tuned.
You also need to take into account the features' relative importance, the degree of correlation between them, how well they reflect reality, etc.
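To make the tuning point concrete, here is a small sketch comparing a few `max_features` settings by cross-validation. The dataset and candidate values are illustrative assumptions, not part of the original answer:

```python
# Sketch: tuning max_features (the "m" from ESL) for a random forest.
# Candidates cover the classification rule of thumb sqrt(p), the
# regression rule of thumb p/3, and all p features for comparison.
import math
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30,
                           n_informative=8, random_state=0)
p = X.shape[1]

candidates = {"sqrt(p)": int(math.sqrt(p)), "p/3": p // 3, "p": p}

for name, m in candidates.items():
    rf = RandomForestClassifier(n_estimators=50, max_features=m,
                                random_state=0)
    score = cross_val_score(rf, X, y, cv=3).mean()
    print(f"max_features={name} (m={m}): CV accuracy {score:.3f}")
```

On a real problem you would replace the synthetic data with your own and pick the setting with the best cross-validated score.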


This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.