I am trying to set up a QSAR model and am currently figuring out which descriptors to use and which to leave out. For this I tried backward feature elimination.
When I run backward feature elimination with a linear regression learner and predictor, it gives me warnings that some columns are missing, but it still carries on without them. However, when I do the same with a random forest learner and predictor, it crashes.
Can someone shed some light on the issue? Are random forests not supposed to work with backward elimination?
With a random forest, feature elimination is not really required. As mentioned in a KNIME blog entry, I would remove highly correlated features and those with low or no variability; the random forest itself will do the rest.
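To make that pre-filtering step concrete outside KNIME, here is a minimal pure-Python sketch. The function names, the thresholds (min_variance, max_abs_corr) and the greedy "keep the first descriptor of a correlated pair" rule are all hypothetical choices for illustration, not what any particular KNIME node does internally.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def filter_descriptors(columns, min_variance=1e-8, max_abs_corr=0.95):
    """Return the names of descriptors to keep (hypothetical thresholds).

    columns: dict mapping descriptor name -> list of values (one per molecule).
    Drops near-constant columns, then greedily drops any column whose absolute
    correlation with an already-kept column exceeds max_abs_corr.
    """
    kept = []
    for name, values in columns.items():
        if variance(values) <= min_variance:
            continue  # low or no variability
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue  # highly correlated with a descriptor already kept
        kept.append(name)
    return kept


# Toy data: one duplicated descriptor and one constant descriptor.
descriptors = {
    "MW": [180.2, 250.3, 310.4, 150.1],
    "MW_copy": [180.2, 250.3, 310.4, 150.1],
    "const": [1.0, 1.0, 1.0, 1.0],
    "logP": [1.2, -0.5, 3.1, 0.8],
}
print(filter_descriptors(descriptors))  # ['MW', 'logP']
```

The remaining descriptors would then go straight into the random forest, which handles the rest of the weighting through its own split selection.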
Additional comment:
If you want to be a real purist, you would have to run the feature elimination separately within every single iteration of the cross-validation loop. Otherwise, you are leaking information from the held-out data into the choice of descriptors. Of course, that would be extremely time-consuming and is not really needed.
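A minimal sketch of what "selection inside each fold" means, in pure Python. Here a simple variance filter stands in for whatever selection step you actually use (for example backward elimination); the function names, the unshuffled contiguous folds and the min_variance threshold are hypothetical choices for illustration. The point is only that the selector sees the training rows of a fold and never the held-out rows.

```python
def kfold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def select_by_variance(rows, names, min_variance):
    """Stand-in selector: keep descriptors whose variance over `rows` exceeds min_variance."""
    kept = []
    for j, name in enumerate(names):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        if var > min_variance:
            kept.append(name)
    return kept


def per_fold_selection(X, names, k=3, min_variance=1e-8):
    """Run the selection step separately inside each CV fold.

    Returns the list of descriptor sets chosen in each fold.
    """
    selections = []
    for train_idx, _test_idx in kfold_indices(len(X), k):
        train_rows = [X[i] for i in train_idx]
        selections.append(select_by_variance(train_rows, names, min_variance))
        # Here you would train the model on train_rows restricted to the
        # selected descriptors and evaluate it on the held-out rows.
    return selections


# Descriptor "b" only varies in the last two rows.
X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [5.0, 1.0], [6.0, 2.0]]
print(per_fold_selection(X, ["a", "b"]))  # [['a', 'b'], ['a', 'b'], ['a']]
```

Selecting once on the full data set would keep "b" everywhere; doing it per fold shows that in the last fold "b" is kept only because of information in the held-out rows, which is exactly the leak described above.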