Problems with random forest regression and backward elimination

Hello

I am trying to set up a QSAR model and am currently figuring out which descriptors to keep and which to drop. For this I tried backward feature elimination.

When I run the backward feature elimination with the linear regression learner and predictor, it gives me warnings saying that some columns are missing, but it still carries on without them. However, when I do the same with the random forest learner and predictor, it crashes.

Can someone shed some light on the issue? Are random forests not supposed to work with backward elimination?

Thanks

Hi Thepip,

I tried to replicate the problem but without success, which leads me to believe that there is an easy fix for your problem.

Could you try updating your KNIME AP? We had a very similar problem a couple of weeks ago and it should be fixed now.

Please let me know if the update fixed your issue.

Cheers,

nemad

Yes, I updated the AP and now it works!
Thank you

To add my 2 cents:

With random forest, feature elimination is not really required. As mentioned in a KNIME blog entry, I would remove highly correlated features and those with low or no variability. Random forest itself will do the rest.
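To make the pre-filtering idea concrete, here is a minimal sketch in plain Python. It is not what the KNIME nodes do internally, just an illustration of the two steps: drop descriptors with (near-)zero variance, then greedily drop any descriptor that is highly correlated with one already kept. The function names, the thresholds (`min_variance`, `max_abs_corr`) and the descriptor names are all made up for the example.

```python
from math import sqrt
from statistics import pvariance


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def prefilter(columns, min_variance=1e-8, max_abs_corr=0.95):
    """columns: dict of name -> list of values. Returns kept names in input order."""
    # Step 1: drop descriptors with (near-)zero variance.
    varying = [name for name, vals in columns.items()
               if pvariance(vals) > min_variance]
    # Step 2: keep a descriptor only if it is not highly correlated
    # with a descriptor that was already kept.
    kept = []
    for name in varying:
        if all(abs(pearson(columns[name], columns[k])) <= max_abs_corr
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table, invented for illustration only.
descriptors = {
    "mol_weight": [180.2, 151.2, 206.3, 194.2, 138.1],
    "mol_weight_copy": [360.4, 302.4, 412.6, 388.4, 276.2],  # 2 x mol_weight
    "constant_flag": [1.0, 1.0, 1.0, 1.0, 1.0],              # no variability
    "logp": [1.2, 0.5, 3.9, -0.1, 2.3],
}
selected = prefilter(descriptors)
print(selected)
```

Which of two correlated descriptors survives depends only on column order here; a real workflow might prefer the one that is easier to interpret.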

Additional comment:

If you want to be a real purist, you would have to run the feature elimination separately inside every single iteration of the cross-validation loop, using only that iteration's training data. If you do not, you are leaking information from the test folds into the feature selection. Of course, that would be extremely time-consuming and is not really needed.
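The "purist" setup can be sketched as follows, again in plain Python and only as a structural illustration. The point is that the selection step is called once per fold and only ever sees the training rows. `select_features`, `fit` and `score` are placeholders supplied by the caller; the toy stand-ins below (a variance check instead of real backward elimination, and a mean predictor instead of a random forest) exist only so the sketch runs.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate_with_selection(X, y, k, select_features, fit, score):
    """k-fold CV in which feature selection sees only the training rows.

    select_features(X_train, y_train) -> list of column indices to keep
    fit(X_train, y_train)             -> model
    score(model, X_test, y_test)      -> float
    """
    results = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Selection is repeated for each fold and never touches the test rows.
        cols = select_features(X_train, y_train)
        model = fit([[row[c] for c in cols] for row in X_train], y_train)
        X_test = [[X[i][c] for c in cols] for i in test_idx]
        y_test = [y[i] for i in test_idx]
        results.append((cols, score(model, X_test, y_test)))
    return results


# Toy stand-ins (assumptions for illustration, not a real elimination or learner).
def keep_varying(X_train, y_train):
    return [c for c in range(len(X_train[0]))
            if len({row[c] for row in X_train}) > 1]


def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)  # the "model" is just the training mean


def mean_abs_error(model, X_test, y_test):
    return sum(abs(model - t) for t in y_test) / len(y_test)


X = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [5.0, 0.0], [6.0, 0.0]]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
results = cross_validate_with_selection(X, y, 3, keep_varying, fit_mean,
                                        mean_abs_error)
```

The cost is obvious from the structure: the whole selection procedure runs k times instead of once, which is why it gets so slow with backward elimination.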

For feature elimination see also:

https://www.knime.org/files/knime_seventechniquesdatadimreduction.pdf
