Problems with random forest regression and backward elimination

Thepip · September 28, 2016, 2:51pm

Hello

I am trying to set up a QSAR model and am currently figuring out which descriptors to keep and which to drop. For this I tried backward feature elimination.

When I run the backward feature elimination with the linear regression learner and predictor, it gives me warnings saying that some columns are missing, but it still carries on without them. However, when I do the same with the random forest learner and predictor, it crashes.

Can someone shed some light on the issue? Are random forests not supposed to work with backward elimination?

Thanks

nemad · October 2, 2016, 12:37pm

Hi Thepip,

I tried to replicate the problem but without success, which leads me to believe that there is an easy fix for your problem.

Could you try updating your KNIME AP? We had a very similar problem a couple of weeks ago and it should be fixed now.

Please let me know if the update fixed your issue.

Cheers,

nemad

Thepip · October 3, 2016, 10:21am

Yes, I updated the AP and now it works!
Thank you

beginner · October 12, 2016, 12:43pm

To add my 2 cents:

With random forests, feature elimination is not really required. As mentioned in a KNIME blog entry, I would remove highly correlated features and those with low or no variability. The random forest itself will do the rest.
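Outside of KNIME, the pre-filtering idea above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the KNIME nodes themselves: the function names (`pearson`, `prefilter`), the thresholds, and the toy descriptor values are all made up for the example. Note that a raw variance threshold depends on the scale of each descriptor, so in practice you would normalize first or pick thresholds per column.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def prefilter(columns, min_variance=1e-8, max_abs_corr=0.95):
    """Return the names of columns that survive both filters.

    columns: dict mapping descriptor name -> list of floats.
    Thresholds are illustrative assumptions, not recommended values.
    """
    kept = []
    for name, values in columns.items():
        # Drop (near-)constant descriptors.
        if pvariance(values) <= min_variance:
            continue
        # Drop descriptors that duplicate one we already kept.
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue
        kept.append(name)
    return kept


# Hypothetical toy descriptors, invented for this example.
descriptors = {
    "logP": [1.2, 2.5, 0.7, 3.1, 1.9],
    "logP_scaled": [2.4, 5.0, 1.4, 6.2, 3.8],  # exactly 2 * logP
    "const": [1.0, 1.0, 1.0, 1.0, 1.0],        # no variability
    "tpsa": [40.0, 20.0, 75.0, 33.0, 90.0],
}
selected = prefilter(descriptors)
print(selected)
```

Of two highly correlated descriptors, this sketch simply keeps whichever comes first; which one to keep is a judgment call.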

Additional comment:

If you want to be a real purist, you would have to run the feature elimination inside every single iteration of the cross-validation loop. Otherwise you are leaking information from the test folds into the feature selection. Of course, that would be extremely time-consuming and is not really needed.
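To make the "purist" variant concrete, here is a rough plain-Python sketch of selection done inside each fold. It is heavily simplified and everything in it is an assumption for illustration: a top-k correlation ranking stands in for backward elimination, a 1-nearest-neighbour predictor stands in for the random forest, and the data are invented. The only point is where the selection step sits: it sees the training rows of the current fold and nothing else.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either list is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_top_k(rows, targets, k):
    """Stand-in for feature elimination: keep the k features most
    correlated (in absolute value) with the target on the given rows."""
    scores = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        scores.append((abs(pearson(column, targets)), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def predict_1nn(train_rows, train_targets, test_rows, features):
    """Stand-in learner: 1-nearest-neighbour regression on `features`."""
    preds = []
    for row in test_rows:
        dists = [sum((row[j] - tr[j]) ** 2 for j in features) for tr in train_rows]
        preds.append(train_targets[dists.index(min(dists))])
    return preds


def nested_cv(rows, targets, n_folds=3, k=1):
    """Cross-validation with feature selection repeated in every fold."""
    fold_rmse, fold_features = [], []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(rows)) if i % n_folds == fold]
        train_idx = [i for i in range(len(rows)) if i % n_folds != fold]
        train_rows = [rows[i] for i in train_idx]
        train_y = [targets[i] for i in train_idx]
        test_rows = [rows[i] for i in test_idx]
        test_y = [targets[i] for i in test_idx]

        # Selection uses the training part of this fold only (no leakage).
        features = select_top_k(train_rows, train_y, k)
        preds = predict_1nn(train_rows, train_y, test_rows, features)

        fold_rmse.append(sqrt(mean((p - t) ** 2 for p, t in zip(preds, test_y))))
        fold_features.append(features)
    return fold_rmse, fold_features


# Invented toy data: feature 0 tracks the target, feature 1 is noise.
rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0], [5.0, 6.0], [6.0, 3.0]]
targets = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9]
fold_rmse, fold_features = nested_cv(rows, targets)
print(fold_features, fold_rmse)
```

The leaky alternative would call `select_top_k` once on all rows before the loop, which is faster but lets the test rows influence which features are chosen.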

For feature elimination see also:

https://www.knime.org/files/knime_seventechniquesdatadimreduction.pdf
