Just wanted to share a couple of new components I have built to simplify feature elimination in a regression. They calculate the Variance Inflation Factor (VIF) of each predictor and apply backward feature elimination based on a set VIF threshold. They are available on KNIME Hub:
Some background on Multicollinearity and VIF values:
Multicollinearity occurs when two or more columns are correlated with each other and provide redundant information when jointly considered as predictors of a model. VIF is used to diagnose the extent of multicollinearity among the predictors of a model. For instance, a VIF of 3 tells us that the variance of the estimated regression coefficient for a column is 3 times larger than it would be if that column were fully uncorrelated with all other predictors. As a rule of thumb, columns with a VIF higher than 5 should be removed as predictors in order to reduce collinearity, which also reduces dimensionality (James et al., 2014). If two variables display a similar VIF because they are correlated with each other, first eliminate the one that has less business meaning in the context of your exercise.
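For anyone who wants to see the idea outside of KNIME, here is a rough Python sketch of VIF-based backward elimination. It is not the components' actual implementation, and the function names (`vif`, `backward_eliminate`) and the toy data are made up for illustration. It relies on the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all other predictors; this equals the j-th diagonal entry of the inverse of the predictors' correlation matrix, which is what the sketch computes. It assumes numeric, non-constant columns with no missing values.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a dict of equal-length numeric columns."""
    names = list(columns)
    n = len(columns[names[0]])
    centered = {}
    for name in names:
        mean = sum(columns[name]) / n
        centered[name] = [v - mean for v in columns[name]]
    # Assumes no column is constant (a constant column would give a zero norm).
    norms = {k: math.sqrt(sum(v * v for v in c)) for k, c in centered.items()}
    return [[sum(a * b for a, b in zip(centered[p], centered[q]))
             / (norms[p] * norms[q]) for q in names] for p in names]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear columns: VIF is infinite")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Return {column name: VIF} from the diagonal of the inverse correlation matrix."""
    names = list(columns)
    if len(names) < 2:
        return {name: 1.0 for name in names}
    inverse = invert(correlation_matrix(columns))
    return {name: inverse[i][i] for i, name in enumerate(names)}

def backward_eliminate(columns, threshold=5.0):
    """Drop the highest-VIF column, recompute, and repeat until all VIFs <= threshold."""
    kept = dict(columns)
    dropped = []
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        dropped.append(worst)
        del kept[worst]
    return list(kept), dropped

# Toy data (hypothetical): x2 is roughly 2 * x1, x3 is only loosely related to both.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1],
    "x3": [5, 1, 4, 2, 8, 3, 7, 6],
}
kept, dropped = backward_eliminate(data, threshold=5.0)
print("kept:", kept, "dropped:", dropped)
```

Note that the sketch always drops the column with the highest VIF. When two columns have near-identical VIFs, as x1 and x2 do here, choosing by business meaning instead of by the raw number is a manual decision that a loop like this does not capture.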
All feedback and input are welcome.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2014. An Introduction to Statistical Learning: With Applications in R. Springer Publishing Company, Incorporated.