I just started using KNIME and am trying to build a predictive analytics model with a Linear Regression Learner to predict the sale price of houses from various variables.
However I want to use a maximum of 3 different variables that are fed into the learner in order to predict the price.
As a performance score I chose to try to get the RMSE under a certain level.
However, after cleaning up the data and analyzing it, I just cannot reach a satisfying level of RMSE for the test data.
Is there any way I can easily break the features down in order to find the optimal feature combination?
Thank you very much!
Hi @nimoba and welcome to the KNIME Forum
I created this workflow for another Forum topic. Using the Python package “itertools”, this workflow makes it possible to loop over all possible combinations of columns (features) and do some math or … In this workflow a model is trained on the different combinations of features in the Boston Housing dataset. See Control variables in a loop on the KNIME Hub.
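For readers who do not have the workflow open, the idea of looping over feature combinations can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the workflow: the data is synthetic, the column names (`area`, `rooms`, `age`, `noise_a`, `noise_b`, `price`) and the helpers `fit_ols`, `rmse` and `best_subset` are made up for this example, and the regression is a small hand-written least-squares fit so that only the standard library is needed.

```python
import math
import random
from itertools import combinations


def fit_ols(X, y):
    """Return least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def rmse(X, y, coef):
    """Root mean squared error of the linear model `coef` on (X, y)."""
    preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y))


def best_subset(train, valid, features, target, max_features=3):
    """Try every combination of 1..max_features columns; return (rmse, combo)."""
    results = []
    for k in range(1, max_features + 1):
        for combo in combinations(features, k):
            X_tr = [[row[f] for f in combo] for row in train]
            X_va = [[row[f] for f in combo] for row in valid]
            coef = fit_ols(X_tr, [row[target] for row in train])
            score = rmse(X_va, [row[target] for row in valid], coef)
            results.append((score, combo))
    return min(results)


# Synthetic stand-in data (NOT real housing data): three informative
# columns plus two pure-noise columns.
random.seed(0)
rows = []
for _ in range(200):
    r = {
        "area": random.uniform(50, 200),
        "rooms": random.randint(1, 6),
        "age": random.uniform(0, 50),
        "noise_a": random.gauss(0, 1),
        "noise_b": random.gauss(0, 1),
    }
    r["price"] = 50 + 3 * r["area"] + 10 * r["rooms"] - 2 * r["age"] + random.gauss(0, 5)
    rows.append(r)

train, valid = rows[:140], rows[140:]
features = ["area", "rooms", "age", "noise_a", "noise_b"]
best_rmse, best_combo = best_subset(train, valid, features, "price")
print(best_combo, round(best_rmse, 2))
```

With n candidate columns and at most 3 per model, this fits C(n,1) + C(n,2) + C(n,3) models, which stays manageable for a few dozen columns. Choosing the combination by its score on the final test data makes that score look better than it is, so a separate validation split (as above) or cross-validation is the safer choice.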
Hi, there is a Feature Selection Loop built into KNIME that you can use to select the best feature combination. Of course, Hans's solution is also an option.
Thanks for sharing Hans!
Hey Hans and Daniel,
Thank you very much for your answers!
I tried both approaches and improved my result drastically. I could not get to my desired benchmark though, as it seems the data I have collected is just not sufficient yet.
Nonetheless, thank you very much for your help. I appreciate it!!