Housing Prices Prediction


I’m predicting housing prices (Ames, Iowa) with linear regression, a simple regression tree, and a random forest. (I don’t want to change the prediction models; that is my first limitation.)

Another limitation is that I am allowed to use at most 3 features to predict SalePrice.

The problem is that I can’t get an RMSE below $30,000, which is quite significant for houses (it corresponds to a mean absolute percentage error of 13.7%). That best result comes from the random forest predictor.
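For reference, here is a minimal sketch of how the two error measures mentioned above are computed. The prices are made-up toy values chosen only for illustration (they are not from the Ames data), and the function names `rmse` and `mape` are my own. It shows why the same absolute error reads very differently as a percentage depending on the price of the house.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (dollars)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Toy values (assumption): every prediction is off by exactly $30,000.
y_true = np.array([100_000.0, 200_000.0, 400_000.0])
y_pred = np.array([130_000.0, 170_000.0, 370_000.0])

toy_rmse = rmse(y_true, y_pred)  # 30000.0
toy_mape = mape(y_true, y_pred)  # 17.5 (30%, 15% and 7.5% averaged)
print(f"RMSE: {toy_rmse:,.0f}  MAPE: {toy_mape:.1f}%")
```

RMSE weights large absolute misses most heavily, while MAPE weights misses on cheap houses most heavily, so the two numbers do not convert into each other directly.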

What are good ways to optimize this model? The random forest model is also a bit overfitted, but I couldn’t improve on that side either.

What are some ‘quick wins’ to optimize random forest predictions?

What I’ve already tried: Normalizing my data (z score or min max), cross validation (not yielding any improvements somehow), optimization loops regarding tree depth and number of trees.

Thank you in advance for your help as I’m quite new to this platform.

3 is not much. What about different features?


Hi @StfnS

Indeed, 3 is not much. But anyway, see this workflow from the KNIME Hub. Using the Python module “itertools”, the workflow loops over all possible combinations of columns (you choose the combination size; in this flow it is 3) and trains a model (simple regression) on each combination.
In this workflow, the models are trained on the different combinations of features in the Boston Housing dataset.
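
Outside KNIME, the same idea can be sketched in plain Python. This is not the script from the workflow, only a minimal illustration of the technique under some assumptions: scikit-learn is available, the data is synthetic (generated with `make_regression`, not the Ames or Boston data), and the names `rank_feature_subsets` and `f0` ... `f5` are hypothetical.

```python
from itertools import combinations

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def rank_feature_subsets(X, y, feature_names, k=3, cv=5):
    """Score every k-feature subset by cross-validated RMSE, best first."""
    results = []
    for idx in combinations(range(X.shape[1]), k):
        # scikit-learn returns negated RMSE so that higher is better.
        scores = cross_val_score(
            LinearRegression(),
            X[:, list(idx)],
            y,
            scoring="neg_root_mean_squared_error",
            cv=cv,
        )
        names = tuple(feature_names[i] for i in idx)
        results.append((float(-scores.mean()), names))
    return sorted(results)


# Synthetic stand-in data (assumption): 6 candidate columns, 3 informative.
X, y = make_regression(
    n_samples=200, n_features=6, n_informative=3, noise=10.0, random_state=0
)
feature_names = [f"f{i}" for i in range(X.shape[1])]

ranked = rank_feature_subsets(X, y, feature_names, k=3)
best_rmse, best_names = ranked[0]
print(f"{len(ranked)} subsets tried; best: {best_names}, RMSE {best_rmse:.2f}")
```

With 6 candidate columns there are only 20 subsets of size 3, but the count grows quickly (80 columns give 82,160 subsets), so it may be worth pre-filtering the columns first. Swapping `LinearRegression()` for `RandomForestRegressor()` works the same way, just slower.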

gr. Hans

Thanks for the idea. Since I’m not that advanced in KNIME yet, would it be an option for you to upload your model?

That way, I can learn faster how to adjust my current model.

Yes, indeed. Is there a way to automatically find the combination of 3 features that minimizes RMSE using feature selection? If yes, how?

Hi @StfnS, the model is in the metanode “Model and Scors”, the 3rd from the right. To open it, just double-click it. Inside there is a Random Forest Learner (Regression).
You can replace the Learner and the Predictor. I think some of the parameters within the Learner also deserve attention.
gr. Hans

Hi @HansS .

Sorry, I was referring to your complete workflow (with the python script, etc.). I think it is called workflow (not model).

Would be insanely helpful to get my hands on it!


Ah @StfnS, you can drag and drop the workflow directly from the hub (just follow the link in my previous post). Or just download it from this link: Control variables in a loop.knwf (485.7 KB).
gr. Hans

ah, thank you! Somehow missed the first link :slight_smile:


The question would be which three variables to select …

One thing you could try is employing a tool like vtreat (or featuretools in Python) and seeing whether some of its transformations help. But typically this is used to reduce dimensions if you have ‘too much’ data.

As it happens, the example used is also about house prices. You could also try limiting the AutoML set of models to the ones you want to use and see what H2O.ai comes up with.