I’m predicting house prices (Ames, Iowa) with three models: linear regression, a simple regression tree, and a random forest. I don’t want to change this set of models; that is my first constraint.
My second constraint is that I may use at most 3 features to predict SalePrice.
The problem is that I can’t get the RMSE below $30,000, which is substantial for house prices (it corresponds to a mean absolute percentage error of 13.7%). That best result comes from the random forest.
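To be explicit about how I understand the two error metrics, here is a minimal sketch. The prices are made up for illustration and are not my Ames results; the helper names `rmse` and `mape` are just my own.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (dollars here)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a percentage of the true value."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up sale prices and predictions, for illustration only
actual = [200_000, 150_000, 300_000, 120_000]
predicted = [210_000, 140_000, 270_000, 130_000]

print(f"RMSE: {rmse(actual, predicted):.0f}")   # about 17321
print(f"MAPE: {mape(actual, predicted):.2f}%")  # 7.50%
```

RMSE squares the errors, so a few badly mispredicted expensive houses can dominate it, while MAPE weights every house by its relative error.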
What are good ways to improve this model? The random forest is also somewhat overfitted, and I haven’t been able to fix that either.
What are some ‘quick wins’ to optimize random forest predictions?
What I’ve already tried: normalizing my data (z-score or min-max), cross-validation (which somehow yielded no improvement), and tuning loops over tree depth and number of trees.
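Roughly, my tuning loop looks like the sketch below. It assumes scikit-learn; the synthetic data stands in for my three features and SalePrice, and the grid values are placeholders rather than the ones I actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for three numeric features and a price target (NOT the Ames data)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 100_000 + 150_000 * X[:, 0] + 50_000 * X[:, 1] ** 2 + rng.normal(0, 10_000, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Placeholder grid over the two hyperparameters mentioned above
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 6, None],
}

# 5-fold cross-validated grid search, scored by (negated) RMSE
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)

# Compare training, cross-validated, and held-out RMSE to gauge overfitting
cv_rmse = -search.best_score_
train_rmse = np.sqrt(np.mean((search.predict(X_train) - y_train) ** 2))
test_rmse = np.sqrt(np.mean((search.predict(X_test) - y_test) ** 2))

print("best params:", search.best_params_)
print(f"train RMSE: {train_rmse:.0f}  CV RMSE: {cv_rmse:.0f}  test RMSE: {test_rmse:.0f}")
```

A large gap between the training RMSE and the cross-validated or test RMSE is what I mean by the forest being overfitted.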
Thank you in advance for your help; I’m quite new to this platform.