What should I check to improve my regression model?

zizoo · April 13, 2020, 9:10pm

I built a regression model and the R squared is 0.48.
Below is a correlation matrix; the last row is the dependent variable.
What should I tune to improve my model?
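For reference, R² is normally the coefficient of determination, computed on the rows being scored (this assumes the tool uses the standard definition):

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

Here $\hat{y}_i$ is the prediction for row $i$ and $\bar{y}$ is the mean of Y. An R² of 0.48 therefore means the squared prediction error is 52% of what you would get by always predicting $\bar{y}$.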

AlexanderFillbrunn · April 14, 2020, 2:47pm

Hi,
what model are you building exactly? Is it a simple linear regression model? The features that are highly correlated with your dependent variable (blue boxes in the last column) are of course very useful. Have you tried training a model only on those?
Kind regards,
Alexander
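Alexander's suggestion can also be tried quickly outside KNIME. Below is a minimal Python sketch, assuming pandas and scikit-learn, with synthetic data standing in for the real table (the column names `x1`..`x5`, `Y`, the helper `select_by_target_correlation` and the 0.3 threshold are all hypothetical). It keeps only predictors whose absolute Pearson correlation with Y passes a threshold (the last column of the correlation matrix) and compares 5-fold cross-validated R² with and without that filter.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def select_by_target_correlation(df, target, threshold=0.3):
    """Hypothetical helper: predictors whose |Pearson r| with `target` is >= threshold."""
    corr = df.corr()[target].drop(target)  # correlation of every predictor with the target
    return corr[corr.abs() >= threshold].abs().sort_values(ascending=False).index.tolist()


# Synthetic stand-in data (hypothetical): Y depends on x1 and x2 only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(rng.normal(size=(n, 5)), columns=["x1", "x2", "x3", "x4", "x5"])
df["Y"] = 2.0 * df["x1"] - 1.5 * df["x2"] + rng.normal(scale=1.0, size=n)

selected = select_by_target_correlation(df, "Y", threshold=0.3)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
r2_all = cross_val_score(LinearRegression(), df.drop(columns="Y"), df["Y"], cv=cv, scoring="r2").mean()
r2_sel = cross_val_score(LinearRegression(), df[selected], df["Y"], cv=cv, scoring="r2").mean()
print(selected, round(r2_all, 3), round(r2_sel, 3))
```

Pearson correlation only captures linear relationships, so a weakly correlated feature can still matter for a random forest. Also, because the filter here looks at all rows, the cross-validated score of the filtered model is slightly optimistic; a strict estimate would redo the selection inside each fold.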

zizoo · April 14, 2020, 10:20pm

Hi @AlexanderFillbrunn,

I am trying to build a regression model from my dataset. I tried almost all the techniques available in KNIME, such as simple linear regression, random forests, logistic regression, XGBoost tree, polynomial regression…

When I apply cross-validation, linear regression gives the following output:
jjj.xlsx (11.3 KB)

It works with random forests and H2O Gradient Boosting. Initially, R² does not exceed 0.43.
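When comparing model types like these, it helps to score every model on the same cross-validation folds, so that differences in R² are not partly due to different splits. A minimal sketch, assuming scikit-learn and hypothetical synthetic data with a nonlinear effect; scikit-learn's `GradientBoostingRegressor` is substituted here for H2O Gradient Boosting.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical synthetic data with nonlinear effects, standing in for the real table.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(300, 4))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.3 * rng.normal(size=300)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    # scikit-learn's gradient boosting, used here in place of H2O Gradient Boosting
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # identical folds for every model
scores = {name: cross_val_score(m, X, y, cv=cv, scoring="r2") for name, m in models.items()}
for name, s in scores.items():
    print(f"{name:18s} mean R2 = {s.mean():.3f}  (std across folds {s.std():.3f})")
```

Looking at the spread across folds, not only the mean, shows how much a single R² figure can move with the split.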
Then I removed one of the independent variables, Xa, and applied the following transformation to the dependent variable Y:
$-\log\left(Y / X_a^2\right)$.
I also removed some samples with extreme Y values. Note that Xa is correlated with some of the other independent variables.
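One point worth checking: an R² computed on the transformed target z = -log(Y / Xa²) measures something different from an R² computed on Y, so 0.43 and 0.75 are not directly comparable. Since z = -log Y + 2 log|Xa|, predictors correlated with Xa can also explain part of z without telling you anything about Y. A minimal sketch, assuming scikit-learn and hypothetical synthetic data (the helpers `to_z`/`from_z` are hypothetical names), that fits a random forest on z and reports R² both on z and on Y after inverting the transform, Y = Xa²·exp(-z). Note the inverse needs Xa at prediction time.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold


def to_z(y, xa):
    """Forward transform from the post: z = -log(Y / Xa^2). Requires Y / Xa^2 > 0."""
    ratio = y / xa**2
    if np.any(ratio <= 0):
        raise ValueError("-log(Y / Xa^2) is undefined when Y / Xa^2 <= 0")
    return -np.log(ratio)


def from_z(z, xa):
    """Inverse transform: Y = Xa^2 * exp(-z)."""
    return xa**2 * np.exp(-z)


# Hypothetical synthetic data: Xa is positive and correlated with column 0 of X.
rng = np.random.default_rng(2)
n = 400
X = rng.normal(size=(n, 3))
xa = np.exp(0.5 * X[:, 0] + 0.2 * rng.normal(size=n))
y = xa**2 * np.exp(-(X[:, 1] + 0.5 * rng.normal(size=n)))  # strictly positive target

z = to_z(y, xa)
r2_z, r2_y = [], []
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], z[train])
    z_hat = model.predict(X[test])
    r2_z.append(r2_score(z[test], z_hat))                    # scored on the transformed scale
    r2_y.append(r2_score(y[test], from_z(z_hat, xa[test])))  # scored back on the original Y scale
print(f"R2 on z: {np.mean(r2_z):.3f}   R2 on Y: {np.mean(r2_y):.3f}")
```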
Miraculously, R² for random forests jumped to 0.75. If I apply PCA to the independent variables and use cross-validation, R² is almost 0.99, which is too good to be true.
This big jump from very bad to very good is quite worrying.
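One common source of an inflated cross-validated score is fitting a preprocessing step such as PCA on the whole table before cross-validation, so the held-out folds have already shaped the features. A minimal sketch, assuming scikit-learn and hypothetical synthetic data (`n_components=3` is arbitrary), that contrasts PCA fitted on all rows with PCA kept inside a `Pipeline`, which refits the scaler and PCA on the training part of each fold only. In KNIME terms this presumably means computing the PCA inside the X-Partitioner/X-Aggregator loop on the training partition and applying it to the test partition. Ruling this out does not by itself explain the jump to 0.99.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6))
y = X[:, 0] - 0.5 * X[:, 1] + 0.5 * rng.normal(size=300)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Leaky variant: scaler and PCA are fitted on all rows, including every future test fold.
X_leaky = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))
leaky = cross_val_score(RandomForestRegressor(random_state=0), X_leaky, y, cv=cv, scoring="r2")

# Fold-safe variant: cross_val_score refits the whole pipeline on each training split.
pipe = make_pipeline(StandardScaler(), PCA(n_components=3), RandomForestRegressor(random_state=0))
safe = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
print(f"PCA before CV: {leaky.mean():.3f}   PCA inside CV: {safe.mean():.3f}")
```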
Are there tests, such as a p-value (I am not sure how to extract it in KNIME), that would let me trust the model and its R²?
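For a p-value that works with any model, random forests included, one option is a permutation test: refit the model many times with Y randomly shuffled and see how often the shuffled data scores at least as well as the real data. scikit-learn reports this as (C + 1) / (n_permutations + 1), where C is the number of shuffles that match or beat the real score. A minimal sketch, assuming scikit-learn and hypothetical synthetic data. A small p-value only shows the model beats chance; it does not confirm a specific R² and does not by itself rule out leakage from preprocessing or from the target transform. (For plain linear regression, the classical per-coefficient t-test p-values are another option.)

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, permutation_test_score

# Hypothetical synthetic data with a real signal in column 0.
rng = np.random.default_rng(4)
X = rng.normal(size=(150, 4))
y = X[:, 0] + 0.5 * rng.normal(size=150)

score, perm_scores, p_value = permutation_test_score(
    RandomForestRegressor(n_estimators=30, random_state=0),  # small forest so the example runs quickly
    X,
    y,
    scoring="r2",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    n_permutations=50,
    random_state=0,
)
print(f"CV R2 = {score:.3f}, mean R2 with shuffled Y = {perm_scores.mean():.3f}, p = {p_value:.3f}")
```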

Thanks,

system · October 14, 2020, 10:20am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.