Hi, I’m running a random forest (regression) and seem to have found a good model. However, I feel the model could be improved if all the features were closer to a Gaussian distribution. About 14 of my 20 features are far from Gaussian. Is there a specific way you would recommend going about this? I have tried a few things, such as taking the log and ln of the features, but that did not solve the issue. I also tried running those 14 features through PCA, and that made my model perform worse.
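Thanks for your question. What is your assumption that Gaussian features would help based on? The distribution of your data is what it is, and a random forest does not assume any particular feature distribution: each tree only splits on thresholds of individual features, so a monotonic transform such as a log keeps the ordering of the values and leaves the possible splits unchanged. Additionally, this blog post about common pitfalls when using a random forest regressor might also be helpful in your case: https://medium.com/turo-engineering/how-not-to-use-random-forest-265a19a68576.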
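To see why the log transform made no difference, here is a minimal sketch in plain Python. It assumes the trees pick splits by thresholding one feature at a time to minimise squared error (the usual CART-style criterion; your tool may differ in details). The `best_split` helper and the synthetic data are hypothetical and only for illustration.

```python
import math
import random


def best_split(xs, ys):
    """Find the threshold split on one feature that minimises the summed
    squared error around the two child means.
    Returns (sse, set of row indices sent to the left child)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        # A threshold can only sit between two distinct feature values.
        if xs[order[k - 1]] == xs[order[k]]:
            continue
        left, right = order[:k], order[k:]
        sse = 0.0
        for part in (left, right):
            mean = sum(ys[i] for i in part) / len(part)
            sse += sum((ys[i] - mean) ** 2 for i in part)
        if best is None or sse < best[0]:
            best = (sse, frozenset(left))
    return best


random.seed(0)
# Assumed toy data: one strongly skewed (log-normal) feature and a noisy target.
x_raw = [random.lognormvariate(0.0, 1.0) for _ in range(200)]
y = [math.sqrt(x) + random.gauss(0.0, 0.1) for x in x_raw]
x_log = [math.log(x) for x in x_raw]  # much closer to Gaussian

sse_raw, left_raw = best_split(x_raw, y)
sse_log, left_log = best_split(x_log, y)

print("same rows go left:", left_raw == left_log)
print("same split quality:", math.isclose(sse_raw, sse_log))
```

Because the log keeps the rows in the same order, the best split sends the same rows left and right whether the feature is skewed or roughly Gaussian. PCA is a different case: it mixes features together instead of rescaling each one, so the trees see different split candidates, which may be why it changed (and here hurt) your results.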