I am trying to predict the value of a parameter that depends nonlinearly on 4 other parameters. I'm using polynomial regression for this, and cross-validation to evaluate the model. After several attempts I set the maximum polynomial degree to 7, because the error rate is relatively small in that case. So the question is: are there any suggestions for improving the model's performance?
You can add further (derived) columns to the model (such as exp, ln, cos, atan, sgn, ... or combinations of them). Unfortunately, you can only add them manually, with the Math Formula or Java Snippet nodes or with some scripting. (To automate this, you might try implementing something like "A hybrid approach to feature selection and generation using an evolutionary algorithm".)
Hope this helps, gabor
Thanks for the reply! You mean adding columns to my training data? For example, the cos or ln of the values in a separate column? Also, is there a way to see the overall error rate of the model? Right now I can only see the error rate for each fold.
Yes, I meant adding new columns with, for example, ln(1+$Col1$) and similar. If these are picked for inclusion in the polynomial regression, it means they help to better estimate the value you want. As you can compute these columns with 0 error, you can use the same error measures as before, just on more columns. (But if your data is random, does not follow a formula, or follows a kind of formula that is not included, this will not work well. I think there are articles about the limitations and best practices of this approach on the nutonian site, maybe even specific to your use case.)
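If you go the scripting route, a minimal sketch of the derived-column idea could look like the following. It is plain Python rather than a KNIME node; the function name add_derived_columns and the sample rows are made up for illustration, and the choice of ln(1+x), cos and atan is just one possible set of transforms.

```python
import math

def add_derived_columns(rows):
    """Return new rows with ln(1+x), cos(x) and atan(x) appended for every input value.

    Hypothetical helper for illustration; the set of transforms is an assumption.
    """
    expanded = []
    for row in rows:
        derived = []
        for x in row:
            # ln(1+x) is only defined for x > -1; use NaN as a missing-value marker otherwise
            derived.append(math.log1p(x) if x > -1 else float("nan"))
            derived.append(math.cos(x))
            derived.append(math.atan(x))
        # keep the original columns first, then the derived ones
        expanded.append(list(row) + derived)
    return expanded

# two made-up observations with 4 input parameters each
rows = [[0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5, 3.5]]
expanded = add_derived_columns(rows)
```

Each row grows from 4 to 16 columns here, so the regression then has many more candidate terms to choose from. That is also why it is worth keeping an eye on overfitting when doing this.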
Ye gads! Polynomial degree 7! Anything >3 almost always overfits the model. You need better descriptors/more observations.
Agree with aborg. If you want to try many different functions applied to your independent variables, then try Eureqa: creativemachines.cornell.edu/eureqa www.nutonian.com/eureqa/. It probably also has a KNIME node. A trial version is available.
InsilicoConsulting, I can't say the same, because in my case the validation error rate is not that high when the degree is 7.
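One way to settle the degree question on your own data is to compare the cross-validated error across degrees, pooling the held-out errors from all folds into one overall figure instead of reading each fold separately. Below is a rough sketch, assuming NumPy, a single input variable, and synthetic data. The helper cv_rmse and the sine-shaped data are invented for illustration, and this is not what the KNIME cross-validation nodes compute internally.

```python
import numpy as np

def cv_rmse(x, y, degree, n_folds=5):
    """Pooled RMSE over all held-out points for a 1-D polynomial fit of the given degree."""
    idx = np.arange(len(x))
    # x is already in random order here, so consecutive chunks serve as folds
    folds = np.array_split(idx, n_folds)
    squared_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        # least-squares polynomial fit on the training part only
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[test_idx])
        squared_errors.extend((pred - y[test_idx]) ** 2)
    # one overall number: pool the errors of every fold before averaging
    return float(np.sqrt(np.mean(squared_errors)))

# synthetic stand-in data (an assumption, not the poster's data)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 60)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.1, 60)

scores = {degree: cv_rmse(x, y, degree) for degree in range(1, 10)}
best_degree = min(scores, key=scores.get)
```

If the pooled error stops improving after some degree, or starts to climb, the extra terms are probably fitting noise. If it keeps dropping up to degree 7 on held-out data, then degree 7 may be justified for that data set.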