I am trying to fit a regression model in KNIME and have several questions for you; I am quite confused.
My model has the following independent variables:
- Vehicle Type: categorical (5 categories)
- Company Type: categorical (6 categories)
The dependent variable is:
- Unloading time per volume unit (numerical, exponentially distributed)
I could use a random forest etc., but my priority is obtaining a polynomial regression with explicit coefficients. Here are my questions:
1) I used the X-Partitioner node for partitioning, and at the end I want to see the model coefficients for the aggregated data. But if I right-click on the Polynomial Regression Learner node, it seems to show only the coefficients of the last partition, not an aggregated model. How can I get the coefficients of the aggregated model; is that possible?
2) My dependent variable follows an exponential distribution. Should I transform it with a LOG transformation? The Math Formula nodes in the image are there for that task.
3) I also have problems combining parameter optimization with the X-Partitioner. I want to optimize the parameters of a random forest, but the X-Partitioner's own loop seems to interfere with the optimization loop. Depending on how I connect the variable ports, it says either "Wrong Loop Start Node Connected" or "Can't merge flow variable stacks (likely a loop problem)".
4) Moreover, even if question 3 is solved, there is a bigger issue: the parameter optimization loop must advance to the next parameter set only after the X-Partitioner has completed all 10 partitions.
In other words, the parameter optimization node should only be triggered after the 10th partition finishes.
As I understand it, the X-Aggregator sends the flow back to the beginning on each iteration and does not let it continue until the last partition is done. But the optimization loop start node is a problem: where and how should I place it?
Thanks for your help, and have a nice day!
Thank you Nemad for your detailed answer; it is a good step forward for me. I am still learning, trying everything possible in KNIME, and I get stuck a lot.
Your answer perfectly solved the 3rd and 4th questions. I learned something new and am very happy right now.
Your answer to the 2nd question is also clear.
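To double-check my understanding of that log transform, here is a small Python sketch outside KNIME (purely synthetic data, nothing from my actual workflow):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unloading times drawn from an exponential distribution
times = rng.exponential(scale=2.0, size=1000)

# Log transform (exponential draws are strictly positive, so no offset is needed)
log_times = np.log(times)

def skewness(x):
    x = np.asarray(x)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

# The raw data are strongly right-skewed; the log transform
# reduces that skewness substantially
print(f"skewness raw: {skewness(times):.2f}")
print(f"skewness log: {skewness(log_times):.2f}")
```

In KNIME, the same transform would just be a Math Formula node applying `log()` to the target column before the learner.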
I have a follow-up to the 4th question; let's call it 4-b.
4-b) The final result shows me the parameters of the last grid iteration, but I want to see the results for the "optimum parameter set". What should I do?
Should I somehow get the "Best Parameters" from the optimization loop end and then somehow feed them into another random forest node as input? How? (1st approach)
Or is there a way to record all iterations and then filter out the best one among them? (2nd approach)
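To illustrate what I mean by the 2nd approach, roughly in Python (a toy stand-in; the `evaluate` function here is made up and just mimics some cross-validated error score):

```python
import itertools

# Hypothetical stand-in for a real cross-validated error score
def evaluate(n_trees, max_depth):
    return abs(n_trees - 100) / 100 + abs(max_depth - 5) / 10

# Record every parameter combination and its score...
results = []
for n_trees, max_depth in itertools.product([50, 100, 200], [3, 5, 7]):
    results.append({
        "n_trees": n_trees,
        "max_depth": max_depth,
        "error": evaluate(n_trees, max_depth),
    })

# ...then filter the best iteration (lowest error) from the full record
best = min(results, key=lambda r: r["error"])
print(best)  # → {'n_trees': 100, 'max_depth': 5, 'error': 0.0}
```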
Also, regarding the 1st question:
1) I am actually using k-fold cross-validation as a basic precaution against overfitting. I just want to use k-fold with a polynomial regression, not a bagging model, but I can't get the final coefficients for the full data. Right-clicking on the learner node only shows the coefficients of the kth iteration.
It is a similar problem to 4-b. Isn't there a way to record the coefficients of each of the k iterations and then take the average of each coefficient? (It sounds statistically silly; is it even possible, by the way?)
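In Python, the per-fold averaging I have in mind would look roughly like this (synthetic data and plain least squares instead of my actual polynomial model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 2 + 3*x + noise
n = 200
x = rng.uniform(0, 10, n)
y = 2 + 3 * x + rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])  # design matrix with intercept column

# Split the row indices into k folds, fit on each training portion,
# and record the coefficients of every fold
k = 10
folds = np.array_split(rng.permutation(n), k)
coefs = []
for fold in folds:
    train = np.setdiff1d(np.arange(n), fold)
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    coefs.append(beta)

# Average each coefficient across the k folds
avg_coef = np.mean(coefs, axis=0)
print(avg_coef)  # close to the true values [2, 3]
```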
I am glad my answer helped you out =)
Regarding your additional question, I am not entirely sure what you mean. The first output of the Parameter Optimization Loop End provides the best parameter setting, i.e., the parameter setting that achieved the best performance. The second output contains all parameter settings and their achieved values.
The screenshot below shows how you can automatically retrain the model with the best parameter configuration.
If you want, you can build a loop that does what you are describing, but I believe there is a slight misunderstanding regarding the purpose of cross-validation. The idea of cross-validation is to give a robust estimate of model performance for a specific set of hyperparameters (e.g. the number of trees in a random forest).
Once you found the best hyperparameters, it is common to retrain the model on the full dataset.
This is because the models inside the cross-validation are trained on only a subset of the data; since more data usually improves model quality, it is sensible to retrain the model on the full dataset once you have found a good hyperparameter configuration.
More or less the same thing applies for parameter optimization, especially if you combine it with cross-validation.
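In plain Python, the whole pattern could be sketched like this (a toy scikit-learn example, not your KNIME workflow):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for the real table
X, y = make_regression(n_samples=300, n_features=5, noise=1.0, random_state=0)

# Cross-validation gives a robust performance estimate
# for each hyperparameter setting
best_score, best_n = None, None
for n_trees in [10, 50, 100]:
    scores = cross_val_score(
        RandomForestRegressor(n_estimators=n_trees, random_state=0), X, y, cv=5
    )
    if best_score is None or scores.mean() > best_score:
        best_score, best_n = scores.mean(), n_trees

# Once the best hyperparameter is found, retrain on the full dataset
final_model = RandomForestRegressor(n_estimators=best_n, random_state=0).fit(X, y)
```

The retraining step at the end is exactly what the screenshot above does with the Parameter Optimization Loop End's first output feeding a fresh learner node.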