I’m creating some case studies for my master’s thesis to show whether KNIME can be used by controllers without much IT or statistics knowledge.
In this context, I built a workflow that uses the Capital Bikeshare data to train a regression tree learner. I want to predict the number of casual users from the weather and calendar features. The statistics of my results are bad. Now I’m wondering whether the predefined features are irrelevant (although a correlation analysis showed some correlation) or whether I made a mistake in the workflow. (My statistics course was seven semesters ago ;-)) I’d be very thankful for your help! Franzi
The results may get a bit better with a cross-validation loop instead of a single partitioning. If that doesn’t help, you could try a feature selection/elimination loop on one partition and score on the other partition.
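In case it helps to see what the cross-validation loop does conceptually, here is a minimal sketch in plain Python (outside KNIME). Everything in it is an assumption for illustration: the data is synthetic rather than the real Capital Bikeshare table, the names `temperature` and `casual` are hypothetical, and a one-feature least-squares line stands in for the regression tree. The point is only that every row is used for testing exactly once, so you get several scores instead of one that depends on a lucky or unlucky split.

```python
import random
import statistics


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = statistics.fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Least-squares fit for a single feature; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_val_r2(xs, ys, k=5, seed=0):
    """Shuffle the rows, split them into k folds, and score each fold once."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        predictions = [slope * xs[i] + intercept for i in test_idx]
        scores.append(r_squared([ys[i] for i in test_idx], predictions))
    return scores


# Synthetic stand-in data (NOT the real bike-sharing data).
rng = random.Random(42)
temperature = [rng.uniform(0, 35) for _ in range(200)]
casual = [20 + 3 * t + rng.gauss(0, 15) for t in temperature]

scores = cross_val_r2(temperature, casual, k=5)
print("R² per fold:", [round(s, 3) for s in scores])
print("mean R²:", round(statistics.fmean(scores), 3))
```

If the per-fold scores vary a lot, a single partitioning can look much better or worse than the model really is; if they are all similarly low, the split is probably not the problem.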
Thanks for your help. Neither approach improved the results.
Sounds like a really interesting topic for a master’s thesis.
I’m not really familiar with the Capital Bikeshare dataset, but could it be that your problem is a classification problem (two classes: casual rider and non-casual rider) rather than a regression problem?
If yes, you could use the Decision Tree Learner and Predictor nodes instead of the regression tree nodes.
Also, you could try using the whole dataset for both the learner and the predictor. The score you get that way is inflated by overfitting and not realistic, but you can treat it as a rough upper bound on what your chosen ML method can achieve on this dataset. If even that score is bad, the features or the model settings are the more likely culprit than the partitioning.
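To illustrate why the score on the training data is optimistic, here is a small sketch in plain Python. It rests on assumptions: the data is synthetic, the names `temperature` and `casual` are hypothetical, and a 1-nearest-neighbour predictor stands in for a fully grown regression tree because both can memorise the training rows.

```python
import random
import statistics


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = statistics.fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def predict_1nn(train_x, train_y, x):
    """Return the target of the closest training row (pure memorisation)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Synthetic stand-in data (NOT the real bike-sharing data).
rng = random.Random(1)
temperature = [rng.uniform(0, 35) for _ in range(300)]
casual = [20 + 3 * t + rng.gauss(0, 15) for t in temperature]

# Learner and predictor both see the whole dataset.
same_data_preds = [predict_1nn(temperature, casual, t) for t in temperature]
same_data_r2 = r_squared(casual, same_data_preds)

# Learner sees the first 200 rows, predictor is scored on the last 100.
split = 200
holdout_preds = [predict_1nn(temperature[:split], casual[:split], t)
                 for t in temperature[split:]]
holdout_r2 = r_squared(casual[split:], holdout_preds)

print("R² on the training data:", round(same_data_r2, 3))
print("R² on held-out data:", round(holdout_r2, 3))
```

The gap between the two numbers is the overfitting part; only the held-out score says something about new data.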