I have a very basic newbie question. I have created my very first time series prediction model using linear regression on past data. Its accuracy looks quite OK when compared against the past data. BUT the question is: how can I use it to actually predict something? Could you please give me an example of how I can use the Regression Prediction after it has been trained? I’m currently playing with the example from the KNIME library: knime://EXAMPLES/04_Analytics/07_Time_Series/02_Example_for_Predicting_Time_Series.
Could you please point me to such a workflow?
Thanks in advance,
I’ve spent a few hours digging into this topic further, and I think I can now ask a better question. My question is about the deployment phase. I have trained my model, saved it, and I can load it back. My next step is deployment. Is there a simple example of how to deploy a trained linear regression predictor to actually predict some time series data? Could you please provide such an example?
Hi @AndrzejL -
If I’m understanding you correctly, you pretty much just need to use the Model Reader node in conjunction with the Regression Predictor node.
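Outside KNIME, the same train / save / load / predict pattern can be sketched in plain Python. This is only an illustration of the idea, not what the KNIME nodes do internally: the helper names `fit_line` and `predict`, the file name `linreg_model.json`, and the toy numbers are all assumptions made up for this sketch.

```python
import json
import os
import tempfile


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, xs):
    """Apply a fitted model to new inputs; no target values are needed."""
    return [model["slope"] * x + model["intercept"] for x in xs]


# Training phase: fit on past data and save the model (toy data, made up).
train_x = [1, 2, 3, 4, 5]
train_y = [3.1, 4.9, 7.2, 8.8, 11.0]
model_path = os.path.join(tempfile.gettempdir(), "linreg_model.json")
with open(model_path, "w") as f:
    json.dump(fit_line(train_x, train_y), f)

# Deployment phase: load the saved model and predict for unseen inputs.
with open(model_path) as f:
    loaded = json.load(f)
new_x = [6, 7, 8]
forecast = predict(loaded, new_x)
print(forecast)
```

The point of the sketch is that the deployment half only reads the saved model and the new input values; it never retrains.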
You can find an example in the anomaly detection context on our examples server - see the path below. This is probably more complicated than what you need, but hopefully gives you an idea.
If you are serious about deployment and provided you cannot rely on an IT department for that purpose:
- DIY: within KNIME, you should have a look at the Model Factory example and whitepaper. This allows for production-grade model learning and, with a bit of tweaking, also deployment;
- PMML: you can export your model to PMML and perform the prediction in any other PMML compatible environment.
The Model Factory may look like overkill at first. However, once you’ve got it up and running, it is a really good deployment solution.
I’m looking for a simple “Hello world” with linear regression. I trained my model to predict the number of the day of the week, and the trained model seems to do the job well. But I have no clue how to ask it what the day numbers will be next week: 1 for the 19th of November because it’s a Monday, 2 for the 20th of November because it’s a Tuesday, and so on for the 7 days of next week, up to the 25th of November, a Sunday, when it should tell me 7.
I don’t know:
- how to build such a workflow;
- what kind of data I should feed into it to get that answer.
I would appreciate any help here. Or is my understanding of how linear regression prediction works totally wrong?
That is what my trained model says:
And this is what the simplified training workflow looks like. It is a simplification of the linear regression workflow delivered with KNIME. Very simple.
Ha! I just made another great step for Andrzej that is a little step for the KNIME community. I fed some bogus data for next week into my workflow, and the model seems to correctly point out which day-of-the-week numbers it expects! Cooool!!! Of course, I do not train it anymore. It uses the model that was saved when I trained it with correct input.
Here is what my input CSV file with incorrect day numbers looks like:
Date,Day of the week
As you can see, the days from the 19th to the 25th of November contain incorrect data. The model wasn’t fooled by that, which is GREAT! But here is my next question: do I need to feed my workflow bogus data in order to find out what day of the week it predicts? Or is there a way to ask it for a prediction without providing the bogus input?
BTW, here is what my model says after I fed it some random numbers for the days of the week.
Actually, at first glance, your problem looks like a classification problem rather than a linear regression problem. The latter is typically used for continuous outcome variables. That is not to say that a regression method could not also solve your problem: some methods, such as logit regression, can be adapted to both tasks.
To take this one step further: if the input variable (predictor) is a date, the problem you’ve defined above is not even a classification “problem”; it can be solved by a simple lookup table or a function (e.g. Excel’s WEEKDAY function). In other words, you should be able to see 100% accuracy for this task, provided the input is a date. For other types of input, I would look into classification methods first.
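As a minimal sketch of the “function” route, here is the same lookup in Python using the standard library’s `date.isoweekday()`, which returns 1 for Monday through 7 for Sunday. The year 2018 is an assumption on my side (the thread only gives day and month), chosen because the 19th of November falls on a Monday in that year; `day_number` is just an illustrative helper name.

```python
from datetime import date, timedelta


def day_number(d):
    """Return 1 for Monday through 7 for Sunday (ISO weekday numbering)."""
    return d.isoweekday()


# Assumed year: 2018, in which the 19th of November is a Monday.
start = date(2018, 11, 19)
week = [start + timedelta(days=i) for i in range(7)]

for d in week:
    print(d.isoformat(), day_number(d))
```

No training data and no model are involved here; the date alone determines the answer.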
Also have a look into the first edition of this book for further details: https://www.manning.com/books/practical-data-science-with-r-second-edition