Time series prediction with linear regression

Hi,
I’m trying to follow this workflow:

related to time-series prediction. Here is the CSV file that I’m using:

https://gofile.io/?c=9KxWCx

I got to the point where I have to modify the Linear Regression Learner, but I get this error:
ERROR Linear Regression Learner 0: 168: 167 Execute failed: 0 is smaller than, or equal to, the minimum (0)
and I don’t understand whether it’s a problem with data formatting.
Here is my workflow:
https://gofile.io/?c=Ha4zl8

Any idea? Thanks

Hi @accat and welcome to the forum -

I took a look at your workflow. The reason you’re getting an error with the Linear Regression Learner is that your input data consists of features that each hold a single constant value (everything is “IN” across the board). The learner can’t train a model with that information.
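To make the point about constant features concrete, here is a minimal Python sketch (outside KNIME) that flags columns holding only one distinct value. The column names `room_id`, `temp` and `out_in` and the sample rows are assumptions made for illustration, not taken from the actual CSV.

```python
def constant_columns(rows):
    """Return the names of columns that hold a single distinct value."""
    if not rows:
        return []
    return [name for name in rows[0] if len({row[name] for row in rows}) == 1]


# Hypothetical rows, as they might look after a filter keeps only "In" readings.
rows = [
    {"room_id": "Room Admin", "temp": 29, "out_in": "In"},
    {"room_id": "Room Admin", "temp": 31, "out_in": "In"},
    {"room_id": "Room Admin", "temp": 30, "out_in": "In"},
]

# Columns listed here carry no information a regression could learn from.
print(constant_columns(rows))
```

Any column reported by this check is useless as a predictor, because it cannot explain any variation in the target.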

What you need to do is back up and look closely at what’s happening in the Prepare Data metanode. For example, right now, you have a Row Filter node there that is only letting features with a value of “IN” get through. But you also should examine the RowID and concatenation operations to make sure they are relevant for your use case. Remember that the example workflow you are modifying was created for a completely different dataset, with its own peculiarities.

At a glance, I think that all you might need in the metanode, in your case, is just a String to Date&Time node, followed by a Sorter. When I do only that, I get a model with an R2 of about 0.24 - not great, but at least it’s predicting something. But you know your data better than I do.
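For readers who want to see what those two preparation steps amount to outside KNIME, here is a rough Python equivalent: parse the timestamp string, then sort by it. The column name `noted_date` and the format `%d-%m-%Y %H:%M` are assumptions for illustration; adjust them to whatever the CSV actually contains.

```python
from datetime import datetime


def prepare(rows, column="noted_date", fmt="%d-%m-%Y %H:%M"):
    """Parse the timestamp strings in `column`, then sort rows chronologically."""
    parsed = [{**row, column: datetime.strptime(row[column], fmt)} for row in rows]
    return sorted(parsed, key=lambda row: row[column])


# Hypothetical readings in arbitrary order.
rows = [
    {"noted_date": "08-12-2018 09:30", "temp": 29},
    {"noted_date": "07-12-2018 21:05", "temp": 31},
    {"noted_date": "08-12-2018 09:29", "temp": 30},
]

for row in prepare(rows):
    print(row["noted_date"].isoformat(), row["temp"])
```

Sorting matters here because lagged features only make sense when consecutive rows are consecutive in time.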

Hope that helps!

Good morning, and first of all thanks for the thorough answer.
I know that the dataset does not contain many peculiarities and that it is different from the one used in the example. I wanted to try to “predict” something using only the data with the “IN” value. A couple of questions:

  • Why should I also examine the RowID and concatenation operations? What data do they provide me with?
  • You said you used a String to Date&Time and a Sorter. So you no longer used this component?
  • In your opinion, does it make sense to make a prediction with this data? Would other data be needed?

Thank you

Hi @accat -

Let me address your questions in order. But first, you can’t predict the temperature using values of IN only - not without other features. The only other features you have in your input files are a numeric ID, and a room ID (which never changes).

  • I only mentioned RowID and concatenation because those are holdovers from the example workflow you started with. I think, at a glance, that you don’t actually need them.

  • I removed what I considered to be extraneous nodes from the Prepare Data metanode. String to Date&Time and Sorter were the only two I kept.

  • I obviously don’t have the domain knowledge you do, but it seems to me that more variables are needed to train a model that will produce reasonable results.

I’ve attached my edits to your workflow below. Hope it helps clarify.

02_Example_for_Predicting_Time_Series_SF_edit.knwf (2.4 MB)

Hi Scott,
thank you for your workflow.
I tried to run it, and I obtained the following plot:

Is it correct? Why does the prediction line become straight at some point?

Thank you for your answers

The model being produced is of little value - it’s certainly not correct. As I mentioned, the model is essentially trying to predict temps using only a single binary variable, with a simple autoregressive approach. Ideally, you would have more features to work with, and you would take measures to identify and deal with trends in your data with methods typical to time series analysis.
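One way to see why a single binary variable gives so little to work with: a least-squares fit on one 0/1 feature reduces to the mean of the target within each group, so the fitted values can only ever take two levels. The sketch below illustrates this with made-up numbers; it is a simplification (one unlagged flag, no KNIME involved) and not a reconstruction of the actual workflow.

```python
def fit_binary(flags, targets):
    """Least-squares fit of y on one 0/1 feature; this reduces to the group means."""
    groups = {0: [], 1: []}
    for flag, y in zip(flags, targets):
        groups[flag].append(y)
    return {flag: sum(ys) / len(ys) for flag, ys in groups.items() if ys}


# Hypothetical data: 1 = "In", 0 = "Out".
flags = [1, 1, 1, 0, 0, 1, 1, 1, 1]
temps = [30, 31, 29, 40, 42, 30, 30, 31, 29]

means = fit_binary(flags, temps)
predictions = [means[flag] for flag in flags]

# Wherever the flag stays the same for a stretch, the prediction is a flat line.
print(predictions)
```

Under these assumptions, any long run of identical flag values produces a constant prediction, regardless of how the temperature actually moves.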

That said, if you extend the plot beyond the first 500 observations, you will see that model predictions start to vary again. Here’s the first 6000:

The workflow I posted was mainly to show you how to get a working example, even if the results are quite terrible. But hopefully using your domain knowledge you will be able to find additional ways to improve it.

BTW, if you’re new to time series analysis, we’re teaching an online course on April 6:

Your course looks very interesting!

One more question. Your model uses all the values, both “Out” and “In”, even though they are temperature readings taken inside (In) and outside (Out) a room in an enterprise building. So you are training a model on values that cannot be compared. Is that right?
If I filter my rows, selecting only those with “In” (or “Out”) values, I get this error:

ERROR Linear Regression Learner 2:168:167 Execute failed: 0 is smaller than, or equal to, the minimum (0)

Why? Are there too few values?
Thanks a lot!

If you want to predict only the inside values, I guess you could take the lag of your temp (not the in/out value) and use that as a feature instead?

(The reason you are currently seeing the error is that you are predicting with a variable that has only one class - In. Even when you lag that variable, it never changes. So there is nothing the model can predict with.)
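As a rough illustration of the difference, the Python sketch below builds a one-step lag of the temperature and fits a straight line to it, then tries the same with a flag that is always “In”. The helper names `make_lagged` and `fit_line` and the sample readings are made up for this example, and the code is not how the KNIME learner is implemented; it only shows why a constant feature leaves nothing to fit.

```python
def make_lagged(values, lag=1):
    """Pair each value with the one `lag` steps earlier: (lagged_x, y)."""
    return [(values[i - lag], values[i]) for i in range(lag, len(values))]


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        # A feature with zero variance cannot explain any change in y.
        raise ValueError("feature is constant; nothing to fit")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical inside temperatures: the lagged temp varies, so a fit exists.
temps = [29, 30, 31, 31, 30, 29, 28, 29]
slope, intercept = fit_line(make_lagged(temps))
print(slope, intercept)

# The in/out flag filtered to "In" only (encoded as 1) never changes.
in_flags = [1] * len(temps)
try:
    fit_line(make_lagged(in_flags))
except ValueError as err:
    print(err)
```

A real model would typically use several lags rather than one, but the failure mode with a constant column is the same.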

OK, I followed your advice, and these are the results:

I’m using “temp”. I don’t understand one thing though. In both cases are we predicting the temperature? What changes in the two cases?

How would you rate these results? Is there any indicator of accuracy?

Thanks

“I’m using ‘temp’. I don’t understand one thing though. In both cases are we predicting the temperature? What changes in the two cases?”

Short answer: ultimately I think you need more features to be able to predict one of the temps. There just aren’t enough features to be able to produce a reliably accurate estimate. You can produce a model (as I did above to demonstrate how the nodes work) but unfortunately that doesn’t mean that it will be useful.

OK, thank you for the answer. But Scott, how did you get an R2 of about 0.24?

And in the first workflow that you posted, did you also predict the temperature, or did you predict other values? Because the lag in the first case used the in/out value…

In the original workflow, R2 was generated by the Numeric Scorer node inside the Linear Regression metanode. I set it up to predict temperature, yes. You can open up that metanode to see the nodes inside by double-clicking.

Anyway, using temp in the lag component, I get an R2 of 0.901. Does that seem good?

“I set it up to predict temperature, yes. You can open up that metanode to see the nodes inside by double-clicking.”
OK, but why do I see an out/in cluster in the Linear Regression Learner in your example?

Sorry for my questions

Thanks

Hi there @accat,

Sounds good, but it is only one measure of model quality (and it can be misleading!). Try searching online for how to check the quality of a linear regression model.
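As one possible sanity check, here is a small Python sketch that computes R2 alongside MAE and RMSE, and applies them to a naive “persistence” forecast that simply repeats the previous reading. The sample temperatures are made up, and the choice of a persistence baseline is my own suggestion rather than something taken from the workflow. The idea is that a slowly changing series can score a high R2 with this trivial forecast, so a lag-based model is only interesting if it clearly beats it.

```python
from math import sqrt


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mae(actual, predicted):
    """Mean absolute error, in the units of the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the target."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical slowly drifting temperature readings.
temps = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 29, 28, 27, 26]

actual = temps[1:]
naive = temps[:-1]  # persistence forecast: predict the previous reading

print("R2  ", r2_score(actual, naive))
print("MAE ", mae(actual, naive))
print("RMSE", rmse(actual, naive))
```

Comparing the model's scores on a held-out period against this baseline gives a fairer picture than R2 alone.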

What do you mean by out/in cluster in Linear Regression Learner?

Br,
Ivan

If you run this workflow, you will see that the clusters in the Linear Regression Learner are based on the out/in value and not on temperature. I don’t understand why…

Here you can find the file:

IOT-temp.zip (383.5 KB)

I think we are talking in circles a little bit. The initial workflow I uploaded had lagged values of out/in, not temperature - but this was before you explained what the in/out values actually meant.

But as I mentioned before, that workflow was really only intended to demonstrate how to make the nodes work - NOT generate a valid model! (In this case the model results are obviously quite terrible! :) ) So this is why your domain knowledge is necessary, as you know the data much better than I do.
