Rolling Time Series Predictions - forecasting unique IDs using Gradient Boost

Hi there!

Thanks to @alexanderfillbrunn’s Rolling Time Series Predictions workflow, I was able to predict new values based on past values. I have also slightly adjusted the workflow to use Gradient Boosting (regression) for the prediction instead of polynomial regression.

However,

My current use case: predicting future Mbps values from past values for different cell IDs.

Problem: no matter how long I make the lag, I only get predictions for the last cell ID in my input Excel sheet.

Attached is a snapshot of the output I’m receiving, plus my workflow.

Hi @walhallak ,

Sorry, I was on vacation, otherwise I would have responded earlier. Could you share your workflow? I’d be happy to have a look at it.

Alexander

Hi @AlexanderFillbrunn, thanks for replying anyway. I hope you had a pleasant vacation!

Here is the workflow: https://hub.knime.com/s/3OoZRHC7TeCfUcP_

Hi,

I think I understand your problem now. As-is, the workflow does not take into account your cell IDs at all. It just takes the whole table and treats it as one big time series. What you actually want is to treat every set of rows with the same cell ID as one time series, and train a model to predict the next values for each cell ID individually. Is that right?

Alexander


Hi Alexander,

Yes, exactly. I have tried to group dates with the same IDs, but I wasn’t able to proceed beyond that.

Wessam

Hi,

Right now, your workflow does not take into account the cell ID. This is why you only get the last one. Please have a look at the attached workflow. I changed the window size to 5 instead of 50, because any cell ID with fewer values than the window size cannot be processed. Even with a value of 5, some cell IDs have too few values and I filtered them out at the beginning.
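In KNIME this size check is done with a row filter node, but the same logic can be sketched in a few lines of pandas for clarity. The column names `cell_ID` and `Mbps` are assumptions for illustration, not taken from the actual workflow:

```python
import pandas as pd

# Toy data: cell "A" has 8 rows, cell "B" only 3
df = pd.DataFrame({
    "cell_ID": ["A"] * 8 + ["B"] * 3,
    "Mbps":    [1.0] * 8 + [10.0, 11.0, 12.0],
})

WINDOW = 5
# Keep only cell IDs that have at least WINDOW rows, mirroring
# the filter at the start of the workflow; "B" is dropped here
counts = df.groupby("cell_ID")["cell_ID"].transform("size")
filtered = df[counts >= WINDOW]
```

Any cell ID shorter than the window would otherwise produce an empty training set inside the loop, which is why filtering up front is the cleaner option.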

I use a Group Loop to generate the lag per cell ID before training the model so there is no “bleed over” between cell IDs. Then I also have a Group Loop for the actual prediction so I can predict per cell ID.
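Since the workflow itself is graphical, here is a small pandas sketch of what the Group Loop plus lagging achieves: lag columns are computed per cell ID, so the first rows of each group get missing values instead of values leaked from the previous cell ID. Column names are again assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "cell_ID": ["A", "A", "A", "B", "B", "B"],
    "Mbps":    [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# groupby().shift() restarts at every group boundary, so the lag
# columns of cell "B" never contain values from cell "A"
for lag in (1, 2):
    df[f"Mbps_lag{lag}"] = df.groupby("cell_ID")["Mbps"].shift(lag)

# Rows with incomplete lag history are dropped before training
train = df.dropna()
```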

I hope this helps!

Alexander

WA_Rolling_time series.knwf (122.4 KB)


Many thanks Alexander, it worked! Sorry for the late reply; I’m still doing some troubleshooting on the workflow.

I’m missing the following: the several rows of each cell_ID in my case are hours, meaning every 24 rows make up one day.

I have data for 4 days, so I made the window size 96 (24 hours × 4 days) instead of 5, filtering out cell_IDs with fewer rows than the window size.

I predicted 5 days, but the predicted values are not convincing. Am I missing something here? Maybe I should sum the rows per day before processing them?

Here’s the workflow I’m working on; the data file should also be there. Thanks again!

latest_WA_Rolling_time series2.knwf (131.9 KB)

Hello Alexander,

I think I found the solution: we have to sum each set of 24 hourly rows into one daily value per cell_ID before sending the table to the training model, just like in the snapshot below.
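The daily aggregation described above corresponds to a GroupBy on cell ID and calendar date. As a pandas sketch (with assumed column names `cell_ID`, `timestamp`, and `Mbps`), it could look like this:

```python
import pandas as pd

# 48 hourly rows = 2 days for a single cell ID
idx = pd.date_range("2021-11-13", periods=48, freq="h")
df = pd.DataFrame({"cell_ID": "2001_1", "timestamp": idx, "Mbps": 1.0})

# Sum every set of 24 hourly rows into one daily value per cell ID,
# i.e. group on (cell_ID, date) and aggregate the target column
daily = (df.groupby(["cell_ID", df["timestamp"].dt.date])["Mbps"]
           .sum()
           .reset_index(name="Mbps_per_day"))
```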

However, the numbers don’t match. To clarify:

  • cell_ID 2001_1 has 24 rows per day → using a GroupBy node to sum up the targeted values gives me 126,738 for 2021-11-13 (snapshot below)
  • this number is different from the sum in the original input file.

2001_1: after the GroupBy node the sum is 126,738 (KNIME reads the comma as a thousands separator, so this means 126.738k)

2001_1: in the original input file the sum is 27.760

Hello,

I found the problem!!

KNIME is treating commas as thousand separators; that’s why the outcome of the GroupBy node is 126,728 (i.e., read as thousands).
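For context: in the German number format the comma is the decimal separator and the period is the thousands separator, so a reader configured for the English convention inflates every value. In KNIME this is usually fixed in the reader node itself (the CSV Reader’s advanced settings should let you set the decimal and thousands separators). A minimal pandas sketch of the same locale problem, with an assumed `Mbps` column:

```python
import pandas as pd
from io import StringIO

# German-style numbers: comma = decimal point, period = thousands
raw = "cell_ID;Mbps\n2001_1;27,760\n2001_1;1.126,5\n"

# Default parsing cannot interpret these strings as numbers at all
wrong = pd.read_csv(StringIO(raw), sep=";")

# Telling the reader about the locale yields the intended values
right = pd.read_csv(StringIO(raw), sep=";", decimal=",", thousands=".")
```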

If you know a way to fix this issue, that would be great! Otherwise I could write a separate post about it in the forum.

Thanks again!

Wessam

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.