ARIMA/SARIMA possible?

JakobJosef · August 7, 2024, 4:46am

Dear community,
I am struggling with time series forecasting, and since I have not been able to build a proper model so far, I thought I could ask you for an initial assessment of whether it is possible at all.

This is what my time series looks like. I would like to create a forecast for the next 12 months. Each bar represents one month.

Of course I tried to cut the numeric outliers:

For p, d, and q I tried using a Parameter Optimization Loop (range: 0 to 5 for each) to find the best combination of the three.

This is what the heart of my model looks like:

What is your opinion: is (S)ARIMA even possible with my dataset, or am I going to fail anyway? Is it not “stationary” enough?

Best
Jakob
CEDSA_V3.knwf (39.9 KB)

JakobJosef · August 7, 2024, 4:48am

Addition: Technically everything is working. The problem is the R^2: it’s always just around 0.

Corey · August 7, 2024, 3:52pm

Hey @JakobJosef, taking a look at your workflow now.

Is it possible to share the data? Typically when getting ready to train an (S)ARIMA model I explore the ACF and PACF plots to decide on parameters to try.
An R2 of 0 probably means you’re getting a constant value prediction. In some cases this could still be the best forecast, if all those deviations from the average monthly value are random. However, I don’t usually use R2 when evaluating forecasts; it’s really a goodness-of-fit metric. Other things in the scorer like Mean Absolute Error or Mean Percentage Error are often better here, as they directly show how close the forecast and true values actually are to each other.
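To illustrate the point about R2 and constant predictions, here is a minimal Python sketch. The numbers are made up for illustration (they are not the data from this thread), and `r_squared` is a hypothetical helper, not a KNIME function. It shows that a forecast that always returns the mean of the evaluated values scores an R2 of exactly 0.

```python
from statistics import mean

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up monthly values, for illustration only.
actual = [100, 120, 90, 110, 80, 100]

# A "constant value prediction": every month gets the overall mean.
constant_forecast = [mean(actual)] * len(actual)

r2 = r_squared(actual, constant_forecast)
print(r2)
```

On a hold-out period the constant would be the mean of the training data rather than of the evaluated values, so in practice R2 tends to land near 0 rather than exactly on it.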

If you can’t share the data I can give some generic advice on interpreting the ACF and PACF plots as well.

JakobJosef · August 8, 2024, 7:42am

ARIMA - Knime.xlsx (11.6 KB)

Hey @Corey ,
wow, I am so excited and honored that you are answering my post. I feel like I have watched all your (S)ARIMA/time series analysis videos on YouTube at least 3 times. Thank you so much for offering your help.

Of course I can share the data; it is attached. Good to know that R2 is maybe not the right evaluation metric in this case. The values I got in the workflow above were not that bad, only R2. So I’ll try to evaluate them with MAE and MAPE.

Also, I was concerned about the stationarity of my dataset. There are some numeric outliers, that’s why I tried to cut them off. And I asked ChatGPT to run the Dickey-Fuller test for stationarity. ChatGPT told me that this time series is stationary:

Looking forward to your answer. Thank you so much for sharing your knowledge here in the forum as well as on YouTube etc.

Best
Jakob

Corey · August 8, 2024, 5:25pm

I took a look at the data set and several visualizations. I put the workflow here along with some notes on the visual analysis and a few types of forecasts.

A few summary points for anyone else looking through this thread who doesn’t download the workflow:

The ACF Plot

Above is the ACF (Autocorrelation Function) plot. It shows how the time series correlates with previous values of itself. We don’t see any spikes outside of the shaded region at all, so I immediately suspect there’s not a ton to pick up with a SARIMA model.
I also check some conditional box plots but I’ll leave that in the workflow if you want to see those.
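For readers who want to reproduce the ACF check outside of KNIME, here is a rough Python sketch. It assumes the shaded region is approximated by the common ±1.96/√n white-noise band (the plot in the workflow may compute its band differently), and it uses a synthetic seasonal series rather than the data from this thread. `acf` and `significant_lags` are hypothetical helper names.

```python
import math
from statistics import mean

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    avg = mean(series)
    centered = [x - avg for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - lag] for t in range(lag, n)) / denom
        for lag in range(max_lag + 1)
    ]

def significant_lags(series, max_lag):
    """Lags whose autocorrelation falls outside the approximate 95% band."""
    band = 1.96 / math.sqrt(len(series))  # assumed white-noise band
    values = acf(series, max_lag)
    return [lag for lag in range(1, max_lag + 1) if abs(values[lag]) > band]

# Synthetic example: a strict 12-month pattern repeated for 4 years.
seasonal = [10, 12, 15, 20, 26, 30, 32, 29, 23, 17, 13, 11] * 4
print(significant_lags(seasonal, 12))
```

A strongly seasonal series like this one shows a spike at lag 12. If `significant_lags` comes back empty for your own series, that matches the "no spikes outside the shaded region" situation described above.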

Comparing Forecasts

Above is a line plot comparing true future values to 3 types of forecasts: a mean value forecast (the flat line), the monthly mean value forecast, and the SARIMA forecast.

I show the MAE, MSD, and MAPE error metrics below that, as I think they’re usually the most relevant. MAE is great because it shows you in real terms how far off the forecast is; its units are the same as the time series (in this case $). MAPE is the same but in percentage terms, which can make it easier to judge how far off the forecast is if you’re not familiar with the usual values of the series. Then there is MSD, which is a signal of bias. All 3 models have a positive MSD, which means the forecasts are mostly over-predicting. This could be a sign of an underperforming past 12 months or some issue with training.
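As a rough sketch of what these three metrics compute, here they are in plain Python. The sign convention for MSD (forecast minus actual, so positive means over-prediction) is assumed from the description above, and the numbers are made up for illustration.

```python
from statistics import mean

def mae(actual, forecast):
    """Mean absolute error, in the units of the series."""
    return mean(abs(f - a) for a, f in zip(actual, forecast))

def msd(actual, forecast):
    """Mean signed difference. Assumed convention: forecast - actual,
    so a positive value means over-prediction on average."""
    return mean(f - a for a, f in zip(actual, forecast))

def mape(actual, forecast):
    """Mean absolute percentage error; undefined if any actual value is 0."""
    return 100 * mean(abs(f - a) / abs(a) for a, f in zip(actual, forecast))

# Made-up numbers for illustration only.
actual = [100, 200, 400]
forecast = [110, 220, 380]
print(mae(actual, forecast), msd(actual, forecast), mape(actual, forecast))
```

Note how the two over-predictions and one under-prediction partly cancel in MSD but not in MAE, which is why MSD reads as a bias signal rather than an accuracy measure.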

After looking at the views in the beginning, I suspected a monthly mean value forecast might be the best performing here, but it looks like a flat mean value forecast is winning with 16% error.

Next steps here, if you wanted to try to improve a forecast on a problem like this, would be to investigate any other time series that might correlate with yours. For example, market share cap in your industry: growth or decline there could correlate to sales.

JakobJosef · August 13, 2024, 12:37pm

Hi Corey,
thank you so much for your effort. It helps me a lot to see how you would solve the problem. Especially the parts in “Views and Notes” and the comparison between the 3 types of forecast.

Thanks to the Parameter Optimization Loop I was able to reduce the MAE to 470, but SARIMA still has only a small advantage over the mean forecast.

Thanks again - I’m looking forward to the next content of yours.

Best
Jakob

I’ll mark your post as “solved” in 2 weeks, so there is more time in case somebody wants to add something to this post.

system · August 27, 2024, 6:47am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.