Autocorrelation and Seasonality in Time Series

Dear Knimers,
I have two time series (TS), which I have uploaded here as two CSV files, (a) and (b):
(a) MA-SumExps-jan2017-jan2020.csv
(b) MA-SumExps-mai2020-jun2023.csv
This is an Interrupted Time Series (ITS) problem, as the file names suggest.
I need to show that the difference between the two trends is statistically significant: the slope (angular coefficient) of the second trend is 4.75 times that of the first.
Before splitting the original TS into two parts, I got the following graph (for the whole period):

And after splitting it, I got these two graphs:

  1. for the period “pre-COVID”
  2. for the period “post-COVID onset”

I would like help with two tasks:

  1. to investigate, separately for each TS, whether there is autocorrelation; and
  2. to inspect (and, if possible, remove) the seasonality in both series, without using resources that require a Python or R installation (e.g., some KNIME “Components”).

Can someone help me accomplish both tasks?

Thanks for all the help received.

For the autocorrelation, you could use the Lag Column node to create lagged copies of the values, then compute the inner terms of the autocorrelation with a Math Formula node and the sums with a GroupBy node (sum aggregation), based on the formulas given here.
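As a cross-check for the KNIME workflow, here is a minimal Python sketch of the same calculation. It is not meant to run inside KNIME (the question rules that out), only to show what the nodes compute. It assumes the usual sample autocorrelation at lag k, r_k = Σ_{t=k+1..n} (x_t − x̄)(x_{t−k} − x̄) / Σ_{t=1..n} (x_t − x̄)², which may differ in detail from the linked formulas; the function names and the ±1.96/√n white-noise bound are illustrative assumptions, not part of the original answer.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation r_k of a series at the given lag k."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Denominator: sum of squared deviations over the whole series
    # (a Math Formula column followed by a GroupBy sum in KNIME terms).
    den = sum(d * d for d in dev)
    if den == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Numerator: each deviation times the deviation `lag` rows earlier
    # (Lag Column + Math Formula, then a GroupBy sum); the first `lag`
    # rows have no lagged partner and are skipped.
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return num / den


def acf(values, max_lag):
    """Autocorrelations for lags 1..max_lag."""
    return [autocorrelation(values, k) for k in range(1, max_lag + 1)]


def white_noise_bound(n):
    """Approximate 95% bound: |r_k| above this hints at autocorrelation."""
    return 1.96 / math.sqrt(n)
```

For example, `acf(series, 12)` gives the first 12 lags of one period's series, and each value can be compared against `white_noise_bound(len(series))`; run it separately on each CSV, as the question asks.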
For the seasonality, first remove the trend from your data. To do that, add a new column “x” that is simply the row index, train a linear regression on the data with “x” as the predictor, and apply a Regression Predictor to the same data; then use a Math Formula node to calculate the difference between each value and its predicted value. On this detrended series, you can simply calculate the difference between each value and the value 12 months earlier, again using the Lag Column node (with a lag interval of 12) and a Math Formula node. Note that the first 12 rows have no value 12 months earlier, so they will be missing after this step. Finally, add the trend back.
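The detrend, seasonal-difference, re-add-trend steps can be sketched in plain Python to show what that node chain computes. This is a minimal sketch, assuming monthly data with a period of 12 and an ordinary least-squares line on the row index as the trend; the function names are hypothetical.

```python
def linear_trend(values):
    """Fit y = a + b*x by least squares, with x = row index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Predicted trend value for every row (the Regression Predictor output).
    return [intercept + slope * x for x in range(n)]


def deseasonalize(values, period=12):
    """Detrend, subtract the residual `period` rows earlier, add the trend back.

    The first `period` rows have no lagged partner and are returned as None,
    like the missing values the Lag Column node produces.
    """
    trend = linear_trend(values)
    # Detrended series: value minus predicted trend.
    resid = [v - t for v, t in zip(values, trend)]
    out = []
    for i in range(len(values)):
        if i < period:
            out.append(None)
        else:
            # Seasonal difference of the residuals, then re-add the trend.
            out.append(resid[i] - resid[i - period] + trend[i])
    return out
```

With a series that is a straight line plus a fixed 12-month pattern, the output from row 12 onward is again a straight line, which is a quick way to check that the seasonal pattern was removed.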
Kind regards,
