RNN - LSTM for time series prediction

Dear Knimers,
I have preprocessed (in KNIME) a dataset (~1160 rows, restricted to my region and grouped by day) with the moving averages (MA) of three input variables describing the daily local Covid data: number of cases (NrCases), number of hospitalizations (NrHosps), and number of deaths (NrDeaths). I expect these to serve as the independent variables in my predictions. I also have another file with the sum of general hospital expenses (Sum(HospsExpenses)), not just those related to COVID-19, also at daily granularity. This would serve as my dependent variable, to be predicted as classes of values. Both files contain 28-day MAs, the window that gave the smallest range of daily variation.
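For readers who want to reproduce the smoothing outside KNIME, here is a minimal sketch of a trailing 28-day moving average. It assumes a trailing (not centered) window and leaves the first 27 days empty; the function name `trailing_moving_average` is only illustrative, and the actual KNIME node settings may differ.

```python
from collections import deque


def trailing_moving_average(values, window=28):
    """Return the trailing mean over `window` points; None until the window is full."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            # The oldest value is about to be evicted, so remove it from the sum.
            total -= buf[0]
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


if __name__ == "__main__":
    # Toy daily counts, smoothed with a 3-day window for readability.
    print(trailing_moving_average([1, 2, 3, 4, 5], window=3))
```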

First, I clustered my financial data with k-Means; the optimal solution had three clusters. I did this to turn my problem into a supervised prediction task, so I now have three classes as the expected output of my future predictions.
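The clustering step can be sketched as follows. This is a plain one-dimensional k-means (Lloyd's algorithm) with a deterministic, evenly spaced start, which is an assumption: the KNIME k-Means node may initialize its centroids differently. The function name `kmeans_1d` and the toy values are hypothetical.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on one-dimensional data; returns (centroids, labels)."""
    lo, hi = min(values), max(values)
    # Assumed deterministic start: centroids spread evenly over the data range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    # Toy daily expense values (arbitrary units), not the real data.
    expenses = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
    centroids, labels = kmeans_1d(expenses, k=3)
    print(centroids, labels)
```

The resulting labels (0, 1, 2, ordered from low to high expenses) would then play the role of the three target classes.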

Next, I joined both files into one KNIME table to investigate the financial weight of Covid in the regional hospital budget. These data are in the attached CSV file.
Covid data for TS Prediction with NN algorithms.csv (37.6 KB)

The only seasonality I identified (in the total Sum of Hosps Expenses over six consecutive years) was a yearly one, with decreases in January and in March. Much larger variations occurred during the Covid period, so I applied no seasonality correction. Attached below is a graph of these variations over time.

I also computed the rank correlation (Spearman's) and got the following associations:

image.png

This table suggests to me that:

a) NrHosps and NrDeaths are so closely correlated (rho = 0.981) that the former could be taken as a proxy variable for the latter;

b) the three input variables are only moderately associated with the output variable, so I do not expect their predictive power to be very good.
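For reference, Spearman's rho is the Pearson correlation computed on the ranks of the two variables. A minimal sketch, assuming ties receive their average rank (the function names are illustrative, and a constant column would cause a division by zero here):

```python
def average_ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation of the rank-transformed inputs."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy columns: a perfectly monotonic pair.
    print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))
```

A rho close to 1, as between NrHosps and NrDeaths, means the two series rise and fall together almost perfectly in rank order, which is why one could stand in for the other.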

I need (for academic purposes) to build a workflow including at least two predictive neural networks (NN) to compare these time series and to predict future expenses using past values of these three variables. But I’m a dentist, not a programmer (nor a mathematician), so I could not properly configure the parameters of the intended RNN - LSTM, simply because I don’t understand what those parameters are. I read the post from @Kathrin Melcher:

and I hoped to build a similar codeless (or low-code) solution, possibly using the Keras integration…
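One point that often causes confusion with LSTMs is the input shape: Keras LSTM layers expect three-dimensional input of the form (samples, timesteps, features). A minimal sketch of how the daily table could be cut into such windows, assuming a hypothetical look-back of 7 days and the expense class of the following day as target (both are illustrative choices, not recommendations):

```python
def make_windows(features, targets, timesteps=7):
    """Cut a daily table into overlapping windows for a sequence model.

    features: one row per day, e.g. [NrCases, NrHosps, NrDeaths]
    targets:  one class label per day (e.g. the expense cluster)
    Returns X with shape (samples, timesteps, n_features) and y with one label per sample.
    """
    X, y = [], []
    for end in range(timesteps, len(features)):
        X.append(features[end - timesteps:end])  # the past `timesteps` days
        y.append(targets[end])                   # class of the day right after the window
    return X, y


if __name__ == "__main__":
    # Toy data: 10 days, 3 features per day.
    features = [[i, i * 2, i * 3] for i in range(10)]
    targets = list(range(10))
    X, y = make_windows(features, targets, timesteps=3)
    print(len(X), len(X[0]), len(X[0][0]))  # samples, timesteps, features
```

Here `timesteps` corresponds to how many past days the network sees for each prediction, and the number of features is three (NrCases, NrHosps, NrDeaths). The number of LSTM units, the number of epochs and the batch size are separate settings that usually have to be tuned by trial.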

Could someone help me with this issue?

Wish you all the best.

Rogério.
