I’m quite new to KNIME and I’d like to use an LSTM NN for multivariate time series forecasting. However, I’m not sure how to convert my data for this.
Basically, I have an Excel table with variables as columns and the time steps as rows, and I would like to forecast the number of sales for the next few time steps.
Now, if I understood correctly the LSTM takes the input [time, input_dim], so would that time be the steps I’d like to forecast (e.g. 4 weeks, 1 month etc.) or the time steps I use for training? And input_dim would be the number of variables I have?
Secondly, I’m not sure how to normalize my data. I would like to use e.g. number of sales, temperature, day of the week etc., so I thought I needed to normalize that to values between 0 and 1.
However, if I just feed that normalized data into the Keras Network Learner node, I get an error because it is not in the integer data format. If I transform the data to integer, it’s not really useful.
I’m guessing I need to convert my data in a different way but I’m a bit lost as to how…
Any help would be greatly appreciated.
Thank you in advance.
The LSTM takes an input of the shape (time, input_dim). Here, time is the number of timesteps you want to give as a model input, while input_dim is the number of features per timestep (as you said correctly).
This means that every row in your training table has to contain a whole time series of the length time (time * input_dim columns) and the value or values you want to predict for this time series. Note that you can generate many of these small training time series examples from your whole time series.
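To illustrate how many small training examples can be cut from one long series, here is a minimal sketch in plain NumPy (outside KNIME). The function name `make_windows` and the parameters `n_steps`, `target_col` and `horizon` are made up for this example, and it assumes the table is already numeric with one row per time step.

```python
import numpy as np

def make_windows(series, n_steps, target_col=0, horizon=1):
    """Turn a (rows, input_dim) series into flat training rows.

    Each row of X holds n_steps * input_dim input values; y holds the
    value of target_col `horizon` steps after the window ends.
    """
    series = np.asarray(series, dtype=np.float32)
    n_rows, input_dim = series.shape
    n_samples = n_rows - n_steps - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window")
    # Slide a window of n_steps rows over the series and flatten each one.
    X = np.stack([series[i:i + n_steps].ravel() for i in range(n_samples)])
    # The target for window i is the row at index i + n_steps + horizon - 1.
    y = series[n_steps + horizon - 1:, target_col]
    return X, y

# Toy series: 10 time steps, 2 variables (hypothetical data).
data = np.arange(20, dtype=np.float32).reshape(10, 2)
X, y = make_windows(data, n_steps=5)
print(X.shape, y.shape)
```

In KNIME itself, the Column Lag node can produce the same kind of lagged columns; the sketch only shows what the resulting table looks like.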
Your way of normalizing the data sounds fine to me. Make sure to select the right converter in the ‘Keras Network Learner’ node: probably ‘From Number (double)’ or ‘From Collection of Number (double)’ (if you combine your data into a collection column), and not ‘From Collection of Number (integer) to One-Hot Tensor’. Also make sure that your input layer has ‘Float 32’ set as its data type.
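For reference, scaling each column to values between 0 and 1 while keeping it as floating point could look like the sketch below. It is plain NumPy rather than a KNIME node, the helper `min_max_scale` is a made-up name, and the sample values are invented.

```python
import numpy as np

def min_max_scale(values, lo=None, hi=None):
    """Scale each column to [0, 1] and return float32 values.

    lo and hi can be passed in to reuse the minimum and maximum
    computed on another table (for example the training data).
    """
    values = np.asarray(values, dtype=np.float64)
    lo = values.min(axis=0) if lo is None else np.asarray(lo, dtype=np.float64)
    hi = values.max(axis=0) if hi is None else np.asarray(hi, dtype=np.float64)
    # Constant columns would divide by zero, so use a span of 1 for them.
    span = np.where(hi > lo, hi - lo, 1.0)
    return ((values - lo) / span).astype(np.float32), lo, hi

# Invented rows: sales, temperature, day of the week.
table = np.array([[120.0, 15.5, 1.0],
                  [80.0, 21.0, 2.0],
                  [200.0, 18.0, 3.0]])
scaled, lo, hi = min_max_scale(table)
print(scaled.dtype, scaled.min(), scaled.max())
```

The point is that the normalized values stay floating point, which matches a ‘Float 32’ input layer; there is no need to convert them to integers.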
Thank you a lot for your help!
That helped clear things up for me.
One additional question, again, I’m not sure if I’m transforming the data correctly here…
I was able to get my network working for 1 time step, however when I tried to use it for more than one (5 in my case), I received an error message “For non-scalar data values, only single column selection is supported.”.
I used the Column Lag and Create Collection Column nodes to transform the data, so the input data is now “List (Collection of: Number (double))”.
I’m again not sure if this was correct, but I didn’t receive an error about the input shape (I sometimes received an error when the input elements weren’t matching the input neurons).
So I guess my input shape was not completely wrong? (5, 14) in this case, because I would like to use 5 time steps and 14 input dimensions.
Again, thank you in advance.
If you use the “From Collection” converter, the whole input data has to be in one collection. You could split the collection columns and use the “From Number (double)” converter with all columns selected.
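As a rough illustration of why the flat layout still fits an input shape of (5, 14): 5 timesteps with 14 values each give 70 number columns, which can be read back as a (5, 14) array. The sketch below is plain NumPy and assumes the columns are ordered timestep by timestep; whether the converter fills the tensor in exactly this order is an assumption worth checking on a small example.

```python
import numpy as np

n_steps, input_dim = 5, 14  # shape from the question above

# One training example as a (time, input_dim) window with dummy values.
window = np.arange(n_steps * input_dim, dtype=np.float32).reshape(n_steps, input_dim)

# Flattened into 70 plain number columns, timestep by timestep (assumed order).
flat = window.ravel()

# Reading the 70 values back in the same order restores the window.
restored = flat.reshape(n_steps, input_dim)
print(flat.shape, restored.shape)
```

If the lagged columns come out grouped by variable instead of by timestep, they would need to be reordered first so that each block of 14 consecutive columns belongs to one timestep.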
I have to admit that for two-dimensional inputs (which aren’t images) it’s kind of ugly right now. We are working on making that better.
Ah, I see, thank you very much :).
That’s okay, it’s great to be working with KNIME anyway.