I am training an LSTM network to classify time series signals, and I am tuning hyperparameters such as the number of LSTM layers, the number of input-layer neurons, the learning rate and so on. The problem is that the same hyperparameters give different training and validation accuracies from run to run: sometimes the accuracies stay constant for all 100 epochs, but on the next run I get very good training and testing accuracy. I would really appreciate it if someone could help me find where the problem is.
@Corey, would you happen to know the answer to this?
@nilooskh, I’m wondering if setting a random seed would help here? Also, if you could provide the data and the workflow that cause this issue, we could rule those out as causes of the problem. Thanks.
Setting a random seed would, of course, make the problem go away, but wouldn’t address the concern.
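To see how large the run-to-run variation actually is, one option is to train the same configuration under several seeds and compare the spread of the results, rather than fixing a single seed. The sketch below is only an illustration: `summarize_runs` and `toy_train_and_evaluate` are hypothetical names, and the toy function is a stand-in that imitates "sometimes stuck, sometimes good" runs. It would have to be replaced by real training code that returns the validation accuracy for a given seed.

```python
import random
import statistics
from typing import Callable, Dict, Iterable


def summarize_runs(train_and_evaluate: Callable[[int], float],
                   seeds: Iterable[int]) -> Dict[str, object]:
    """Run the same configuration once per seed and summarise the scores."""
    scores = [train_and_evaluate(seed) for seed in seeds]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


def toy_train_and_evaluate(seed: int) -> float:
    """Hypothetical stand-in for real training; returns a fake validation accuracy."""
    rng = random.Random(seed)
    # Imitate the reported behaviour: some runs stay stuck, others train well.
    if rng.random() < 0.4:
        return 0.72
    return 0.95 + 0.04 * rng.random()


summary = summarize_runs(toy_train_and_evaluate, seeds=range(10))
print(summary)
```

A large gap between `min` and `max` for identical hyperparameters suggests that comparing configurations on a single run each is not reliable.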
This problem is common when training “complicated” models on “small” datasets.
My two suggestions would be:
- Reduce the number of neurons in your model (including those inside the LSTM unit) - this is always a good option if model performance isn’t affected.
- Instead of training your model and evaluating it once for each set of hyperparameters, use cross-validation. Your data is a time series, though, so be careful how you cut it up.
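As a rough sketch of what "careful" cutting could look like, the function below builds forward-chaining splits in which every validation block comes strictly after its training block. It assumes the samples are stored in chronological order, which may not hold for this dataset; `forward_chaining_splits` is a made-up name for illustration. scikit-learn's `TimeSeriesSplit` offers a similar scheme.

```python
from typing import Iterator, List, Tuple


def forward_chaining_splits(n_samples: int,
                            n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) with validation always after training."""
    if n_splits < 1 or n_samples < n_splits + 1:
        raise ValueError("need at least n_splits + 1 samples")
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover samples.
        val_end = n_samples if k == n_splits else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, val_end))


splits = list(forward_chaining_splits(n_samples=800, n_splits=4))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

If the cycles come from several recordings or patients, splitting by recording instead of by position may be the safer choice, so that near-identical neighbouring cycles do not end up on both sides of a split.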
Another option may be to add dropout layers to your model - this is a regularization technique that can help reduce overfitting and help avoid those local minima your model seems to be falling into. I haven’t used them with LSTM layers much, though, so I don’t want to promise anything.
Hope that’s helpful. I’m also curious how much data you have and what batch size you’re using while training. Best of luck!!
@Corey @victor_palacios Thank you very much for your responses. I am trying to perform binary classification on cardiac signal cycles to check whether each cycle is normal. Each cycle has around 2810 samples, and I want to feed the LSTM network a number of single cycles at a time; the output for each cycle should be either 1 or 0.
First question: I am not sure whether my input shape is correct. If I feed the system 800 cycles, is the input shape (800, 2810, 1)?
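For reference, Keras recurrent layers expect input shaped (samples, timesteps, features), so 800 single-channel cycles of 2810 points each would be (800, 2810, 1). The sketch below builds such an array with NumPy from placeholder data. Because the cycles are described as having "around" 2810 samples, it also assumes they may differ slightly in length and zero-pads or truncates them; that padding choice and the helper name `to_fixed_length` are assumptions for illustration, not part of the original workflow.

```python
import numpy as np

N_TIMESTEPS = 2810  # target length of every cycle


def to_fixed_length(cycle, length: int = N_TIMESTEPS) -> np.ndarray:
    """Truncate a 1-D cycle to `length`, or zero-pad it at the end."""
    cycle = np.asarray(cycle, dtype=np.float32)[:length]
    return np.pad(cycle, (0, length - len(cycle)))


# Placeholder cycles with lengths between 2800 and 2820 (not real cardiac data).
raw_cycles = [np.ones(2800 + (i % 21), dtype=np.float32) for i in range(800)]

# Stack to (samples, timesteps), then add a trailing feature axis.
X = np.stack([to_fixed_length(c) for c in raw_cycles])[..., np.newaxis]
print(X.shape)
```

With this layout, the per-sample shape passed to the first layer is (2810, 1), and the label array should have one entry per cycle, that is, shape (800,).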
The LSTM structure is as follows:
model = Sequential()
model.add(Bidirectional(LSTM(32, input_shape=(2810, 1), return_sequences=True)))
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['acc'])
history = model.fit(X_train, Y_train, validation_split=0.1, epochs=40, shuffle=True)
The validation accuracy always starts at 0.72 and then jumps rather suddenly to almost 1 (without following the training accuracy). The validation loss is initially above 2 but then decreases, and the training accuracy soon exceeds 0.99 (after about the fourth epoch).
I would be very thankful for your help and insights!
To better understand this problem, could you attach the dataset and workflow you are running? If the data is company sensitive, can you recreate “fake” data (but similar in statistical properties) so we can see how it runs with your current workflow? It may simply be the case that there is a clear signal in the data identifying the 1s vs the 0s.
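One quick check that does not need the full workflow: a validation accuracy that starts at the same value on every run can simply be the share of the majority class in the validation slice. Keras takes the `validation_split` fraction from the end of the arrays, before any shuffling, so if the data is ordered by class or by recording, that slice can look quite different from the training part. The sketch below uses made-up labels, and `majority_baseline` and `tail_split` are hypothetical helper names.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def majority_baseline(labels: Sequence[int]) -> float:
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def tail_split(labels: Sequence[int],
               validation_split: float) -> Tuple[List[int], List[int]]:
    """Mimic Keras validation_split: the last fraction of samples is held out."""
    split_at = int(len(labels) * (1.0 - validation_split))
    return list(labels[:split_at]), list(labels[split_at:])


# Made-up labels sorted by class, purely for illustration.
labels = [0] * 600 + [1] * 200
train_labels, val_labels = tail_split(labels, validation_split=0.1)

print("train baseline:", majority_baseline(train_labels))
print("validation baseline:", majority_baseline(val_labels))
```

If the validation baseline on the real labels comes out close to 0.72, the flat runs are probably predicting a single class, and shuffling the data once before calling `fit` (or passing an explicit validation set) would be worth trying.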