Large MAPE gap between LSTM optimizers

I used MAPE to compare the Adam, Adagrad, and SGD optimizers on a stacked LSTM with network size 256. There is a large gap between the results:

MAPE Adam: 4.43
MAPE Adagrad: 86.30
MAPE SGD: 70.07

Is it normal to have such a large difference?

Hi @Narsis,

A difference this large is not common but can happen depending on the structure of your data and what predictions you’re getting. Maybe there’s also something a bit off in your workflow…
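To make the point about predictions concrete, here is a minimal sketch of how MAPE reacts to them. The `mape` helper and all numbers are hypothetical and made up for illustration; they are not taken from your workflow. The sketch only shows that predictions tracking the targets give a small MAPE, while predictions stuck far below the targets (as might happen if a network has barely trained) push MAPE toward 100. That is one possible explanation, not a diagnosis.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical target values (illustration only).
actual = [100.0, 120.0, 80.0, 110.0]

# Predictions that follow the targets closely.
close = [96.0, 126.0, 84.0, 104.5]

# Predictions stuck at a small, nearly constant value.
stuck = [20.0, 20.0, 20.0, 20.0]

print(f"MAPE, close predictions: {mape(actual, close):.2f}")  # about 4.75
print(f"MAPE, stuck predictions: {mape(actual, stuck):.2f}")  # about 80.04
```

Plotting the predicted values against the actual values for each optimizer would show whether something like this is happening in your case.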

If you’re still experiencing this gap, could you please share your workflow with us, along with enough data, so that we can further examine it?