Performance comparison of LSTM and LMU


It is mentioned here that the units and order were chosen to keep the ‘number of internal variables’ comparable to the models in the paper. What exactly does this refer to? It is clearly not the total number of trainable parameters: the LMU has 101,771, compared to 41,810 for the LSTM with 100 RNN units described in the paper. Could you clarify which quantity was matched so that the two models are comparable?
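
For reference, this is roughly how the LSTM figure can be reproduced by hand. It is only a sketch under my own assumptions: a standard LSTM with four gates and biases, a one-dimensional input per time step, and a 10-way dense output layer on top. The function names are just illustrative helpers, not part of any library.

```python
def lstm_param_count(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(input_dim, outputs):
    # Weight matrix plus bias vector.
    return input_dim * outputs + outputs


units = 100       # from the paper's LSTM description
input_dim = 1     # assumption: one scalar input per time step
num_classes = 10  # assumption: 10-way classification head

lstm_params = lstm_param_count(input_dim, units)
total = lstm_params + dense_param_count(units, num_classes)
print(lstm_params, total)  # 40800 41810
```

Under these assumptions the total matches the 41,810 quoted above, so the gap to the LMU's 101,771 does not appear to be a counting mistake on the LSTM side.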
