Batch size effect on LMU & GPU question

As to the batch size:
I am interested in getting feedback on how batch size affects the LMU. Specifically, a large batch size can have downsides (e.g., see “Effect of batch size on training dynamics” by Kevin Shen, Mini Distill, on Medium), but I have a lot of data to process and I’m trying to weigh the pros and cons.

As to the GPU:
When processing time series data, is there an effective way to utilize a GPU with the parallel LMU? (I have a P100 GPU.) So far, CPU-only seems to be faster. Am I missing something?

Thanks very much!

Hi @fladventurerob, and welcome to the Nengo forums! :smiley:

To answer your questions:

In terms of training behaviour (and constraints), the LMU network can be treated as a standard RNN, so you may observe the same batch-size-related generalization effects you see in other RNN / ANN architectures. Unfortunately, as with other RNN / ANN architectures, finding the hyperparameter values that work best for your network is something of a black art (and a lot of trial and error), and depends on many variables: your specific network architecture, the data you are using (e.g., whether or not your data adequately captures the dynamics or features you want to learn), and so forth. One tip from our ML engineers is to use a “warmup phase” as part of your learning rate schedule, as described in this paper.
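To illustrate the warmup idea, here is a minimal sketch of a linear warmup schedule (the function name and step counts are just examples, not something specific to the LMU): the learning rate ramps up from near zero to the base rate over the first few training steps, then stays constant. A schedule like this can be plugged into most frameworks' learning-rate-scheduler callbacks.

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    """Linearly ramp the learning rate from ~0 up to `base_lr`
    over the first `warmup_steps` steps, then hold it constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# Early steps use a small learning rate; later steps use the full rate.
print(warmup_lr(0))     # first step: base_lr / warmup_steps
print(warmup_lr(5000))  # after warmup: base_lr
```

In practice you would often combine this with a decay schedule after the warmup phase; the paper linked above discusses why the warmup helps stabilize early training with large batch sizes.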

If you are using NengoDL or KerasLMU to create and run your LMU network, it should have no issues utilizing the GPU. Both NengoDL and KerasLMU run on top of TensorFlow (i.e., everything should compile down to TF to run), so as a first step, I’d recommend confirming that a standard TensorFlow model utilizes the GPU in the Python environment you run your LMU code from. If, after that, you find that the CPU is still faster than the GPU for your LMU network, one of our devs suggested using the "raw" mode for the conv_mode parameter when creating your LMU layer.
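As a quick sanity check for the first step, you can ask TensorFlow which devices it can see (this is a minimal sketch; it assumes TensorFlow is installed in your environment):

```python
import tensorflow as tf

# List the GPUs TensorFlow can see. An empty list means TF is falling
# back to the CPU (commonly due to missing/mismatched CUDA or cuDNN
# libraries, or a CPU-only TensorFlow build).
gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)
```

If the P100 shows up here but the GPU is still slower, that's when trying `conv_mode="raw"` on the LMU layer is worth a shot.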
