Hi @fladventurerob, and welcome to the Nengo forums!
To answer your questions:
In terms of training behaviour (and constraints), the LMU network can be considered a standard RNN, so you may observe the same effects of batch size on generalization that you see in other RNN / ANN architectures. Unfortunately, as with other RNN / ANN architectures, determining the hyperparameter values that work best for your network is kind of a black art (and a lot of trial and error), and depends on many variables: your specific network architecture, the data you are using (e.g., whether or not your data adequately captures the dynamics or features you want to learn), and so forth. One tip from our ML engineers is to use a “warmup phase” as part of your learning rate schedule, as described in this paper.
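As a minimal sketch of what a warmup phase looks like (the function name and the linear ramp are just illustrative assumptions, not from the paper), the idea is to scale the learning rate up from near zero over the first few hundred or thousand steps before training at the full rate:

```python
def warmup_schedule(step, base_lr=1e-3, warmup_steps=1000):
    """Linearly ramp the learning rate from ~0 up to base_lr over the
    first warmup_steps optimizer steps, then hold it at base_lr.
    (A decay phase could be added after the warmup if desired.)"""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

If you are training with Keras, the same logic can be wrapped in a `tf.keras.optimizers.schedules.LearningRateSchedule` subclass (or a `LearningRateScheduler` callback) and passed to your optimizer.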
If you are using NengoDL or KerasLMU to create and run your LMU network, then it should have no issues utilizing the GPU. Both NengoDL and KerasLMU run on top of TensorFlow (i.e., everything should compile down to TF to run), so as a first step, I’d recommend confirming that standard TensorFlow models utilize the GPU in the Python environment you are running your LMU code from. After that, if you find that the CPU is still faster than the GPU for your LMU network, one of our devs suggested trying the `"raw"` mode for the `conv_mode` parameter when creating your LMU layer.
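A rough sketch of both steps (assuming TensorFlow and `keras_lmu` are installed; the constructor arguments shown are placeholders, and note that `conv_mode` belongs to the feedforward variant of the layer, so check your installed version's API):

```python
import tensorflow as tf
import keras_lmu

# Step 1: confirm TensorFlow can see the GPU at all in this environment.
# An empty list here means the problem is the TF/CUDA setup, not the LMU.
print(tf.config.list_physical_devices("GPU"))

# Step 2: if the CPU is still faster, try the "raw" convolution mode.
# In KerasLMU, conv_mode is a parameter of keras_lmu.LMUFeedforward
# (the recurrent keras_lmu.LMU layer does not take it); the sizes below
# are illustrative only.
lmu_layer = keras_lmu.LMUFeedforward(
    memory_d=1,
    order=256,
    theta=784,
    hidden_cell=tf.keras.layers.SimpleRNNCell(212),
    conv_mode="raw",  # direct convolution instead of the default FFT
)
```

The `"raw"` mode computes the memory convolution directly rather than via an FFT, which can trade off differently on GPU vs. CPU depending on your sequence length.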