I am having trouble simulating neural ensembles larger than 1000 neurons, because memory usage is too high when using NengoDL as the backend.
I was able to use Nengo Core to simulate the same model by reducing the sampling rate of probes and setting
optimize=False for the Simulator, as suggested in the documentation. This reduced peak memory usage to a few GB.
I have tried applying the same scaling to the Probes when using NengoDL, but the memory consumption is still too high for my machine to handle, easily exceeding 100 GB.
Are there any optimisations I could apply to NengoDL that will bring the memory consumption in line with that of Nengo Core, for the same model?
Can you please provide some more information about what you are trying to run? E.g., the network structure, the size of the data you are trying to train on, etc.
Without seeing code, one thing that comes to mind is to reduce the
minibatch_size to try to fit your model on your hardware.
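To get an intuition for why `minibatch_size` matters: the buffered activations scale roughly linearly with it. A back-of-the-envelope sketch (all sizes here are hypothetical, not taken from your model):

```python
# Rough upper-bound estimate of buffered activation memory for one run.
# All numbers are illustrative placeholders, not real model sizes.
n_neurons = 100_000
n_steps = 1_000
bytes_per_float = 4  # float32


def approx_activation_bytes(minibatch_size):
    # one float per neuron, per timestep, per minibatch item
    return minibatch_size * n_steps * n_neurons * bytes_per_float


print(approx_activation_bytes(32) / 1e9)  # → 12.8 (GB)
print(approx_activation_bytes(1) / 1e9)   # → 0.4 (GB)
```

So dropping from a minibatch of 32 to 1 cuts this particular buffer by 32x, at the cost of slower training throughput.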
This forum post mentions a few other things for a similar problem that may help you.
Another thing to try, if it's the probe data in particular that is causing you to run out of memory, is to run your simulation in a loop. I.e., instead of

```python
sim.run_steps(1000)
```

do

```python
for i in range(10):
    sim.run_steps(100)
```

All the probe data is accumulated on the GPU during the run, and then offloaded to the CPU at the end of the run, so by breaking up the run like that you should reduce the peak memory usage. This will particularly be the case if you're using
probe.sample_every, as due to the way TensorFlow works that downsampling can't be done while the model is running on the GPU; it is only applied at the end of each run.
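The chunked pattern with host-side downsampling can be sketched generically. This uses a stand-in `run_chunk` function in place of the real simulator call (names and sizes are illustrative, not NengoDL API), but shows why peak memory drops: only one chunk's worth of raw probe data exists at a time, and everything except the kept samples is discarded before the next chunk runs.

```python
import numpy as np


def run_chunk(n_steps, n_neurons):
    # stand-in for one sim.run_steps() call: returns the raw
    # probe data accumulated during that chunk
    return np.random.rand(n_steps, n_neurons).astype(np.float32)


k = 10                 # downsampling factor (sample_every / dt)
chunks, steps = 10, 100
kept = []
for _ in range(chunks):
    raw = run_chunk(steps, 100)   # only one chunk lives in memory at a time
    kept.append(raw[k - 1::k])    # downsample on the host, discard the rest

data = np.concatenate(kept)       # 10 chunks * 10 kept samples each
```

Peak raw storage here is one 100-step chunk rather than the full 1000-step run, while the final `data` still covers the whole simulation at the downsampled rate.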
This works perfectly! Thanks!
Of note: I was running NengoDL on the CPU and was still getting this issue, so it does not seem to be a hardware platform problem so much as a TensorFlow one.
Do you have any plans to integrate this "simulation discretisation" as built-in functionality?
What I’d really like to do as a more robust fix is to apply the Probe
sample_every argument “live” as the simulation is running, rather than collecting all the data as the simulation runs and then downsampling it. Because of the way TensorFlow works it isn’t trivial to do that, but it should be possible.
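Conceptually, a "live" sample_every is just a counter inside the step loop that keeps every k-th sample as it is produced, never buffering the rest. A minimal sketch of that idea in plain Python (the function name and generator framing are hypothetical, not a proposed NengoDL API):

```python
def live_downsample(stream, k):
    # yield every k-th sample from an incoming stream of probe values,
    # so undesired samples are dropped immediately instead of buffered
    for i, sample in enumerate(stream, start=1):
        if i % k == 0:
            yield sample


# e.g. 100 simulated steps, keeping every 10th value
kept = list(live_downsample(range(1, 101), 10))
print(kept)  # → [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```

The hard part, as noted above, is that TensorFlow's graph execution makes it non-trivial to interleave this kind of conditional host-side bookkeeping with the on-GPU step loop.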
That sounds like a godsend. The difference in how the
sample_every parameter dictates behaviour between the two backends is quite jarring. Having the same functional behaviour would be ideal.
Out of curiosity, I compared the discretised and non-discretised simulations to see how much overhead the loop adds. Surprisingly, simulating the same 100-neuron network took 29 seconds when broken into 100 chunks and 37 seconds as a single run! I would have expected the converse.