[Nengo-DL]: "scale_firing_rates=1" uses more GPU memory?

This question may be a bit ill-formed. In my project's code I am using multiple values for scale_firing_rates, ranging from 1 to 600. I have found that using scale_firing_rates=1 results in a GPU memory error, while using scale_firing_rates=100 or so doesn't.

The structure of my code is similar to the example mentioned here, except that I am using 3D CNNs. I created a smaller sample program with the same structure as my project's code, but failed to reproduce the issue there. However, as mentioned above, if I set scale_firing_rates=1 in my project, it runs out of GPU memory every time, whereas with scale_firing_rates=100, scale_firing_rates=200, etc. it runs smoothly.

I have checked my project's code multiple times and couldn't find any foul play with scale_firing_rates; it is simply passed to nengo_dl.Converter(...). Please let me know if you are aware of this issue. I will be happy to provide any other required information.
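
For reference, the parameter appears only in the conversion step, along the lines of this minimal sketch (the tiny Dense model here is just a stand-in for my actual 3D CNN, and the specific values are examples):

```python
import nengo
import nengo_dl
import tensorflow as tf

# Tiny stand-in model; my real network is a trained 3D CNN.
inp = tf.keras.Input(shape=(16,))
out = tf.keras.layers.Dense(10, activation=tf.nn.relu)(inp)
model = tf.keras.Model(inputs=inp, outputs=out)

# scale_firing_rates is passed straight to the converter and used nowhere else.
converter = nengo_dl.Converter(
    model,
    swap_activations={tf.nn.relu: nengo.SpikingRectifiedLinear()},
    scale_firing_rates=100,  # the value I vary between 1 and 600
    synapse=0.005,
)
```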

Have you tried the newest release of NengoDL (3.3.0) together with TF 2.3.0? There have been a few major changes/updates, so it would be good to know whether this is something new or whether it has been indirectly resolved. I haven't heard of this issue and can't think of why this would be the case. Perhaps you could try disabling some of the output probes (in particular those probing spike activity) to see if they are a contributing factor? Thanks!

Hello @arvoelke! Thanks for your response. I tested both settings: (1) Nengo-DL 3.2.0 with TF 2.2.0 and (2) Nengo-DL 3.3.0 with TF 2.3.0. The issue remains the same in both. I have also already disabled output probe data collection. My code works, but as I said, it fails with a memory error for low scale_firing_rates. So I am not stuck, but I would like to understand where things could be going wrong.

Thanks for trying that out. This indeed seems very strange to me, and I'd like to help you get to the bottom of it. In your first post you linked the example you based your code on, but would you be able to take the code you are currently testing, strip it down to the bare minimum required to reproduce this effect, and post it here? That would help us narrow things down and make sure we are seeing the same issue and testing the same thing. Thanks!

Hello @arvoelke, yes, I completely understand that we need a stripped-down version of my code to make progress. As I mentioned earlier, I tried a simple version of my 3D CNN code with 3D MNIST data, but failed to reproduce the issue there. My project is very data intensive (it uses GBs of data), so I would also have to share that data to reproduce the issue. But first let me try to create a stripped-down version of my project's code.

In addition to creating a minimal model that exhibits this issue, here are some other suggestions to help you pin down what is causing it:

  1. When does the model run out of memory? Is it during the build, training or evaluation phase of the simulation? Knowing this will help pinpoint exactly where in the code this is happening.
  2. Does this issue happen with differently sized datasets (but with the same network)?
  3. Does this issue happen if you reduce the number of training epochs? (Note that this is only necessary if the issue occurs during training.)
  4. Does this issue happen if you change the parameters of the model (adding / removing neurons, connections, etc.)?
  5. If you are using probes, does this issue happen if you disable all of the probes?

Hello @xchoo, thank you for your suggestions. I will try to answer all your questions as best I can. But first, let me mention a small investigation I did. With a fixed batch size and n_timesteps=10 (i.e. the data presentation time to the network), I experimented with scale_firing_rates = 1, 100, and 600. With 600 and 100, the network consumed the same amount of GPU memory, whereas with 1 it consumed more GPU RAM, which is very counterintuitive; GPU RAM consumption shouldn't depend on scale_firing_rates at all. Given that it consumed the same amount of GPU RAM for 600 and 100, the possibility of a bug where scale_firing_rates increases or decreases the amount of input data is low, or can be ruled out.
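
For clarity, the experiment looked roughly like this (a simplified sketch; `model` and `converter` are from the conversion sketch above, and `test_batch` is a placeholder for one batch of 8 examples from my real data):

```python
import numpy as np
import nengo_dl

n_timesteps = 10  # data presentation time per example

# Flatten each example and tile it across the presentation timesteps.
# Note that the tiled data has the same shape regardless of scale_firing_rates.
tiled_batch = np.tile(
    test_batch.reshape((test_batch.shape[0], 1, -1)), (1, n_timesteps, 1)
)

with nengo_dl.Simulator(converter.net, minibatch_size=8, progress_bar=False) as sim:
    predictions = sim.predict_on_batch({converter.inputs[model.input]: tiled_batch})
```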

I am not entirely sure when the model runs out of memory. I am executing directly from a Python shell, not in Jupyter (which might give me more insight into the phase in which it runs out of memory). All I can say is that the model runs out of memory during the simulation phase, when I call sim.predict_on_batch({nengo_probe_objs_lst[0]: batch[0]}) (I don't get a log statement right after it).
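
To narrow down where the memory is being allocated, I am checking usage right around that call, roughly like this (a sketch; it simply shells out to nvidia-smi, which is how I am reading the numbers, and `sim`, `nengo_probe_objs_lst`, and `batch` are from my project's code):

```python
import subprocess

def gpu_mem_used_mb():
    # Query the currently used GPU memory (in MB) for the first GPU via nvidia-smi.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]
    )
    return int(out.decode().splitlines()[0])

print("before predict_on_batch:", gpu_mem_used_mb(), "MB")
predictions = sim.predict_on_batch({nengo_probe_objs_lst[0]: batch[0]})
print("after predict_on_batch:", gpu_mem_used_mb(), "MB")
```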

With lower batch sizes, the code of course runs well for scale_firing_rates=1, but with higher GPU RAM consumption, which my GPU can tolerate. I checked for multiple batch sizes and for each one, RAM consumption was the same for scale_firing_rates = 100 and 600 and lesser than that for scale_firing_rates = 1.

The issue occurs during the testing phase.

I did not experiment with such modifications. I have a fixed architecture that I train in TF; I then load the trained weights and test the model in Nengo-DL. Thus I don't see an option to modify the architecture. As mentioned above, the model runs out of memory during the testing phase in Nengo-DL.

I am using probes only for the output layer, and the above observations remain the same even if I disable the output layer probes.

I am yet to create a simplified model, but please let me know if any more information is required.

Hmmm. Your experiments do narrow down the potential location of the issue by quite a lot. It may be a weird interaction between the scale_firing_rates parameter and TensorFlow.

Just to clarify a statement you made:

I checked for multiple batch sizes and for each one, RAM consumption was the same for scale_firing_rates = 100 and 600 and lesser than that for scale_firing_rates = 1.

Do you mean to say that with scale_firing_rates=100 or 600 the GPU RAM usage is more than for scale_firing_rates=1, or the reverse? Your comment can be taken either way, depending on what "that" refers to:
"(lesser than that) for scale_firing_rates=1" or
"lesser than (that for scale_firing_rates=1)"
:laughing:
Additionally, do you observe a relationship between the batch size, the scale_firing_rates value, and the GPU RAM usage? (Actual numbers will help here.)

Hello @xchoo, I already had the numbers last time but couldn't find a tabular way to present them here, so I chose to be descriptive. But it looks like copying Excel cells does the job! So here they are. BTW, sorry for the confusion there. In the sentence you quoted, what I meant was that the RAM usage for scale_firing_rates=1 is higher than the RAM usage observed for scale_firing_rates = 100 or 600.

| Batch Size | Scale Firing Rates | Final GPU RAM (MB) | Intermediate GPU RAM (MB) |
|---|---|---|---|
| 8 | 1 | 5813 | 1137 |
| 8 | 100 | 5784 | 882 |
| 8 | 600 | 5784 | 882 |
| 12 | 1 | 5104 | 1652 |
| 12 | 100 | 4844 | 1394 |
| 12 | 600 | 4844 | 1394 |
| 24 | 1 | Process Killed | 2677 |
| 24 | 100 | 4846 | 1394 / 2418 |
| 24 | 600 | 4845 | 1394 |

Final GPU RAM corresponds to the RAM usage once prediction is complete on one batch (usage stays the same for the subsequent batches). It therefore seems that sim.predict_on_batch() uses this much RAM.

Intermediate GPU RAM is the RAM usage observed right before calling sim.predict_on_batch(). This one is tricky, as I am not able to confidently associate this RAM usage with sim.predict_on_batch() (and the time between the allocation of RAM and the printing of logs is very short). This Intermediate GPU RAM caught my attention because the increase in RAM pauses here for some time before climbing further to the Final GPU RAM, and, as I said, I am not confident which stage to associate it with. Maybe this information will be helpful to you.

As you can see, the Intermediate GPU RAM is always higher for scale_firing_rates=1, and consequently the Final GPU RAM is also higher for scale_firing_rates=1. For the second-to-last row, two Intermediate GPU RAM values are listed because I observed both across multiple runs; 1394 was the most common.

Hmm. That is indeed strange. Do you know what the lowest value of scale_firing_rates has to be for this to show up? Does changing the value to 2 stop this effect?
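
Something like the following quick sweep might help find that threshold (just a sketch; `run_model` is a placeholder for your existing convert-and-predict pipeline):

```python
import tensorflow as tf

# Hypothetical sweep to find the smallest scale_firing_rates value that
# avoids the extra GPU memory usage. Note that if the OS kills the process
# outright (as in your batch-size-24 case), the except clause won't catch it.
for sfr in [1, 2, 5, 10, 20, 50, 100]:
    try:
        run_model(scale_firing_rates=sfr)  # placeholder for your pipeline
        print(f"scale_firing_rates={sfr}: ran to completion")
    except tf.errors.ResourceExhaustedError:
        print(f"scale_firing_rates={sfr}: ran out of GPU memory")
```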

Hello @xchoo and @arvoelke, sorry for the late response. I attempted to create a standalone script to reproduce the issue and found that it is intermittent. In fact, in the standalone script the GPU sometimes ran out of memory for scale_firing_rates greater than 1 as well. So I am not sure what exactly is happening here. Could it be due to sporadic, non-optimal allocation of GPU memory? For now I am halting further experiments in this direction since the issue is intermittent. If I find a case where it is frequent, I will post it here.
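
In case it helps anyone hitting the same thing, one general TF knob I am aware of (a TF setting, not something specific to NengoDL) is asking it to allocate GPU memory on demand instead of reserving nearly all of it upfront:

```python
import tensorflow as tf

# Must run before any GPU operation is executed; makes TF grow its GPU
# memory pool on demand instead of grabbing (almost) all of it at startup.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```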

I took a closer look at the code and we aren't doing anything special with that scale_firing_rates parameter. It might be that the behaviour you are observing is down to peculiarities of the TF backend implementation, which would make it a TF issue rather than a NengoDL issue. :frowning:
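
To illustrate why I wouldn't expect it to matter: as far as I understand, scale_firing_rates only scales the neuron inputs up and the outputs back down by the same factor, which for a rectified-linear response is mathematically a no-op (a toy check of that identity, not the actual converter code):

```python
import numpy as np

# Scaling the input by s and the output by 1/s leaves a rectified-linear
# response unchanged, so the converted graph should do the same amount of work.
def relu(v):
    return np.maximum(v, 0.0)

s = 100.0
x = np.linspace(-1.0, 1.0, 11)
assert np.allclose(relu(s * x) / s, relu(x))
```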

Hello @xchoo, I guess so. Given that the issue is intermittent, that TF is responsible for allocating GPU memory, and that I do see irregular memory allocation, things point to the TF implementation. I will keep a note of it for future reference. Thanks @xchoo and @arvoelke for looking into it! For now I am marking this as solved.