Model energy consumption

I am building an SNN model using Nengo to perform time-series forecasting. The main reason is that I am interested in the energy efficiency of SNNs, while aiming for a decent level of accuracy with my SNN model compared to a more traditional ANN for the very same task (e.g. a 1D/2D CNN or an LSTM).

My research doesn’t involve hardware, since it’s a bit out of scope given my current goal and study.

It would be very nice if I could prove that my SNN model, built using NengoDL or KerasSpiking, consumes (much) less energy than a regular ANN of comparable accuracy. Otherwise, I’d be relying on the sole assumption that SNNs are more efficient because not every neuron fires, so the energy consumed is by definition less than or equal to that of a regular ANN with the same properties (number of neurons, etc.). What do you think about this last statement? Is it enough, or should a more rigorous proof be brought forward?

As regards proving this more empirically, I am very interested in what is described in this example: Estimating model energy. My only concern is that this example focuses mostly on spiking-hardware-based implementations. Do you think this approach could be relevant for measuring the energy consumption of two models (SNN and non-SNN) without considering any spiking hardware?
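To make my question more concrete, here is a toy sketch of the kind of back-of-the-envelope comparison I have in mind, in the spirit of that energy-estimation example. All per-operation energy figures below are illustrative placeholders I made up, not measured values for any real device:

```python
# Back-of-the-envelope energy comparison based on counting synaptic
# operations (synops). The per-synop energy figures are ILLUSTRATIVE
# PLACEHOLDERS, not measured numbers for any specific device -- you
# would substitute values from the literature for your target hardware.
ENERGY_PER_SYNOP = {
    "cpu": 8.6e-9,    # Joules per synop, placeholder
    "gpu": 0.3e-9,    # placeholder
    "loihi": 24e-12,  # placeholder
}

def ann_energy(n_in, n_out, timesteps, device):
    """A dense ANN layer performs n_in * n_out MACs on every timestep."""
    synops = n_in * n_out * timesteps
    return synops * ENERGY_PER_SYNOP[device]

def snn_energy(n_in, n_out, timesteps, spike_prob, device):
    """An event-driven SNN layer only does work for inputs that spike."""
    synops = n_in * n_out * timesteps * spike_prob
    return synops * ENERGY_PER_SYNOP[device]

ann = ann_energy(256, 128, timesteps=100, device="loihi")
snn = snn_energy(256, 128, timesteps=100, spike_prob=0.05, device="loihi")
print(f"ANN: {ann:.2e} J, SNN: {snn:.2e} J, ratio: {ann / snn:.1f}x")
```

With this accounting the SNN’s advantage is exactly its firing sparsity, which is precisely the assumption I am unsure about when no spiking hardware is involved.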

Excellent questions @dadadima!

Generally in research papers (even top-notch ones), I have seen researchers present only their spiking solution and show that it matches the performance (e.g. in terms of accuracy) of traditional non-spiking ANNs. They don’t do energy profiling; they simply piggyback on the fact you stated, i.e. that their spiking network can run on neuromorphic hardware and thus consume less energy.

If I understand your question correctly, you wish to compare the energy consumption of spiking and non-spiking networks by assuming that they both run on CPUs/GPUs. If so, it wouldn’t be relevant. The reason is that non-spiking hardware (CPUs/GPUs) is based on the von Neumann architecture, whereas spiking hardware is based on non-von Neumann architectures. This means CPUs/GPUs have their processing units separated from memory, whereas neuromorphic hardware co-locates processing and memory in one unit (e.g. using memristors), much like our biological synapses.

When implemented on neuromorphic hardware, SNNs leverage this non-von Neumann architecture to achieve lower energy consumption. If they are implemented on von Neumann architectures, i.e. CPUs/GPUs, I don’t think you will find any relevant difference from traditional ANNs in terms of energy consumption (please note that this is my speculation; I haven’t tested it yet). Moreover, if your application works well with batch inference (e.g. image classification/retrieval), then in real time GPUs may have an advantage over neuromorphic hardware, since more samples are processed per batch! It is generally in cases of online inference (i.e. batch size = 1, e.g. real-time driver intention recognition) that neuromorphic hardware gives you the advantage.

In a nutshell, you too can piggyback on the “Neuromorphic hardware consuming less energy” fact, which I might also do in my paper :sweat_smile: .

One big advantage of spiking networks is sparsity. If you’re processing e.g. a video, rather than all neurons having to transmit a value at each timestep, neurons can be sparse in both space (only some neurons in a layer transmitting a value on a given timestep) and time (each neuron only transmitting a value on some timesteps). Traditional hardware can take advantage of this sparsity too; it’s not restricted to neuromorphic hardware.
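These two kinds of sparsity are easy to measure from a spike raster. Here is a minimal sketch, assuming a randomly generated raster with roughly 5% activity as a stand-in for real model output (the fan-out value is also just an illustrative assumption):

```python
import numpy as np

# Quantify spatial and temporal sparsity of a spike raster.
# `spikes` has shape (timesteps, neurons); here it is a random binary
# raster with ~5% activity, standing in for real model output.
rng = np.random.default_rng(0)
spikes = (rng.random((100, 64)) < 0.05).astype(int)

# Spatial sparsity: average fraction of neurons silent on each timestep.
spatial_sparsity = 1 - spikes.mean(axis=1).mean()
# Temporal sparsity: average fraction of timesteps each neuron is silent.
temporal_sparsity = 1 - spikes.mean(axis=0).mean()

# Event-driven synop count vs dense: with fan-out F, a spiking layer
# does spikes.sum() * F operations, while a dense layer does
# timesteps * neurons * F regardless of activity.
fan_out = 128
event_synops = spikes.sum() * fan_out
dense_synops = spikes.size * fan_out
print(f"spatial sparsity ~ {spatial_sparsity:.2f}, "
      f"temporal sparsity ~ {temporal_sparsity:.2f}")
print(f"event-driven synops: {event_synops}, dense synops: {dense_synops}")
```

The operation count drops in direct proportion to activity, but as discussed next, fewer operations does not automatically mean less energy on conventional hardware.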

What makes things trickier is that traditional hardware is optimized for parallelism (particularly GPUs, but even CPUs to some degree). Memory accesses are more efficient when you’re fetching a bunch of values stored in a row, and it’s more efficient to apply the same instruction to multiple values at once (SIMD). When you want to exploit sparsity in your implementation, you’re typically gathering values from all over the place and processing them differently, so it’s harder to take advantage of the parallelism built into the hardware. So even though you might be doing fewer computations in the sparse (spiking) network, it might take more energy to run it sparsely. You often need high levels of sparsity to get energy advantages on traditional hardware.

So even if you can show that your spiking network on e.g. a GPU uses a quarter as many computations as a non-spiking network, that doesn’t mean you’ll see a significant reduction in power when actually running it on a GPU with sparse computations.
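A crude way to think about this break-even point: if each operation in a sparse, event-driven implementation effectively costs some factor more than a dense one (due to irregular memory access and lost SIMD), then sparsity only pays off once activity drops below the reciprocal of that factor. The overhead factors here are illustrative assumptions, not measurements:

```python
# Rough break-even model for sparsity on conventional hardware.
# Assume each op in a sparse/event-driven implementation costs
# `overhead` times more than in a dense one (irregular memory access,
# lost SIMD). These factors are illustrative assumptions.

def sparse_wins(activity, overhead):
    """True if the event-driven cost beats the dense cost.

    Dense cost ~ 1 per potential op; sparse cost ~ activity * overhead.
    """
    return activity * overhead < 1.0

# With a hypothetical 10x per-op overhead, 25% activity is not enough
# for the sparse implementation to win...
print(sparse_wins(activity=0.25, overhead=10))  # dense wins
# ...but 5% activity is.
print(sparse_wins(activity=0.05, overhead=10))  # sparse wins
```

This is only a sketch, but it captures why high levels of sparsity are usually needed before a GPU or CPU implementation sees a real energy benefit.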

Spiking neuromorphic hardware, on the other hand, is typically optimized for this sparsity, and can thus take much better advantage of it, with things like memory near the compute as @zerone mentioned, as well as e.g. communication optimized for spikes.
