Conv2D with 'same' padding with Nengo Loihi

Hi,

I am currently working on building a ResNet for Loihi using the NengoDL converter and NengoLoihi, following the tutorial “Converting a Keras model to an SNN on Loihi” (https://www.nengo.ai/nengo-loihi/examples/keras-to-loihi.html).

The ResNet uses Conv2D layers with ‘same’ padding, but it seems that ‘same’ padding is not supported by NengoLoihi. When I attempted to run part of the ResNet, I got the following message:
“NotImplementedError: nengo-loihi only supports convolution with ‘valid’ padding.”

Has anyone implemented ResNet on Loihi using the NengoDL converter and NengoLoihi? Also, if Conv2D with ‘same’ padding cannot be used, the output of each Conv2D layer shrinks. Are there any solutions to avoid this?

Thank you.
Kamei

Hi Kamei,

No one has ever implemented ResNet on Loihi that I know of, either with the converter or otherwise. As you said, if you use “valid” padding your output will shrink after each layer, and ResNet has so many layers that by the end, the shrinkage will be too much. I’ve implemented “same” padding for nengo.Convolution in NengoLoihi in the same-padding branch.
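For reference, once that branch is installed, a connection using the new padding option might look like the sketch below (the shapes and layer sizes here are placeholders, not from your model):

```python
import nengo

with nengo.Network() as net:
    # 28 x 28 input with 4 channels, 8 output filters; with "same" padding
    # the output keeps the 28 x 28 spatial shape instead of shrinking to 26 x 26
    conv = nengo.Convolution(
        n_filters=8,
        input_shape=(28, 28, 4),
        kernel_size=(3, 3),
        padding="same",  # "valid" is the only option without the branch
    )
    pre = nengo.Ensemble(conv.input_shape.size, 1)
    post = nengo.Ensemble(conv.output_shape.size, 1)
    nengo.Connection(pre.neurons, post.neurons, transform=conv)
```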

Hi,

Thank you so much for your prompt reply and for the ‘same’ padding patch!

To check that it runs in my environment, I applied the patch and executed the tutorial with ‘same’ padding instead of ‘valid’ padding.
I was able to run the tutorial, but I found a small bug at nengo_loihi/builder/connection.py:760-763. There is an assertion at those lines, so I commented them out.

Since I was able to run the tutorial with ‘same’ padding, my next step is to build the ResNet using Conv2D with ‘same’ padding.

Also, thank you for the information about ResNet on Loihi.

Thank you.
Kamei

Hi,

I have built a small ResNet for Loihi using the NengoDL converter and NengoLoihi. Thank you very much for implementing Conv2D with ‘same’ padding.

We have succeeded in running the small ResNet on Loihi with a few modifications, but found that ‘timePerTimestep’ takes longer than in your tutorial. How can I reduce it?
See below for the details:
(1) Conv2D transformation
When I ran the small ResNet with NengoLoihi, I got the following error message:
“BuildError: Conv2D transforms not supported for off-chip to on-chip connections where ‘pre’ is not a Neurons object.”
To solve this error, I added a Conv2D layer with on_chip=False after the Add layer. With this fix, I was able to run the small ResNet on Loihi. See the sketch below for what I mean.
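Following the pattern from the tutorial, the extra layer can be kept off-chip through the converter’s config. In this sketch, `model` and `extra_conv` are placeholders for the Keras model and the layer to keep on the host:

```python
import nengo_dl
import nengo_loihi

converter = nengo_dl.Converter(model)  # model: the Keras ResNet (placeholder)

with converter.net as net:
    nengo_loihi.add_params(net)  # adds the `on_chip` configuration option
    # extra_conv: the Keras layer to run off-chip (placeholder)
    net.config[converter.layers[extra_conv].ensemble].on_chip = False
```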

(2) timePerTimestep evaluation
We executed the converted small ResNet using NxSDK and compared it with your tutorial.
I found that the ‘timePerTimestep’ of the small ResNet reported by NxSDK’s probe was about 0.08 sec, whereas the ‘timePerTimestep’ of your tutorial (https://www.nengo.ai/nengo-loihi/examples/keras-to-loihi.html) is about 0.001 sec. I suspect the slowdown is related to the structure of the small ResNet, and that the on_chip setting also plays a role.

Thank you.
Kamei

Hi Kamei,

There are a lot of factors that affect the amount of time a simulation on Loihi takes.

One of them is IO. If you have a lot of spikes going to the chip, or a lot of things on the chip you need to probe (to get output), this can slow things down. One way to work around this is by setting the biases of neurons on the chip to fire at the correct rates for a given input image, rather than sending spikes from the host to the chip.
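As a rough sketch of that bias idea (the scaling and shapes here are illustrative assumptions, not taken from any particular example):

```python
import numpy as np
import nengo

image = np.random.uniform(0, 1, size=28 * 28)  # placeholder flattened input
max_rate = 100.0  # assumed scaling from pixel intensity to firing rate (Hz)

with nengo.Network() as net:
    # One spiking rectified-linear neuron per pixel. With no input connection,
    # each neuron's input current equals its bias, so it fires at a rate
    # proportional to the pixel value without any host-to-chip spikes.
    inp = nengo.Ensemble(
        n_neurons=image.size,
        dimensions=1,
        neuron_type=nengo.SpikingRectifiedLinear(),
        gain=nengo.dists.Choice([1.0]),
        bias=image * max_rate,
    )
    # downstream layers connect to inp.neurons as usual
```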

Another source of slowness that we’ve recently discovered relates to sending spikes between chips. For example, on a Nahuku32 board, it is significantly slower to send spikes from one chip to another than within a chip. This is an active area of research, and one that we don’t have good solutions for yet. I’ve recently been working on a new allocator that tries to place the model on Loihi in a way that minimizes communication between chips. You could try that out and see if it helps (let me know if you find any bugs, I literally wrote it yesterday). If you really want to minimize inter-chip communication, you could look at extending that allocator with a more optimal algorithm for partitioning the model across chips (nxmetis might be a useful package for this).
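In case it helps, selecting an allocator looks roughly like this (the class name and hardware_options keys here are assumptions based on current NengoLoihi; the allocator from that branch may be named differently):

```python
import nengo_loihi
from nengo_loihi.hardware.allocators import Greedy  # assumed name/location

with nengo_loihi.Simulator(
    net,  # the converted network
    target="loihi",
    hardware_options={"n_chips": 2, "allocator": Greedy()},
) as sim:
    sim.run(0.1)
```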

It is also possible that your on_chip=False layer is responsible for some of this slowdown, as you suggest. However, if you’re using precompute=True on your Simulator (which I would highly recommend, as it should be faster), then everything happening on the host executes first, and Loihi then executes all timesteps together with no involvement from the host (we “pre-compute” all the inputs to the network, i.e. your input spikes, by running that part separately on the host, then run everything on Loihi with those pre-computed spikes). So if you’re using precompute=True, any time per timestep reported by NxSDK won’t be affected by having that layer off-chip.
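Concretely, that just means constructing the Simulator like this (`net` again being your converted network):

```python
import nengo_loihi

# Host-side layers are run first for all timesteps; Loihi then runs every
# timestep in one go using the pre-computed input spikes
with nengo_loihi.Simulator(net, precompute=True) as sim:
    sim.run(0.01)
```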

Let us know if you have any success with improving your time per timestep; as I said, getting things to execute as fast as possible on Loihi is still a very active area of research!

Thank you for the information about reducing timePerTimestep.
We will try these methods and let you know the results.

Kamei