Power Benchmarking with Model Replication

keeperbitz · September 25, 2020, 4:31pm

For a relatively small spiking neural network on Loihi (~1 core), it is recommended to replicate the model across cores and chips to be able to discern dynamic power from static power. I was wondering if there was a recommended way to do this across both cores and chips with Nengo? I notice that something similar can be done with cores in the https://www.nengo.ai/nengo-loihi/examples/mnist-convnet.html tutorial. Would it be possible to have this done across chips?

Eric · September 25, 2020, 5:47pm

Yes, you can allocate across multiple chips. Just use one of the available Allocators (which unfortunately do not appear in the documentation, so I’ll include some code links).

The RoundRobin allocator maps cores to chips in round-robin order: the first core goes on the first chip, the second core on the second chip, and so on until every chip has one core, then the next core goes on the first chip again. This distributes cores evenly across all chips, but cores that are near each other in the core list (which are often doing similar things and communicating with similar sets of cores) end up on different chips. That leads to more communication between chips, which is slower.

import nengo_loihi
from nengo_loihi.hardware.allocators import RoundRobin

# `network` is your existing nengo.Network
with nengo_loihi.Simulator(
    network, n_chips=2, hw_opts=dict(allocator=RoundRobin())
) as sim:
    sim.run(...)  # do whatever you want to run here

The n_chips argument controls how many chips to use. Just make sure it is no more than the total number of chips on your board (e.g. a Nahuku8 has 8 chips).
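
To make the round-robin placement described above concrete, here is a minimal pure-Python sketch of the idea. It is not the nengo_loihi implementation; `round_robin_assign` is a hypothetical helper that only illustrates which chip each core index would land on.

```python
def round_robin_assign(n_cores, n_chips):
    """Return a list where entry i is the chip index for core i.

    Hypothetical illustration only: cores are dealt out one chip at a
    time, wrapping back to chip 0 after the last chip.
    """
    return [core % n_chips for core in range(n_cores)]


# Five cores dealt across two chips
assignment = round_robin_assign(n_cores=5, n_chips=2)
print(assignment)  # [0, 1, 0, 1, 0]
```

Note how adjacent cores in the list always sit on different chips when there are two or more chips, which is why this layout tends to produce more inter-chip traffic.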

The Greedy allocator fills up the first chip before moving on to the next chip. This is typically faster than RoundRobin, since nearby cores are put on the same chip, reducing communication between chips.

import nengo_loihi
from nengo_loihi.hardware.allocators import Greedy

# `network` is your existing nengo.Network
with nengo_loihi.Simulator(
    network, n_chips=2, hw_opts=dict(allocator=Greedy(cores_per_chip=64))
) as sim:
    sim.run(...)  # do whatever you want to run here

cores_per_chip is an optional argument that limits how many cores are placed on each chip. So if you know you’ve got 100 cores in your model, you could set cores_per_chip=60 to get 60 cores on the first chip and 40 cores on the second chip. By default, cores_per_chip=128, which is the maximum number of cores on one chip.
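
The 100-core example above can be checked with a small pure-Python sketch of greedy filling. Again, this is not the nengo_loihi code; `greedy_assign`, `cores_on_each_chip`, and `cross_chip_neighbours` are hypothetical helpers written only to illustrate the arithmetic.

```python
def greedy_assign(n_cores, cores_per_chip=128):
    """Return a list where entry i is the chip index for core i.

    Hypothetical illustration only: each chip is filled with up to
    `cores_per_chip` cores before the next chip is used.
    """
    return [core // cores_per_chip for core in range(n_cores)]


def cores_on_each_chip(assignment):
    """Count how many cores were placed on each chip, in chip order."""
    counts = {}
    for chip in assignment:
        counts[chip] = counts.get(chip, 0) + 1
    return [counts[chip] for chip in sorted(counts)]


def cross_chip_neighbours(assignment):
    """Count adjacent cores in the list that sit on different chips."""
    return sum(a != b for a, b in zip(assignment, assignment[1:]))


assignment = greedy_assign(n_cores=100, cores_per_chip=60)
print(cores_on_each_chip(assignment))     # [60, 40]
print(cross_chip_neighbours(assignment))  # 1
```

Only one pair of neighbouring cores straddles the chip boundary here, which is the intuition behind greedy placement needing less inter-chip communication.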