Long execution time and timeout with nengo-loihi

First a Little Background
My network that I want to run on nengo-loihi uses 2 chips and has a large input size of (128, 128, 3). I am trying to execute this network with nengo-loihi for 400 timesteps (0.4 seconds). One unique design feature is that we place a SpikingRectifiedLinear after the input, which runs off-chip, and scale our input images to match a frequency target.

When we run this on Loihi, though, we receive timeouts due to extremely long execution time. Does anyone have any idea why this execution delay arises, or have any mitigation for it?

When you’ve created your simulator but before you run, print out sim.precompute, for example:

with nengo_loihi.Simulator(network) as sim:
    print(f"Precompute = {sim.precompute}")

This will indicate whether the on-chip network can be run separately from the off-chip network (precompute=True), or whether the on-chip network needs to be run in lock-step with the off-chip network (precompute=False). Generally, if you’re able to run with precompute=True, your network will run faster and you’ll be less likely to time out. However, only some networks are capable of being run in this way. Specifically, if your network has any sort of recurrence where an output from the chip is passed to the host (i.e. a Node or off-chip Ensemble) which then has output that goes back to the chip, this is not compatible with precompute=True.
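
As a rough illustration of that rule, the chip → host → chip condition can be sketched in plain Python. This is not how NengoLoihi makes the decision internally; the function name `can_precompute` and the connection-list format are hypothetical, chosen only to make the condition concrete.

```python
from collections import defaultdict


def can_precompute(connections, on_chip):
    """Return False if any chip output reaches the host and then returns to the chip.

    connections: list of (pre, post) object names (hypothetical format).
    on_chip: set of names placed on the chip; everything else is on the host.
    """
    # Host-to-host connections, used to follow chip output through the host.
    host_graph = defaultdict(list)
    for pre, post in connections:
        if pre not in on_chip and post not in on_chip:
            host_graph[pre].append(post)

    # Host objects that receive output from the chip.
    frontier = [post for pre, post in connections
                if pre in on_chip and post not in on_chip]
    # Host objects that send input to the chip.
    feeds_chip = {pre for pre, post in connections
                  if pre not in on_chip and post in on_chip}

    seen = set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in feeds_chip:
            return False  # chip -> host -> chip loop found
        frontier.extend(host_graph[node])
    return True


on_chip = {"layer1", "layer2"}
feedforward = [("input_node", "layer1"), ("layer1", "layer2"),
               ("layer2", "output_node")]
recurrent = feedforward + [("output_node", "layer1")]

print(can_precompute(feedforward, on_chip))  # True
print(can_precompute(recurrent, on_chip))    # False
```

A purely feedforward network (host input, on-chip layers, host output) passes the check, while feeding the host output back into an on-chip layer fails it.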

Can you also give more detail (e.g. a printout) of what the timeout looks like? Where is this occurring (is it at the start of execution, in the middle of execution, or during NxSDK setup)?

Have you been able to run a smaller-scale version of the network on Loihi (e.g. using 16x16 images)? If so, I would also try splitting such a network across two chips (e.g. using the RoundRobin allocator). There are certain types of population axons that have had trouble going between chips on certain NxSDK versions.

To that point, what NxSDK version and NengoLoihi version are you using?

First off, I want to thank you for your prompt response.

  1. Our network does not have any recurrence like that, so I set Simulator(…, precompute=True) and saw that it was accepted. However, it still timed out.

  2. The model uses just under 256 cores, and I tried using the RoundRobin allocator with my current model. I will try a smaller architecture and get back to you on this front.

Here is a printout of the timeout error:

INFO:DRV: SLURM is being run in background
INFO:DRV: Connecting to
INFO:DRV: Host server up…Done 0.24s
INFO:DRV: Encoding axons/synapses…Done 39.34s
INFO:DRV: Compiling Embedded snips…Done 0.67s
INFO:DRV: Compiling MPDS Registers…Done 0.29ms
INFO:HST: Args chip=0 cpu=0 /homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/driver/compilers/…/…/…/temp/1638411285.3001146/launcher_chip0_lmt0.bin --chips=2 --remote-relay=1
INFO:HST: Args chip=1 cpu=0 /homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/driver/compilers/…/…/…/temp/1638411285.3001146/launcher_chip1_lmt0.bin --chips=2 --remote-relay=1
INFO:DRV: Booting up…Done 1.18s
INFO:DRV: Encoding probes…Done 0.87ms
running loihi sim
INFO:DRV: Transferring probes…Done 0.01s
INFO:DRV: Configuring registers…Done 6.13s
INFO:DRV: Transferring spikes…Done 95.33s
INFO:HST: srun: Force Terminated job 1320729
INFO:HST: srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
INFO:HST: slurmstepd: error: *** STEP 1320729.0 ON ncl-ext-ghrd-04 CANCELLED AT 2021-12-01T20:14:05 DUE TO TIME LIMIT ***
INFO:HST: srun: error: ncl-ext-ghrd-04: task 0: Terminated
INFO:DRV: Executing…Error 6953.38s
INFO:DRV: Executor: 400 timesteps…Error 7054.86s
Traceback (most recent call last):
File “/homes/mjurado3/workspace/neuromorphics/loihi_conversion.py”, line 72, in test_predict_timesteps
loihi_sim.run(batch_size * pres_time)
File “/homes/mjurado3/nengo-loihi/nengo_loihi/simulator.py”, line 330, in run
File “/homes/mjurado3/nengo-loihi/nengo_loihi/simulator.py”, line 343, in run_steps
File “/homes/mjurado3/nengo-loihi/nengo_loihi/simulator.py”, line 497, in loihi_precomputed_host_pre_and_host
self.loihi.run_steps(steps, blocking=True)
File “/homes/mjurado3/nengo-loihi/nengo_loihi/hardware/interface.py”, line 253, in run_steps
d_get(self.nxsdk_board, b"cnVu")(steps, **{d(b"YVN5bmM="): not blocking})
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/arch/base/nxboard.py”, line 283, in run
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/arch/base/nxboard.py”, line 257, in _run
self.executor.start(numSteps, aSync)
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/driver/executor.py”, line 84, in start
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/driver/executor.py”, line 121, in finish
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/driver/executor.py”, line 128, in _wait
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/grpc/_channel.py”, line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/grpc/_channel.py”, line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = “Socket closed”
debug_error_string = “{“created”:”@1638418445.822006232",“description”:“Error received from peer ipv4:”,“file”:“src/core/lib/surface/call.cc”,“file_line”:1069,“grpc_message”:“Socket closed”,“grpc_status”:14}"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “/homes/mjurado3/workspace/neuromorphics/loihi_conversion.py”, line 256, in <module>
scores, layer_output = test_predict_timesteps(net, pres_time, nengo_outputs, test_images, test_labels, batch_size=batch_size)
File “/homes/mjurado3/workspace/neuromorphics/loihi_conversion.py”, line 93, in test_predict_timesteps
File “/homes/mjurado3/nengo-loihi/nengo_loihi/simulator.py”, line 217, in __exit__
sim.__exit__(exc_type, exc_value, traceback)
File “/homes/mjurado3/nengo-loihi/nengo_loihi/hardware/interface.py”, line 129, in __exit__
File “/homes/mjurado3/nengo-loihi/nengo_loihi/hardware/interface.py”, line 161, in close
d_func(self.nxsdk_board, b"ZGlzY29ubmVjdA==")
File “/homes/mjurado3/nengo-loihi/nengo_loihi/nxsdk_obfuscation.py”, line 77, in d_func
return func(**kwargs)
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/arch/base/nxboard.py”, line 345, in disconnect
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/driver/executor.py”, line 97, in stop
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/driver/executor.py”, line 159, in _notifyListeners
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/driver/listeners/lakemont_orchestrator.py”, line 41, in onStop
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/grpc/_channel.py”, line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File “/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/grpc/_channel.py”, line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = “failed to connect to all addresses”
debug_error_string = “{“created”:”@1638418445.846656905",“description”:“Failed to pick subchannel”,“file”:“src/core/ext/filters/client_channel/client_channel.cc”,“file_line”:3158,“referenced_errors”:[{“created”:"@1638418445.846655545",“description”:“failed to connect to all addresses”,“file”:“src/core/lib/transport/error_utils.cc”,“file_line”:147,“grpc_status”:14}]}"

Environment Information:

I would use the Greedy allocator rather than RoundRobin. Greedy doesn’t do anything to explicitly minimize inter-chip communication, but it typically ends up with lower inter-chip communication than RoundRobin.

We also have some new allocators that explicitly minimize the inter-chip communication: GreedyComms and PartitionComms. They’re in this PR. PartitionComms is the most effective, but does require that you install nxmetis. Often, it’s the inter-chip communication that makes networks slow, so reducing that could help considerably.
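
To see why the allocation strategy affects inter-chip traffic, here is a toy model in plain Python. It is only a sketch under simplifying assumptions (one core per block, a simple feedforward chain of blocks) and does not reproduce the actual allocator code in NengoLoihi; the helper names are hypothetical.

```python
def round_robin(n_blocks, n_chips):
    """Toy round-robin: block i goes to chip i % n_chips."""
    return [i % n_chips for i in range(n_blocks)]


def greedy(n_blocks, cores_per_chip):
    """Toy greedy: fill one chip completely before moving to the next."""
    return [i // cores_per_chip for i in range(n_blocks)]


def interchip_connections(connections, chip_of):
    """Count the connections whose two ends sit on different chips."""
    return sum(chip_of[pre] != chip_of[post] for pre, post in connections)


n_blocks = 8
# Feedforward chain: block 0 -> block 1 -> ... -> block 7
chain = [(i, i + 1) for i in range(n_blocks - 1)]

print(interchip_connections(chain, round_robin(n_blocks, n_chips=2)))    # 7
print(interchip_connections(chain, greedy(n_blocks, cores_per_chip=4)))  # 1
```

In this toy chain, round-robin placement makes every connection cross a chip boundary, while greedy placement crosses only once, where the first chip fills up.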

So I’m going to be attempting to get the PartitionComms allocator working with our network (I am a coworker of Michael, the original poster above). In order to do this, do I simply need to install the nxmetis package, splice in either the GreedyComms or the PartitionComms class definition from the links you provided, and call it in the net? I’m a little unsure how we would go about porting this over for our purposes.

Appreciate any assistance in advance, and as Michael mentioned above, thanks for the consistent and quick responses.

Rather than manually copying the code, I would use that particular branch (the comms-allocator branch). The easiest way to do that is probably to clone the repository, checkout that branch, then do a developer install (e.g. pip install -e . when you’re in the folder with setup.py). Alternatively, we’re hoping to have that merged to master within the next day or two, so if you wait until then, you can skip dealing with the separate branch (though you will still have to install from source, since we won’t be doing a release probably until early January).

When I try to clone this branch and use it, I get an error:

ModuleNotFoundError: No module named ‘nxsdk.graph.nxboard’

Is there a specific version of the nxsdk that works with this branch?
Thanks for your time!

Never mind, I had neglected to pull the latest changes.

I am able to use GreedyComms, but when I use PartitionIntrachip I receive this error:

Traceback (most recent call last):
  File "/homes/mjurado3/workspace/neuromorphics/loihi_conversion.py", line 263, in <module>
    scores, layer_output = test_predict_timesteps(net, pres_time, nengo_outputs, test_images, test_labels, batch_size=batch_size)
  File "/homes/mjurado3/workspace/neuromorphics/loihi_conversion.py", line 71, in test_predict_timesteps
    with nengo_loihi.Simulator(net, remove_passthrough=False, target="sim" if layer_output else "loihi", precompute=True, hardware_options=dict(n_chips = 32, allocator = PartitionIntrachip())) as loihi_sim:
  File "/homes/mjurado3/nengo-loihi/nengo_loihi/simulator.py", line 198, in __init__
    self.sims["loihi"] = HardwareInterface(
  File "/homes/mjurado3/nengo-loihi/nengo_loihi/hardware/interface.py", line 94, in __init__
    self.nxsdk_board = build_board(
  File "/homes/mjurado3/nengo-loihi/nengo_loihi/hardware/builder.py", line 32, in build_board
    nxsdk_board = NxsdkBoard(
  File "/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/arch/n2a/n2board.py", line 60, in __init__
    super(N2Board, self).__init__(id,
  File "/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/arch/base/nxboard.py", line 97, in __init__
    self.allocateCoresByChip(numChips, numCores, initNumSynapses)
  File "/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/arch/base/nxboard.py", line 176, in allocateCoresByChip
    c = chip.allocateCores(numCoresPerChip[i], numSynPerCorePerChip[i])
  File "/homes/mjurado3/miniconda3/envs/loihi_vishal/lib/python3.9/site-packages/nxsdk/arch/base/nxchip.py", line 190, in allocateCores
    assert (isinstance(numCores, int) or isinstance(numCores, np.integer)
AssertionError: <numCores> must be a positive integer.

I printed out numCores and it was 0. This looks like a potential bug in the code.

Can you file a bug report on the NengoLoihi GitHub with code to reproduce the error, so that we can look into it further?

I have looked into the issue more. It primarily occurs when you use the PartitionInterchip allocator with n_chips greater than the number of chips needed. For example, if your neural network needs 76 cores and you provide n_chips=3, this error manifests. Otherwise, it is fine. Do you still think it would be worth filing a bug report?

Here is the bug report: PartitionInterchip error relating to '<numCores> must be a positive integer' · Issue #327 · nengo/nengo-loihi · GitHub
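
Given the observation above, one possible workaround is to request no more chips than the network needs. A minimal sketch, assuming 128 neuron cores per Loihi chip; the helper `min_chips` is hypothetical, and the result is only a lower bound, since an allocator may not pack cores perfectly.

```python
import math

CORES_PER_CHIP = 128  # neuron cores on one Loihi chip


def min_chips(n_cores, cores_per_chip=CORES_PER_CHIP):
    """Smallest number of chips that can hold n_cores cores (lower bound)."""
    if n_cores < 1:
        raise ValueError("n_cores must be a positive integer")
    return math.ceil(n_cores / cores_per_chip)


print(min_chips(76))   # 1, so n_chips=3 would leave two chips with no cores
print(min_chips(250))  # 2
```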

Currently, I have narrowed down the problem to models that use more than one chip. When I use PartitionInterchip (introduced in the PartitionComms branch) with a model involving 2 chips, I either receive a “broken pipe error” (with a partition of nahuku32) or a “Received shutdown signal from chip” error (with loihi as a partition). The attachment below shows that a dummy model can run on one chip just fine, but when I allocate the model across two chips it fails.

shutdown_signal_from_chip.pdf (214.6 KB)

One design difference from what you suggested is that I set precompute=False. However, I get the same error as before with precompute=True, as shown here:
rcp_bug.pdf (101.6 KB)

Do you have any idea what could be causing these problems or if this is fixable?

Have you tried the “pohoiki” partition? We’ve had some tests pass on there that fail on other boards. Many of our tests used to run successfully on nahuku32, but a month or two ago something changed with those boards, and we’ve been having more problems with them.

Not sure if this directly relates to this topic, but I also encountered timeouts when running on Loihi (INRC). Changing recv_timeout in hardware/interface.py (class HostSnip) from 0.01s to 1.0s solved the problem for me.
Details: When running the “MNIST” tutorial with precompute=False, the execution on KB (on the Intel cloud) became very unreliable. In many cases, the execution would not even finish and the Python script would hang.
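
For intuition on why a longer receive timeout can help, here is a self-contained sketch using only the Python standard library. It assumes that recv_timeout bounds how long the host waits on a socket for data from the board, which its name suggests but which is not confirmed in this thread; the function `recv_with_timeout` and the delays are made up for illustration and are not NengoLoihi code.

```python
import socket
import threading
import time


def recv_with_timeout(timeout, reply_delay=0.2):
    """Return the bytes received, or None if the receive timed out."""
    receiver, sender = socket.socketpair()

    def slow_peer():
        # Stand-in for a board that needs some time before it replies.
        time.sleep(reply_delay)
        sender.sendall(b"spikes")

    thread = threading.Thread(target=slow_peer)
    thread.start()
    receiver.settimeout(timeout)
    try:
        return receiver.recv(4096)
    except socket.timeout:
        return None
    finally:
        thread.join()
        receiver.close()
        sender.close()


print(recv_with_timeout(0.01))  # None: the peer had not replied yet
print(recv_with_timeout(1.0))   # b'spikes'
```

With a 0.01 s timeout the receive gives up before the slow peer answers, whereas a 1.0 s timeout leaves enough room for the reply to arrive.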