Loihi Energy Probes and Simulation Timeout

I’m trying to design an SNN that contains a small CNN converted from TensorFlow, and my end goal is to simulate it on Loihi and characterize its energy use. As a step in this direction, I’ve been trying to run just the converted CNN on Loihi. I was able to run the simulation and pick scaled firing rates, synapses, and data probe filters that gave good enough results, but as soon as I add energy probes the simulation times out (regardless of the simulated time, the timeout length, and even with just a single presented input). I tried 1 second of simulation time with dt=0.001 on a loihi_2h partition with no success. My scaled firing rate is pretty high, but with lower scaling the SNN output is extremely distorted compared to the CNN. Is 1000 timesteps not reasonable with the energy probe? Or are there other guidelines I’m not following that are slowing the simulation? Any info would be appreciated. (The simulation does not time out with the energy probe commented out.)

Model (ReLU activation):

Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 64, 64, 3)]       0         
conv2d (Conv2D)              (None, 31, 31, 16)        432       
conv2d_1 (Conv2D)            (None, 15, 15, 8)         1152      
conv2d_2 (Conv2D)            (None, 7, 7, 4)           288       
flatten (Flatten)            (None, 196)               0         
dense (Dense)                (None, 1)                 197       
Total params: 2,069
Trainable params: 2,069
Non-trainable params: 0


from os import environ

import nengo
import nengo_dl
import nengo_loihi
import tensorflow as tf
from nxsdk.api.enums.api_enums import ProbeParameter
from nxsdk.graph.monitor.probes import PerformanceProbeCondition

activation = tf.nn.relu  # the ReLU used in the original Keras model

nengo_converter = nengo_dl.Converter(
    model,
    swap_activations={activation: nengo_loihi.neurons.LoihiSpikingRectifiedLinear()},
    scale_firing_rates=100,
    synapse=0.005,
)
snnNet = nengo_converter.net

inLayer = model.layers[0]
firstConv = model.layers[1]
outLayer = model.layers[-1]

with snnNet:
    outProbe = nengo.Probe(snnNet.all_ensembles[-1], synapse=nengo.Alpha(0.05))

    #set input to Simulator
    nengo_converter.inputs[inLayer].output = nengo.processes.PresentInput([subset[0]], 1)
    #input is first image (64,64,3) presented for 1 second to match sim duration
    nengo_loihi.add_params(snnNet)  # allow on_chip to be set
    snnNet.config[nengo_converter.layers[firstConv].ensemble].on_chip = False

environ["PARTITION"] = "loihi_2h"
run_time = 1
dt = 0.001

sim = nengo_loihi.Simulator(snnNet, dt=dt)
board = sim.sims["loihi"].nxsdk_board
probe_cond = PerformanceProbeCondition(
    tStart=1, tEnd=int(run_time / dt) * 10, bufferSize=1024 * 5, binSize=4)
e_probe = board.probe(ProbeParameter.ENERGY, probe_cond)

with sim:
    sim.run(run_time)

Hi @SumbaD

NengoLoihi’s energy probe API merely exposes the underlying NxSDK API for use. NengoLoihi itself doesn’t do anything special regarding energy probes, so simply enabling the energy probes shouldn’t cause the simulation to hang or time out.

There are some things you can look at to debug this problem. First, ensure that your simulation is actually using the Loihi board. Without looking at the output provided by the simulation though, it is hard to tell whether or not this is the case for your simulation (you should be running it with SLURM=1 if you are using the Intel INRC cloud). Below is example output of what you should see if you are running things with the Loihi board:
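To make that concrete, here is a hypothetical sketch of the environment variables NxSDK reads when launching a job on the INRC cloud (the exact invocation and partition name depend on your setup):

```shell
# Hypothetical INRC launch setup: NxSDK reads these variables
# to schedule the job through SLURM on the requested partition.
export SLURM=1          # run the job through SLURM in the background
export PARTITION=loihi  # which board partition to request
echo "SLURM=$SLURM PARTITION=$PARTITION"
```

With these set, the simulation output should begin with the `SLURM is being run in background` message shown below.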

INFO:DRV:  SLURM is being run in background
INFO:DRV:  Connecting to
INFO:DRV:      Host server up..............Done 0.37s
INFO:DRV:      Compiling Embedded snips....Done 0.25s
INFO:DRV:      Encoding axons/synapses.....Done 0.04s
INFO:HST:  Args chip=0 cpu=0 /homes/xchoo/git/nxsdk-0.9.8/nxsdk/driver/compilers/../../../temp/1614114608.43131/launcher_chip0_lmt0.bin --chips=1 --remote-relay=1
INFO:HST:  Lakemont_driver...
INFO:DRV:      Booting up..................Done 2.20s
INFO:DRV:      Encoding probes.............Done 0.40ms
INFO:DRV:      Transferring probes.........Done 4.59ms
INFO:DRV:      Configuring registers.......Done 0.01s
INFO:DRV:      Transferring spikes.........Done 1.59ms
INFO:DRV:      Executing...................Done 0.61s
INFO:DRV:      Processing timeseries.......Done 0.17s
INFO:DRV:  Executor: 1000 timesteps........Done 0.81s
INFO:HST:  chip=0 cpu=0 halted, status=0x0

Importantly, you should see the message SLURM is being run in background, as well as the info messages showing the Loihi board spooling up and your code being executed on the board.

Another debugging step is to try the nahuku32 partition. You’ll have to double check with Intel, but I’m not sure how many of the boards on the loihi or loihi_2h partitions support energy probes. As far as I know though, the nahuku32 partition contains boards that do support energy probes.

Just to address this question, 1s of simulation time (i.e., 1000 timesteps) is very doable with energy probes, so the simulation time is not an issue here. We’ve run simulations that are 10s to 100s of seconds long with the energy probes without encountering any issues. In fact, I recommend that you make your simulation at least 15s long, since the energy probe data often contain startup artifacts that you may want to exclude when processing the results (you’ll have to make that call yourself when you look at your own energy probe data).
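As a minimal sketch of what that post-processing might look like (the sample count, settling window, and placeholder data here are all hypothetical — use whatever cutoff your own probe data suggests):

```python
# Hypothetical post-processing: drop an initial settling window from the
# energy time series before computing statistics on the steady state.
dt = 0.001
energy = [1.0] * 15000           # placeholder for 15 s of probe samples
settle_steps = int(1.0 / dt)     # discard the first 1 s (pick your own cutoff)
steady = energy[settle_steps:]
print(len(steady))               # -> 14000 samples remain
```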

If you could post the console output of your script running on the INRC cloud, it would be helpful to debug the issue. Additionally, if you are willing to share your code, posting a minimal example of your code that replicates the issue would be helpful too. :slight_smile:

I’m running a jupyter notebook on the INRC cloud which I’m launching with SLURM (following the example on the INRC website). The remainder of my code is just imports and loading the model from TF, but the output and error response is as follows:

INFO:DRV:  SLURM is being run in background
INFO:DRV:  Connecting to
INFO:DRV:      Host server up..............Done 0.19s
INFO:DRV:      Encoding axons/synapses.....Done 0.24s
INFO:DRV:      Compiling Embedded snips....Done 0.36s
INFO:DRV:  SLURM is being run in background
INFO:DRV:  Connecting to
INFO:DRV:      Host server up..............Done 0.15s
INFO:DRV:      Encoding axons/synapses.....Done 0.25s
INFO:DRV:      Compiling Embedded snips....Done 0.30s
INFO:DRV:      Encoding probes.............Done 1.72ms
INFO:HST:  Args chip=0 cpu=0 /homes/sumbad/nengo_venv/lib/python3.5/site-packages/nxsdk/driver/compilers/../../../temp/1614132815.6668978/launcher_chip0_lmt0.bin --chips=1 --remote-relay=1 
INFO:DRV:      Booting up..................Done 2.64s
INFO:HST:  Lakemont_driver...
INFO:DRV:      Transferring spikes.........Done 1.39s
INFO:DRV:      Transferring probes.........Done 6.10ms
INFO:DRV:      Configuring registers.......Done 0.05s
INFO:HST:  srun: Force Terminated job 1066710
INFO:HST:  srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
INFO:HST:  slurmstepd: error: *** STEP 1066710.0 ON ncl-ext-ghrd-04 CANCELLED AT 2021-02-24T02:33:35 DUE TO TIME LIMIT ***
INFO:DRV:      Executing...................Error 1189.28s
INFO:DRV:  Executor: 1000 timesteps........Error 1190.73s
INFO:HST:  srun: error: ncl-ext-ghrd-04: task 0: Terminated
_InactiveRpcError                         Traceback (most recent call last)
<ipython-input-13-85dbb53b9b0a> in <module>
     13 with sim:
---> 14     sim.run(run_time)

~/nengo_venv/lib/python3.5/site-packages/nengo_loihi/simulator.py in run(self, time_in_seconds)
    329             )
--> 330             self.run_steps(steps)

~/nengo_venv/lib/python3.5/site-packages/nengo_loihi/simulator.py in run_steps(self, steps)
--> 343         self._runner.run_steps(steps)
    344         self._n_steps += steps

~/nengo_venv/lib/python3.5/site-packages/nengo_loihi/simulator.py in loihi_precomputed_host_pre_only(self, steps)
505         self._host2chip(self.loihi)
--> 506         self.loihi.run_steps(steps, blocking=True)
    507         self.timers.stop("run")

~/nengo_venv/lib/python3.5/site-packages/nengo_loihi/hardware/interface.py in run_steps(self, steps, blocking)
    252         # start the board running the desired number of steps
--> 253         d_get(self.nxsdk_board, b"cnVu")(steps, **{d(b"YVN5bmM="): not blocking})

~/nengo_venv/lib/python3.5/site-packages/nxsdk/graph/nxboard.py in run(self, numSteps, aSync, maxTimeInterval, generateCfg, cfgPath, partition)
    261                         aSync=aSync,
--> 262                         traceDirectory=traceDirectory)
    263         else:

~/nengo_venv/lib/python3.5/site-packages/nxsdk/graph/nxboard.py in _run(self, numSteps, aSync, traceDirectory)
    232                     self, traceDirectory=traceDirectory)
--> 233             self.executor.start(numSteps, aSync)

~/nengo_venv/lib/python3.5/site-packages/nxsdk/driver/executor.py in start(self, numSteps, aSync)
     82         if not aSync:
---> 83             self.finish()

~/nengo_venv/lib/python3.5/site-packages/nxsdk/driver/executor.py in finish(self)
    119         if self._state is ExecutionState.RUNNING:
--> 120             self._wait()
    121             self._notifyListeners(ExecutionEventEnum.POST_EXECUTION)

~/nengo_venv/lib/python3.5/site-packages/nxsdk/driver/executor.py in _wait(self)
    126         with timedContextLogging("Executing", NxSDKLogger.NXDRIVER):
--> 127             self._executor_service.waitExecution(empty)

~/nengo_venv/lib/python3.5/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
    922                                       wait_for_ready, compression)
--> 923         return _end_unary_response_blocking(state, call, False, None)

~/nengo_venv/lib/python3.5/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    825     else:
--> 826         raise _InactiveRpcError(state)

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = "{"created":"@1614134015.573146850","description":"Error received from peer ipv4:","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"Socket closed","grpc_status":14}"

During handling of the above exception, another exception occurred:

_InactiveRpcError                         Traceback (most recent call last)
<ipython-input-13-85dbb53b9b0a> in <module>
     13 with sim:
---> 14     sim.run(run_time)

~/nengo_venv/lib/python3.5/site-packages/nengo_loihi/simulator.py in __exit__(self, exc_type, exc_value, traceback)
    215     def __exit__(self, exc_type, exc_value, traceback):
    216         for sim in self.sims.values():
--> 217             sim.__exit__(exc_type, exc_value, traceback)
    218         self.close()

~/nengo_venv/lib/python3.5/site-packages/nengo_loihi/hardware/interface.py in __exit__(self, exc_type, exc_value, traceback)
    128     def __exit__(self, exc_type, exc_value, traceback):
--> 129         self.close()
    131     @classmethod

~/nengo_venv/lib/python3.5/site-packages/nengo_loihi/hardware/interface.py in close(self)
    160         if self.nxsdk_board is not None:
--> 161             d_func(self.nxsdk_board, b"ZGlzY29ubmVjdA==")
    162             self.nxsdk_board = None

~/nengo_venv/lib/python3.5/site-packages/nengo_loihi/nxsdk_obfuscation.py in d_func(obj, kwargs, *attrs)
     75         kwargs = {deobfuscate(k): v for k, v in kwargs.items()}
     76     func = d_get(obj, *attrs)
---> 77     return func(**kwargs)

~/nengo_venv/lib/python3.5/site-packages/nxsdk/graph/nxboard.py in disconnect(self)
    319         """
    320         BasicSpikeGenerator.isSpikeGenProcessConfigured = False
--> 321         self.executor.stop()
    322         self._executor = None

~/nengo_venv/lib/python3.5/site-packages/nxsdk/driver/executor.py in stop(self, force)
     94                 _force.force = force
     95                 self._executor_service.stopExecution(_force)
---> 96             self._notifyListeners(ExecutionEventEnum.ON_STOP)
     97             self._host_coordinator.stop()
     98             self._state = ExecutionState.UNDEFINED

~/nengo_venv/lib/python3.5/site-packages/nxsdk/driver/executor.py in _notifyListeners(self, event)
    147                 listener.postExecution()
    148             elif event == ExecutionEventEnum.ON_STOP:
--> 149                 listener.onStop()
    150             else:
    151                 raise Exception("Invalid event {}".format(event))

~/nengo_venv/lib/python3.5/site-packages/nxsdk/driver/listeners/lakemont_orchestrator.py in onStop(self)
     39     def onStop(self) -> None:
     40         """Stops the lakemont driver"""
---> 41         self.stopLmtDriver(empty)

~/nengo_venv/lib/python3.5/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
    921         state, call, = self._blocking(request, timeout, metadata, credentials,
    922                                       wait_for_ready, compression)
--> 923         return _end_unary_response_blocking(state, call, False, None)
    925     def with_call(self,

~/nengo_venv/lib/python3.5/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    824             return state.response
    825     else:
--> 826         raise _InactiveRpcError(state)

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1614134015.577352099","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":5390,"referenced_errors":[{"created":"@1614134015.577347950","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":397,"grpc_status":14}]}"

I tried using a “nahuku32” partition just now and received the same timeout error. I’m pretty sure I’m connecting properly because I was able to get output from a “loihi” partition without the energy probe. I’m willing to share more code since the CNN as it is now is nothing groundbreaking, but what would the recommended format be? Should I upload the full .ipynb or save just the code itself to a text file?

If you have a github repository link, that will work. Otherwise, attaching the Jupyter notebook to your post reply should work too.

Additionally, if you could provide information about the Python environment you were using to run your model (Python version, installed versions of NxSDK, NengoLoihi, etc.) that would be very helpful in the debugging process.

One quick thing you can try while you are putting together all of the documents:
Looking at the code you posted in your first post, I did notice that when you created the PerformanceProbeCondition object, you set tEnd=int(run_time / dt) * 10. If you set that to tEnd=int(run_time / dt) (without the 10x multiplier), does that fix the timeout issue?
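In other words (reusing the variable names from the first post), the probe window would end at the last simulated timestep rather than 10x beyond it — the `PerformanceProbeCondition` call is commented out here since it needs NxSDK:

```python
# Sketch of the suggested fix: end the energy-probe window at the final
# simulated timestep instead of 10x past it.
run_time = 1
dt = 0.001
n_steps = int(run_time / dt)   # 1000 Loihi timesteps
t_start, t_end = 1, n_steps    # tEnd = n_steps, not n_steps * 10

# probe_cond = PerformanceProbeCondition(
#     tStart=t_start, tEnd=t_end, bufferSize=1024 * 5, binSize=4)
print(t_start, t_end)          # -> 1 1000
```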

Apologies for the long delay, I wanted to organize my code better before posting but keep getting behind on other tasks so I decided to post it as is. The repository can be found here.

I’m not sure if I am misunderstanding the energy probe setup somehow or if there is some other simulation setting that is creating an issue here. Any insight would be appreciated.

The README page points out which jupyter notebook file I have been using to explore energy probes.

Thanks again for your help so far.

Have you tried my suggestion?

I did before, but I just tried again while also reducing bufferSize to 1024 instead of 4*1024, and it completes without timing out now. Thanks for the help. Some remaining related questions:

  1. Why does that tEnd=int(run_time / dt) * 10 work in other examples? (for example here)
  2. Why am I seeing no spikingPhaseEnergy but I am seeing managementPhaseEnergy (and host but that makes more sense to me)? Are the translated CNN layers running on embedded x86 somehow instead of the neurocores?
  3. I thought probing the neuron output gives only spike (binary) values at each timestep, especially since on Loihi neurons only fire binary messages on a spike. Why does my probe of ‘output’ for the final ensemble show values greater than one, and even fractional values (ie 3.5, 2.3, etc)? Am I incorrect, can Loihi send out multiple spikes from a single neuron at a given timestep?
  4. If I want to use the “decoded” output (to take advantage of smoothing using synaptic filters), is there a way to enforce a particular decoder weight? For example, depending on my nengo_dl.converter settings, the decoder is sometimes a negative value, and I’d prefer it to be positive or even just a weight of 1. I assume this is because NEF is multiplying the spike train by some number before applying the filter (since it assumes this neuron is part of a larger ensemble) but correct me if this is incorrect.

For context, the last layer contains only one neuron, which has been pre-trained to indicate whether a face is present at the input or not.

I suspect that for that example, the simulation run time was small enough that the *10 multiplier didn’t make much of a difference. However, for your code, I think the *10 multiplier makes the probe window long enough to cause the NxSDK code to do weird things. But that’s just a guess. I do know that there have been updates to NxSDK since that code was written, so something might have changed that makes it more important for the NxSDK energy probe’s tEnd to match Nengo’s simulation time.

If your network is the one you included in your first post, we can analyze it. Your network consists of 3 conv2d layers, 1 flatten layer, and 1 dense layer (with 1 neuron). All of the convolution layers and the flatten layer can be implemented directly in the connection weights (assuming your conv2d layers don’t use an activation function), which means they may contain no neurons at all. The only layer that contains neurons is the dense layer, and that has only 1 neuron, so it probably doesn’t show up as using much energy.

It depends on what type of neuron you are using. If you are using nengo_loihi.neurons.SpikingRectifiedLinear, then it can spike more than once per timestep. Only the LIF neuron is limited to a maximum of 1 spike per timestep.
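A back-of-envelope sketch of why that matters (the rate here is hypothetical — with scale_firing_rates=100, scaled rates can easily exceed 1/dt): a spiking rectified-linear neuron emits roughly rate * dt spikes per timestep, so once the scaled rate exceeds 1/dt it fires more than once per step, and the synaptic filter then turns those multi-spike events into fractional probe values.

```python
# Hypothetical numbers: average spikes per timestep for a spiking
# rectified-linear neuron is rate * dt, which can exceed 1.
dt = 0.001            # simulator timestep (s)
scaled_rate = 3500.0  # Hz, e.g. after a high scale_firing_rates setting
spikes_per_step = scaled_rate * dt
print(spikes_per_step)  # roughly 3.5 spikes per timestep on average
```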

Yes, you can manually modify the decoder for any connection in the network. However, for your network, it would be good to understand why the decoded output is sometimes negative. If you could post your code, and an example of what you are trying to achieve, we can probably find a better solution to your issue.

I used a ReLU activation function for all layers (including the convolution layers) in the original network, with the intention of using LoihiSpikingRectifiedLinear neurons in the converted SNN. I’m not sure I understand what you mean by the layers containing no neurons; in the case of Loihi, are the embedded Lakemonts handling that computation? Or is it handled by the neurocores yet somehow excluded from the spiking phase energy calculation?

Is this still the case for LoihiSpikingRectifiedLinear?

The code is available as a jupyter notebook at this github link where if you compare cells 11 and 14 you’ll see that the decoded output is negative in the cpu only simulation but positive for the Loihi version. These two simulations use the same exact network so I don’t fully understand the behavior of the decoder in this case. In these plots I’m comparing the probed output to the response of the CNN to the same input (a constant of about 3.3). It makes sense to me that the decoded output should be less than 3.3 because of the smoothing from the synaptic filter, but I don’t fully understand the negative sign.

Since I did not know the full structure of your network, I offered the explanation that a Conv2D layer without an activation would not create any ensembles when converted to a Nengo network. However, analyzing the model you linked on Github, I do see that this is not the case.

Both the LoihiSpikingRectifiedLinear and LoihiLIF neurons have their computation handled by the neurocores and so should be reflected in the spiking phase of the energy report. As far as I can tell, this issue is only present in NxSDK 0.9.8+. In my experimentation, NxSDK 0.9.5 and below correctly report energy usage in the spiking phase of the Loihi computation. Since NengoLoihi merely provides an interface to the NxSDK api to get the energy reports, you’ll have to send a message to Intel about this issue.

Note: I had to use an old Python environment (one that was created years ago) to test NxSDK 0.9.5. I tried creating a fresh environment to test it, but since NxSDK 0.9.5 requires Python 3.5.2 (which is now no longer supported), I found it difficult to get all of the packages to play nice together (TensorFlow and Jupyter notebook in particular were giving me a lot of issues with Python 3.5.2). I did make sure to upgrade the Nengo and NengoLoihi versions in the environment though, so I am confident that it is not an issue with the Nengo code.

Here’s an example output using NxSDK 0.9.5:
Note that since I don’t have access to your data, I ran the network with a random 64x64x3 numpy array.

Yes. That was a typo. It should be nengo_loihi.neurons.LoihiSpikingRectifiedLinear. My original comment was missing the Loihi in LoihiSpikingRectifiedLinear.

Ah, I see. I have a couple of comments: first, about the “decoder”, and second, about running a CPU-based sim of the Loihi network.

The “decoder”: Using the decoded output of an ensemble only really makes sense if you are using the ensemble in “NEF mode”. That is to say, Nengo has solved the decoders in the context of the encoders that were generated for that ensemble. However, since you are using a NengoDL-converted network, which creates connections directly to the ensemble’s neurons (i.e., bypassing the neurons’ encoders), the decoders for those neurons no longer make sense to use.

To explain the negative decoded outputs, it’s likely that for that specific network, that neuron was initialized with a negative encoder (which is possible), which then requires a negative decoder to cancel out. When using the ensemble in NEF-mode, the two negatives cancel out, so you get the correct values. But, if you connect directly to the neurons of the ensemble, the encoders are ignored, so you end up with a negative decoded output.
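A toy sign check of the argument above (the decoder and activity values are hypothetical, not taken from your network): neuron activity is never negative, so once the encoder is bypassed, the sign of a single neuron’s “decoded” output is set entirely by its decoder.

```python
# Hypothetical values illustrating the sign argument: activity (a
# filtered spike rate) is always >= 0, so a negative decoder forces
# a negative "decoded" output when the encoder is bypassed.
decoder = -0.8   # hypothetical decoder solved alongside a negative encoder
activity = 3.3   # filtered spike rate, always non-negative
decoded = decoder * activity
print(decoded < 0)   # -> True: negative even though the activity is positive
```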

For your use case, since you are using a NengoDL converted network, instead of the decoded ensemble output, what you’ll want to do is to probe the .neurons attribute of the ensemble:

outProbe = nengo.Probe(snnNet.all_ensembles[-1].neurons, synapse=nengo.Alpha(0.05))

Running a CPU-based sim of the Loihi network: As a note, if you use the target="sim" option when you create the NengoLoihi simulator, the simulation will run using your CPU but emulate what you get if you were to run it on the Loihi board (the results of the emulated run and the real Loihi run should be almost identical):

with nengo_loihi.Simulator(snnNet, dt=dt, target="sim") as cpuSim:
    cpuSim.run(run_time)