[Nengo DL]: Understanding the internals of Nengo-DL ; Theory and Papers

Hello everyone!

I am trying to understand the theory behind Nengo-DL internals in detail. For now the focus is on obtaining a spiking network from a conventional analog model trained in TF (and not on training the model in the Nengo-DL environment).

TLDR: Please cite the research papers or any public design documents of Nengo-DL on this thread.

Specifically I have the following questions.

1> One of the ways to create a spiking model is to train a conventional model with analog neurons and then replace only the analog neurons with spiking neurons, keeping the trained weights (and biases if present) the same in the spiking network. With a conventional neuron, the weighted sum of inputs is fed into the activation function and we obtain a continuous output. With respect to Nengo-DL, what computations take place with a spiking neuron? Following is my understanding:

a) Spikes are produced (Poisson distributed) from the input data to feed into the spiking input layer. The spikes are fed to the input layer for a certain number of timesteps, n_steps.

b) The spiking neurons in the input layer then compute their membrane potential based on the input spikes to them (or does Nengo-DL compute a current from the spikes internally to modify the membrane potential?), and their behaviour simply follows the membrane potential equation, i.e. if their potential reaches the threshold, they fire and their membrane potential is reset. As a result, over the interval of n_steps the spiking neurons in the input layer keep firing spikes.

c) The inputs to the intermediate layers (i.e. the layers following the input layer) are again spikes from the previous layer, and the computation in (b) continues up to the output layer; along the way the sparse spikes are smoothed with the given synapse so that the effect of a spike does not suddenly disappear. But what role do the learned connection weights between the layers play? Do they multiply the magnitude of the discrete spikes? That is: considering a spike magnitude of 1, and an array of spikes produced over n_steps time [0, 0, 1, 0, 0, 1, 0, 0, 1, …], the array is multiplied by the learned connection weight, say 10 for a connection, so the input to the neuron at the other end of the connection is [0, 0, 10, 0, 0, 10, 0, 0, 10, …]. Is that right? I am probably wrong here…

d) In the output layer, the inputs are similarly spikes and the outputs are also spikes; however, they are counted over the time frame of n_steps and then the count from each individual neuron (each representing a class) is fed to the softmax layer (if present) to find the probability score of each class.

All the information flow from the input layer to the output layer happens within a duration of n_steps milliseconds (i.e. n_steps timesteps at the default 1 ms timestep). Please correct me wherever I am wrong in my understanding above.

2> I know that scale_firing_rates increases the firing rate of neurons but how does it theoretically work? Any formal equation?

Following are the two papers which I am aware of (haven’t gone through the first one yet):


and I am looking for other papers to get more insights. Thanks!

NengoDL does the same computation as the core Nengo code does. NengoDL just adds to this functionality by using the TF backend to run the simulations (allowing easy integration with GPUs), and by allowing you to insert your own TF code into the Nengo model.

I’m not sure if you are referencing a specific network or example, or asking about NengoDL in general. Generally, NengoDL is agnostic about how data is fed into any spiking neural ensemble. The input spike distribution does not have to be Poisson, and is more often generated by feeding vector data into a spiking ensemble to generate a spiking output.
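
As a rough illustration of that (a minimal sketch with made-up parameters, not taken from any specific example), feeding a constant value into an ensemble of spiking LIF neurons looks like this:

import nengo

with nengo.Network() as model:
    # Constant "pixel" value presented as the input signal
    stim = nengo.Node(0.5)
    # Ensemble of spiking LIF neurons representing that value
    ens = nengo.Ensemble(n_neurons=50, dimensions=1, neuron_type=nengo.LIF())
    nengo.Connection(stim, ens)
    # Probe the raw (unfiltered) spike trains
    p_spikes = nengo.Probe(ens.neurons, synapse=None)

with nengo.Simulator(model) as sim:
    sim.run(0.1)

# Shape is (timesteps, n_neurons); nonzero entries are spikes of magnitude 1/dt
print(sim.data[p_spikes].shape)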

If your NengoDL model is using decoded connections, the current input to a neuron is computed using the input value (in vector space), the neuron encoders, the neuron gain, and the neuron bias (see the compute_response function here). If you are using neuron connections in your model, a similar computation is done, just without the encoder term (i.e., just the input value, any transformation weights, the neuron gain, and the neuron bias). Note that inputs to neurons are (by default) filtered by a post-synaptic synapse (filtering by the synapse generates the “input value”). The default for connections is an exponential synapse with a 5ms time constant.
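
As a rough numerical sketch of those two cases (all values below are made up for illustration):

import numpy as np

# Made-up example values for a single neuron
x = np.array([0.3])           # input value (vector space), i.e. the filtered pre-synaptic signal
encoders = np.array([[1.0]])  # neuron encoders (decoded connections)
weights = np.array([[0.5]])   # transformation weights (neuron connections)
gain = 2.0                    # neuron gain (alpha)
bias = 1.5                    # neuron bias (J_bias)

# Decoded connection: encode the vector-space value, then apply gain and bias
J_decoded = gain * encoders.dot(x) + bias

# Neuron connection: same form, with the transformation weights in place of the encoders
J_neuron = gain * weights.dot(x) + bias

print(J_decoded, J_neuron)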

To compute whether or not a neuron spikes, Nengo (and by extension NengoDL) uses whatever membrane potential equation is specified by the neuron type (see the run_neurons function from the link above).

You are, in a sense, both right and wrong at the same time (depends on how you want to look at it). Let me explain.

Any connection weight in a Nengo model (be it learned or not) functions the same way as connection weights in “standard” neural networks do. That is to say, it applies a scaling factor to the activity output values of a neuron. As I mentioned above, in Nengo, inputs to neurons are a post-synaptically filtered spike train, and the spike train can be considered the activity output of the preceding neuron. The filtered spike train ($s(t)$) can be computed as a convolution between the spike train ($S(t)$) and a filter response function ($h(t)$) that represents the synaptic filter:

$s(t) = S(t) \ast h(t)$

If we bring a weight $w$ (from the connection weight matrix) into the equation, what it does is scale the filtered spike train:

$s(t) = w \times (S(t) \ast h(t))$

However, since convolution and multiplication are linear operations, their order can be switched. So, the connection weight can be considered as scaling the filtered spike train (as above), as scaling the spikes:

$s(t) = [w \times S(t)] \ast h(t)$

or even scaling the synaptic filter:

$s(t) = S(t) \ast [w \times h(t)]$

Mathematically, all of these are identical. In Nengo, if you have a connection without a synaptic filter (i.e., synapse=None for nengo.Connection), the connection weight scales the value of the spike train. Note that in Nengo, the value of each spike in a spike train is $1/dt$. This is to make the area of the spike 1, regardless of the value of $dt$ used for the simulation. The code below is a quick demonstration of the spike value scaling:

import nengo

with nengo.Network() as model:
    # Single neuron configured to fire spontaneously (intercept of -1)
    ens1 = nengo.Ensemble(1, 1, intercepts=[-1], max_rates=[200])
    out = nengo.Node(size_in=1)
    # Neuron-to-node connection with a 0.5 weight and no synaptic filter
    nengo.Connection(ens1.neurons, out, transform=0.5, synapse=None)

    p_out = nengo.Probe(out, synapse=None)

with nengo.Simulator(model) as sim:
    sim.run(0.1)

# Nonzero entries are the scaled spikes
print(sim.data[p_out])
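
With the default dt of 0.001 s, the nonzero entries in the probed output should be 0.5 / dt = 500, i.e. the 1/dt spike magnitude scaled by the 0.5 connection weight.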

This really depends on your network architecture and experimental setup. A softmax layer is not always necessary for doing classification, and other activation functions, or classification structures (e.g., associative memories) can be used.

I’m not entirely sure what you are asking here, as I’m unsure what you are referring to when you reference n_steps. So, I’m going to answer this question with regard to the time it takes for information to propagate through a neural network.

The time it takes for information to propagate from the input to the output of your network depends highly on the network architecture. Increasing the number of layers (for example) increases the amount of time it takes for information to make it through your model. A similar effect happens if you have recurrent connections within your system.

As an example, for an exponential synapse (an exponential synapse is effectively a low-pass filter), the synaptic time constant $\tau$ is the time the output value takes to reach about two-thirds of the input value. You can roughly double or triple the time constant (super rule-of-thumb-ish) to get the “full” propagation time, so adding one additional layer of neurons in a feedforward network increases the propagation time of the network by roughly $3\tau$.
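
As a quick numerical check of that rule of thumb (a minimal sketch using a simple discrete-time low-pass filter rather than Nengo's synapse objects):

import numpy as np

dt = 0.001   # simulation timestep (s)
tau = 0.005  # synaptic time constant (s)
t = np.arange(0, 0.03, dt)
x = np.ones_like(t)  # unit step input

# Discrete-time exponential (low-pass) filter: dy/dt = (x - y) / tau
y = np.zeros_like(t)
for i in range(1, len(t)):
    y[i] = y[i - 1] + (dt / tau) * (x[i] - y[i - 1])

print(y[int(tau / dt)])      # roughly 2/3 of the way to the input after one time constant
print(y[int(3 * tau / dt)])  # close to the full input value after about three time constants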

This question gets into the specifics of the NengoDL implementation, so I’ll let @drasmuss (the author of NengoDL) take over here. :slight_smile:

scale_firing_rates works by applying a linear scale on the input/output of the nonlinearity. So if your normal neuron equation is y = f(x), applying scale_firing_rates=r changes that to y = f(x*r)/r. In terms of neural networks, you can think of this as scaling all the input weights to the neurons by r, and scaling all the output weights by 1/r.
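
To make that concrete, here is a tiny sketch (my own illustration, not from the NengoDL codebase) showing that for a rectified-linear activation the scaling cancels out, so the rate-coded output of the layer is unchanged even though the underlying neurons are driven r times harder:

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

x = np.linspace(-1, 1, 5)  # example inputs
r = 100                    # scale_firing_rates

y_original = relu(x)
y_scaled = relu(x * r) / r  # y = f(x * r) / r

print(np.allclose(y_original, y_scaled))  # True: identical output for a linear rectifier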

Hello @xchoo, thank you for looking into this and your detailed response. Actually, I have lots of questions on the inner details of Nengo-DL when it comes to converting a TF-trained network to a spiking network. I am afraid that I started on the wrong foot. Without bringing in the context of Nengo, can you please explain, with respect to Nengo-DL, what computations take place with a spiking neuron when converting a TF-trained network? Since I am unaware of the details, I think I should restrict myself from asking further questions… lest they turn out to be meaningless or wrong and we miss the context.

For me Nengo-DL is a black box for now, where I input a TF-trained network and it outputs a spiking equivalent of it. To make the context more clear, I have a TF-trained network similar to the example presented here and I employ pretty much the same method shown to obtain the spiking equivalent. From the second paper by Eric and Chris, and a few others, I learned that the analog neurons (i.e. the ones with an activation function, e.g. ReLU) are simply replaced by spiking neurons (e.g. LIF or SpikingRectifiedLinear, as in the tutorial) to obtain the spiking network. I just don't know what fundamental computations take place in the spiking network during inference, from the input of images to the output of labels.

What I have learned from your previous reply is:

1> The input to the spiking network need not necessarily be Poisson-distributed spikes; rather it can simply be presenting a pixel value for n_steps of simulation time to the spiking neurons to produce spikes in the input layer. I guess… the same is happening in the linked example above.

2> With $J = \alpha_{gain} \times x + J_{bias}$ (i.e. without the encoder term), I believe this is what's input to the spiking neuron when converting a TF-trained network to a spiking network (as the TF model doesn't have any encoded-decoded connections, rather direct neuron-to-neuron connections). Here $x$ (the input_vector) is simply the pixel value repeated n_steps times. And yes, spiking is done based on the spiking neuron's membrane potential equation.

3> The output of the spiking neuron (in the input layer and the layers after it) is simply a train of spikes, i.e. [0, 0, 1/dt, 0, 0, 1/dt, 0, 0, 1/dt… ] (also referred to as the activity output or spike train), which is then smoothed with a filter to obtain a more or less continuous signal from the discrete spike train. While smoothing, we scale either the filter or the spike train with the connection weights, and then the filtered (or smoothed) spike train is input to the next layer. I guess this filtered spike train is again considered as the input_vector in the above equation for J in (2) to calculate the current fed to the next layer's neurons, right?

4> In the linked example tutorial, there is no softmax output layer, rather a dense layer with no activation, so I guess the classification is done by simply taking the max of the logits in the TF network. With respect to the spiking network… since the spiking neurons output a spike train, i.e. the activity output, how is the spike train used to classify the inputs? What computations take place with the output spike train?

5> As per the explanation of scale_firing_rates by @drasmuss, I take it that, overall, the following equations hold:

Assuming $V(t) = f(J (t))$ where $V(t)$ is the membrane potential and $J(t)$ is input current, we have:

$J(t) = \alpha_{gain} \times x(t) + J_{bias}$

$S(t)$ = activity_output = [0, 0, 1/dt, 0, 0, 1/dt, 0, 0, 1/dt…] output from dynamics of $V(t)$

$s(t) = S(t) \ast [w \times h(t)]$ where $w$ is connection weights

$x_\text{next-layer}(t) = \text{scale-firing-rates} \times s(t)$ : This is effectively scaling the input weights $w$.

$J_\text{next-layer}(t) = \alpha_{gain} \times x_\text{next-layer}(t) + J_{bias}$

$S_\text{next-layer}(t)$ = activity_output = $\frac{1}{\text{scale-firing-rates}} \times [0, 0, 1/dt, 0, 0, 1/dt, 0, 0, 1/dt…]$

Please let me know your pointers and correct me if I am wrong anywhere. Thank you for your valuable time!

Yes. For connections that use the full connection weight matrix (i.e., no encoders and decoders), this is what is used to compute the input current to the next layer of neurons.

If you are not using decoded outputs on your last layer of neurons, typically, what is done is to then just filter the spiking output of the last layer (with a synaptic filter) before feeding it to the classifier. The default synaptic filter time constant for nengo.Connection is 5ms, but you can get a smoother signal if you use a higher value (although, using a larger time constant will make your output signal less responsive to changes in the spike train).

The equations you have listed are mostly correct. Looking at the codebase, it looks like scale_firing_rates is applied to both the gains and biases, so your equations should look like this:

$x_\text{next-layer}(t) = s(t)$
$J_\text{next-layer}(t) = \text{scale-firing-rates} \times (\alpha_{gain} \times x_\text{next-layer}(t) + J_{bias})$

The spiking output computation is correct.

Hello @xchoo, thank you for the clarifications.

1> With respect to the classification at the output layer, I am still not very clear about the computations done with the spikes, i.e. [0, 0, 1/dt, 0, 0, 1/dt …]. In a few spiking network papers I read that the number of spikes (for each output neuron) at the output layer is calculated and then, by taking an argmax of the number of spikes, the corresponding neuron's index is taken as the class. Can you please explain in detail what happens from the moment when the last output layer (with no softmax classifier) produces spikes to the moment when the classes are determined in Nengo-DL, especially w.r.t. the architecture here?

2>

With respect to the above quote, I believe you mentioned it in the context of a classifier being present in the output layer, e.g. a softmax classifier. Is that right?

3> Can you please link a source explaining the membrane potential equation for nengo.SpikingRectifiedLinear neuron?

4> Overall… it will be very helpful if you could just point me to the collection of papers which explain the theory behind Nengo-DL internals that we just discussed. I intend to cite them as my sources.

5> And yes, one technical question… can we run Nengo-DL models on multiple GPUs in inference mode? From the API interface of nengo_dl.Simulator it seems that we can use only one GPU per invocation of the simulator. If it's not possible, then I believe I will have to spawn multiple independent invocations of the simulator, exploiting data parallelism on the test data.

Please let me know.

The example you linked elaborates in a few steps on how the output classes are determined from the spiking output, namely via synaptic smoothing and firing rate (activity) scaling.

To quickly summarize the lengthy example, the output of that network is a dense layer with 10 units (neurons), one representing each class in the output. When you configure the network as a spiking network, the output of the network is also spiky. Now, you can take the approach done in other models, and compute the total number of spikes produced in a certain time window and do an argmax to compute the output class. This method also works with the NengoDL example, although another method is employed in the example to achieve the same goal.

Instead, in the example, the output spikes are smoothed into a less spiky signal by applying a synaptic filter. In addition, since increasing the number of spikes in a filtered signal makes the filtered output smoother, the scale_firing_rates parameter is used to achieve just that.
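
As a rough sketch of both approaches (the array name and stand-in data below are my own assumptions; in the example the probed output has shape (n_images, n_steps, 10), and the filtering is applied via the probe's synapse):

import numpy as np

# Stand-in data for illustration: probed output of the final dense layer
rng = np.random.default_rng(0)
output = rng.random((2, 30, 10))  # (n_images, n_steps, 10)

# Approach 1: sum the activity (spike counts) over the time window, then argmax
predictions_count = np.argmax(output.sum(axis=1), axis=-1)

# Approach 2 (as in the example): take the argmax of the filtered output
# at the last timestep of the presentation window
predictions_filtered = np.argmax(output[:, -1, :], axis=-1)

print(predictions_count, predictions_filtered)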

Yes, my comment was if you had a classifier on the output layer. However, as seen in the NengoDL example you linked, this is not necessary. It really depends on the specifics of the network or problem you are trying to implement.

Documentation for the different neuron types in Nengo can be found here. Documentation for the spiking rectified linear neuron is here, and that links to the source here.
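
In case a summary helps, here is a simplified sketch (my own paraphrase, not the Nengo source) of the integrate-and-fire behaviour behind nengo.SpikingRectifiedLinear: the rectified input current is integrated into a voltage, and each time the voltage crosses 1 a spike of magnitude 1/dt is emitted and the voltage is reset by subtraction:

import numpy as np

def spiking_rectified_linear_step(J, voltage, dt=0.001):
    # Integrate the rectified input current into the membrane voltage
    voltage = voltage + np.maximum(J, 0) * dt
    # Count how many times the threshold of 1 was crossed this timestep
    n_spikes = np.floor(voltage)
    # Each spike has magnitude 1/dt, so the output integrates to the spike count
    output = n_spikes / dt
    # Reset by subtraction, keeping the leftover voltage
    voltage = voltage - n_spikes
    return output, voltage

# Example: a constant current of 200 should produce about 200 spikes per second
voltage = np.zeros(1)
spike_count = 0.0
for _ in range(1000):  # 1 second of simulation at dt = 0.001
    out, voltage = spiking_rectified_linear_step(np.array([200.0]), voltage)
    spike_count += out[0] * 0.001  # convert the 1/dt-valued output back to a count
print(spike_count)  # ~200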

At the moment, NengoDL does not support running models across multiple GPUs. We do have it planned for a future release though.

Hello @xchoo, thanks for your response!

With respect to the following:

I believe this is being done internally when we specify appropriate values for synapse and scale_firing_rates, and no explicit code is required.

BTW, thank you for this discussion! It has enriched my understanding of Nengo-DL!

Yes. This is correct. :slight_smile: