[Nengo-DL]: Adding a dynamic custom layer between two Ensembles

Sorry for so many questions… I cannot promise this is my last one… :sweat_smile:

So I have a Nengo network of K neurons which takes N input values and outputs one value each timestep. The N inputs are supposed to be the outputs of N neurons. I want to put this network between two Ensembles, but there is a catch. Just like the MaxPool op, the N inputs to this network should come from a grid-shaped group of neurons in the previous Ensemble, and the single outputs (one per instance) should also be arranged in a grid (such that, when flattened, they can be input to the next Ensemble or connected directly to the next Ensemble’s neurons).

Therefore, taking the pictorial example above, the 4 x 4 square represents the Ensemble neurons arranged in a grid. There will be 4 instances of my Nengo network (each instance having its own K neurons), and each of the N neurons from a coloured sub-grid will be connected to an instance, with the 4 outputs forming the grid which will be directly input to the next Ensemble of 4 neurons (one-to-one connection).

The desired architecture can be seen in the picture below.


All the blue coloured Nengo Nets are independently functional and receive input in grid format, and their output is directly connected to the next Ensemble. Of course, each input to the blue Nengo Nets comes from an individual neuron, and their (i.e. the Nengo Nets’) outputs are fed to the individual neurons in the next 2x2 Ensemble.

Keeping in mind the number of filters in a Conv Ensemble (both previous and next), how do I code this architecture? The reason I want such an architecture is to allow the neurons in the Nengo Nets to maintain their (voltage) state at each timestep of the simulation.

Yes. This is expected (with the bug). This is because the connections are made by the post layer. So, when the model is processed, it goes something like this:

  1. Add input layer, no connections made
  2. Add conv0 layer. Connection from input layer made, synapse=None because the input layer does not contain neurons.
  3. Add max_pool0 layer. Connection from conv0 layer made, synapse=None (bug!). If correctly coded, the synapse would be 0.005 instead of None.
  4. Add conv1 layer. Connection from max_pool0 layer made, synapse=None because the preceding layer does not contain neurons (it’s a TensorNode).
  5. Repeat for remaining max_pool and conv layers.
  6. Add dense layer. This is combined with the flatten layer because the flatten layer is just a linear transform. Connection is made to conv2, and synapse=0.005 because conv2 contains neurons.
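
If you want to see which synapses the converter actually ended up setting, you can inspect the converted network directly. A minimal sketch, assuming ndl_model is the output of nengo_dl.Converter(model, synapse=0.005):

# Print the synapse of every connection in the converted network, so you
# can see which ones ended up with synapse=None.
for conn in ndl_model.net.all_connections:
    print(conn, "->", conn.synapse)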

I haven’t implemented this myself, but I think the easiest way to do this is to use nengo.networks.EnsembleArray. The EnsembleArray is essentially a bunch of ensembles grouped into a single network. Its convenience comes when working with the inputs and outputs: the entire network is treated like one big ensemble, so all you have to do is make one connection to it, and internally it will handle the projections from the input dimensions to the individual ensembles in the EnsembleArray.

In summary, create an EnsembleArray of (in the case of the picture you posted) 4 sub-ensembles. Each sub-ensemble should be set to a dimensionality (using ens_dimensions) of 4, since they will each be representing 4 values. Finally, connect from previous_ens to the EnsembleArray and define the connection weight matrix (i.e. the transform value in the nengo.Connection) in such a way as to implement the mapping you want. The transform value should be a programmatically shuffled identity matrix (I’ll leave it up to you to figure out how to generate it for your particular network); a minimal sketch for the 4-sub-ensemble case is below.
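
Roughly, the structure would look something like this (a minimal sketch; the neuron counts and previous_ens are placeholders):

import numpy as np
import nengo

with nengo.Network() as net:
    previous_ens = nengo.Ensemble(n_neurons=64, dimensions=16)

    # 4 sub-ensembles, each representing 4 values
    ens_array = nengo.networks.EnsembleArray(
        n_neurons=50, n_ensembles=4, ens_dimensions=4
    )

    # Shuffled identity matrix: row i of the transform picks out the input
    # dimension that should map to dimension i of the EnsembleArray's input
    perm = [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
    transform = np.eye(16)[perm]

    nengo.Connection(previous_ens, ens_array.input, transform=transform)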

Hello @xchoo, thanks for the suggestion. Actually, I had in mind that EnsembleArray might be useful in my case. However, I was stuck on two issues:

(1): How do I input from the previous Ensemble such that the neuron mapping is maintained? → I guess this can be done with the transform parameter to map the values (I am yet to realize it programmatically).

(2): My individual Nets which receive input from the 4 neurons are not just an ensemble of K neurons, but a specially designed Nengo network of passthrough Nodes and Ensembles. I also want the dynamics of the neurons in this network to evolve along with those of the main converted network (which, I guess, EnsembleArray takes care of). Therefore I am stuck on how to repeat this specially designed network over the grid of neurons.

In a nutshell, is there a way in EnsembleArray to specify an instance of this Net as a parameter so that it gets repeated? If not, am I left with the only option of individually connecting the neurons of the previous Ensemble to my Net in a loop? Please let me know.


EDIT:
I was looking for more options w.r.t. the above and came across the Designing networks tutorial. So it looks like I can design my Net in a function and call the function again and again to create new instances of it. Then all that is left is to appropriately connect the different instances of my Net to the corresponding neurons in the grid layout of pre_ensemble and post_ensemble. Now, these Ensembles receive and output values as flattened vectors, so I guess I need to do some simple maths to map a tuple of (row_index, col_index, channel_index) of the implicit grid layout to the corresponding index in the flattened vector for both pre_ensemble and post_ensemble while inputting to and outputting from my Net. Before I implement it, I wanted to know if there’s a better way than this. Also, if it sounds relevant to you, can you please elaborate on the following:

What should the transform look like for an input of, say… a (26, 26, 32) shaped matrix? I did not get the idea of the shuffled identity matrix. If you can explain it, I should be able to figure out how to create it. Thanks!
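
For reference, the index mapping I mentioned above would be roughly this (a sketch, assuming channels-last, row-major flattening, which I believe is the Keras default):

# Map a (row, col, channel) position in the implicit grid layout to the
# index in the flattened vector (channels-last, row-major order assumed)
def grid_to_flat(row, col, channel, n_cols, n_channels):
    return (row * n_cols + col) * n_channels + channel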

I was thinking about it some more, and I think you can also do the connection using the numpy slicing operator.
As an example, consider your input is a 2D array like so:

 0  1  2  3 
 4  5  6  7
 8  9 10 11
12 13 14 15

If you want to do a max pooling operation with a 2x2 kernel, you’d want elements 0, 1, 4, and 5 to go to the first ensemble, then elements 2, 3, 6, and 7 to go to the second ensemble, and so forth. If the input to the ensemble array (or a custom network) is 16D, the numpy slice would look like this:

0 1 4 5 2 3 6 7 8 9 12 13 10 11 14 15

You can use that slice when you make the nengo.Connection:

slice = [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
nengo.Connection(ens.neurons[slice], max_pool.input)

Here’s some code that will generate the slice for you (this is hardcoded for a 4x4 input with a 2x2 kernel):

    import numpy as np

    # Build the flattened index order that groups each 2x2 block together
    trfm = np.zeros(16, dtype=int)
    mat = np.arange(16).reshape((4, 4))
    n_kernels = int(16 / 4)
    for n in range(n_kernels):
        # Select the 2x2 block for kernel n and flatten it into the slice
        trfm[n * 4 : n * 4 + 4] = mat[
            n // 2 * 2 : n // 2 * 2 + 2, n % 2 * 2 : n % 2 * 2 + 2
        ].flatten()

It’s not the prettiest, and I think there are better ways of doing it, but I have tested it to work.
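
For your (26, 26, 32) example, the same idea should generalize along these lines. This is a rough, untested sketch that assumes channels-last, row-major flattening (check it against how your converter lays out the neurons); each consecutive group of k*k indices corresponds to one pooling window within one channel, so you may need to reorder the groups to match the layout the next layer expects:

import numpy as np

def pooling_slice(n_rows, n_cols, n_channels, k):
    # Flat neuron indices laid out as an (n_rows, n_cols, n_channels) grid
    flat = np.arange(n_rows * n_cols * n_channels).reshape(
        (n_rows, n_cols, n_channels)
    )
    indices = []
    for ch in range(n_channels):
        for r in range(0, n_rows, k):
            for c in range(0, n_cols, k):
                # Gather the k x k window for this channel
                indices.extend(flat[r : r + k, c : c + k, ch].flatten())
    return indices

# e.g., pooling_slice(26, 26, 32, 2) for a (26, 26, 32) input with a 2x2 kernel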

You can do this, yes. You can even make a network of networks. I.e., make a “main” network that serves as the interface to the rest of your model, and within that network, call the network creation function for the subnetworks.


Thank you @xchoo for the slicing suggestion and for directly connecting the sliced neurons to the max_pool.input object. This cleans up the code very much.

However, one needs to have a max_pool Nengo object (which I believe should be a Network here), which I have little idea how to create. I have seen examples of it, like in spa networks where you connect an Ensemble to spa.<network>.input, but how do I structure such a “main” network so that the sliced inputs get distributed in groups of 4 to each sub-network? An example of how to create one would help. Or, w.r.t. the following:

is it straightforward to simply call the subnetwork creation function multiple times in a for loop (within a “main” network) and connect it to the “main” network’s input? But how do I connect each subnetwork such that it accepts sliced inputs in groups of 4?

You can use the same slicing trick to distribute an N-dimensional input into groups of 4. As an example:

with nengo.Network() as max_pool_main:
    # Note, you can use a function to define your network here as well,
    # this is just an example
    max_pool_main.input = nengo.Node(size_in=16)
    max_pool_main.output = nengo.Node(size_in=4)

    for n in range(16 // 4):
        # Make individual subnets and connect to input. Here create_subnet()
        # is assumed to return a network exposing .input and .output nodes.
        subnet = create_subnet()
        # Connect sliced input to input of subnet
        nengo.Connection(max_pool_main.input[n * 4: n * 4 + 4], subnet.input)
        # Connect output of subnet to sliced output
        nengo.Connection(subnet.output, max_pool_main.output[n])

Note that the example code above makes no assumptions about the order of the elements of the input (and it doesn’t need to). All it is doing is “mapping” groups of 4 dimensions of the input to the various copies of subnet. You can combine it with the shuffled slicing I suggested in my previous post to get the correct mapping for the max pooling operation, like so:

slice = [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
nengo.Connection(ens.neurons[slice], max_pool_main.input)

If you are looking for an example network to start from, I’d suggest the nengo.networks.EnsembleArray network code. It basically has everything you want (just use the __init__ function; you can ignore the rest, really), and all you have to do to customize it to your needs is to replace this line with code that creates your subnet network.

Thank you very much @xchoo for giving an example of it as well as the reference to the EnsembleArray code. I will first start with the basics to get it correct, and then look at automatically replacing the layers by registering it with the Converter. I guess the EnsembleArray code will be useful then. Until then, can you please let me know your thoughts on the questions in post #7? I guess they were missed in the plethora of my questions.

Hi @Zerone
I didn’t realize that I had missed some of your questions:

Probing the output of the connection that connects to a TensorNode is the only way to get the input signal to the TensorNode. It will still work with the bug in the current version of NengoDL. The bug only affects the synaptic filtering on that connection, so you will get a spiky output regardless of what synapse is set in the converter.
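
For example, something like this should work (a sketch; conn_to_tensornode is a placeholder for whichever connection feeds your TensorNode):

with ndl_model.net:
    # Probe the output of the connection that feeds the TensorNode;
    # this is the signal the TensorNode receives each timestep
    tensornode_input_p = nengo.Probe(conn_to_tensornode, "output")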

For this bug, the synapse for connections to TensorNodes is always None. So, if you wanted to work with the spike output, you wouldn’t need to do anything.

Not quite. ndl_model.layers is a dictionary (i.e., a map) from the TensorFlow layer objects to the corresponding Nengo ensembles.

Yes, the filtered spike train weighted by the input connection weight matrix.

Hello @xchoo, no problem and thanks for getting back!

Exactly my thoughts. This is what I wanted to confirm; I think I should have phrased my question better.

With respect to the following,

I meant to ask… after the bug is fixed, if I want to work with the spike output, then I would have to set synapse=None explicitly on the Connection to the TensorNode before simulating the network… right? And thanks for confirming the following!

BTW, when I implemented your suggestion with slight modifications to suit my needs and simulated the network, it takes nearly:

Optimizing graph: creating signals finished in 0:24:37
Optimization finished in 0:39:17
|#                        Constructing graph                          | 0:00:00

Once again for reference, my create_subnet() returns a subnet composed of passthrough Nodes and Ensembles, and the neurons in the Ensemble are directly connected to the nodes (i.e. with nengo.Connection(ens.neurons[i], node, ...)), so I guess there’s no computation of decoders happening. Yet it takes a significant amount of time to optimize the network (even more than an hour in the case of a few wider nets). After optimization, the network functionally works as expected, but I was wondering if you have any suggestions to bypass/improve the optimization time? Please let me know!

You may want to try disabling the operator merging. This might improve the optimization time.
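
That would look something like this (the no-op planner lives in nengo_dl.graph_optimizer):

import nengo_dl
from nengo_dl.graph_optimizer import noop_planner

with ndl_model.net:
    # The no-op planner skips the operator merging step
    nengo_dl.configure_settings(planner=noop_planner)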

Thank you @xchoo for the suggestion. I am a bit caught up with other work, but will try it soon and let you know.

Hello @xchoo, I have a few questions below:

1> In a nengo.Connection(), what is the relative ordering of synapse, function, and transform? I know that the function is applied before the transform, so is it synapse → function → transform, and in the absence of a function, is it synapse → transform? i.e. are the incoming spikes (if the pre_obj is an Ensemble) synapsed/filtered first and then a linear transform applied, or is it just the opposite?

2> The default spike amplitude is 1/dt (with the amplitude parameter at its default of 1). When I probe the Ensemble neurons of type nengo.LIF() or nengo.SpikingRectifiedLinear() or nengo_loihi.neurons.LoihiSpikingRectifiedLinear() (with default values of amplitude and dt), I see spikes of amplitude 1000.0 (2000.0 when amplitude is set to 2). In NengoDL, the spikes are calculated as sim.data[neurons_probe] x scale_firing_rates x dt, which based on the above (assuming amplitude=1, scale_firing_rates=100, dt=0.001) would evaluate to 1000.0 x 100 x 0.001 = 100. But based on my experience with NengoDL, for a scale_firing_rates of 100 (or 250), I see the spikes (calculated by the same formula) as 10 (or 4). Why is it not equal to 100 (or 250), as should be obtained by the formula? It seems like we should rather divide the amplitude by scale_firing_rates (as mentioned in your excellent description here). For some reason, I am messing things up. Please clarify? What computations do the spikes of amplitude 1/dt go through after being generated from the previous ensemble neurons in NengoDL?

3> This might sound trivial, but just confirming: SpikingRectifiedLinear() and LoihiSpikingRectifiedLinear() neurons are examples of “Integrate and Fire” neurons… right?

In the built model (i.e., the model that the simulator uses), both the function and transform are combined into the connection’s weight matrix. Recall that the function is not a physical entity. It is just used by the connection’s weight solver to solve for the appropriate connection weights that will approximate the function.

As for the synapse, it is applied after the connection weights have been applied. However, since both the connection weight application and the synapse are linear operations, mathematically it doesn’t matter in which order they are applied.
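
A quick sketch of why the order doesn’t matter: the synapse is a linear time-invariant filter with impulse response $h$, applied elementwise, and the weight matrix $W$ is constant in time, so

$$\big(h * (W s)\big)(t) = \int h(\tau)\, W s(t - \tau)\, d\tau = W \int h(\tau)\, s(t - \tau)\, d\tau = W\, (h * s)(t),$$

where $s(t)$ is the vector of (spike) signals on the connection.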

Can you provide a minimal network that demonstrates this? Just so that I have context with what network you are building, the parameters you have used, and how you are testing it.

This is correct.

Hello @xchoo,

I suppose you are talking about the convolution operation (between the synapse and the weighted spikes) being implemented as a linear transform, and hence the order doesn’t matter. Right? In traditional linear transform operations through matrix multiplication, the order of the matrices matters… doesn’t it?

With respect to the following:

I suppose you are hinting at mathematical functions whose decoders can be calculated and thus incorporated into the connection weight matrix along with the transform. Right?

About the following statement:

Let’s say a Convolutional transform has to be applied; then, whatever the output of the neurons of the previous Ensemble is, NengoDL first applies the transform and then applies the synapse on the resulting values. Is that right?

Here’s the attached minimal script. As I said earlier, I was a bit confused. Doing this exercise resolved my doubts. After setting scale_firing_rates = 100, I could see that the raw output of data1[conv0_lyr_otpt] (i.e. probed from the Conv0 layer) is the following (Cell 15):

[9.999999 0.       0.       0.       0.       0.       0.       0.
 0.       0.       0.       0.       0.       0.       0.       0.
 0.       0.       0.       0.       0.       0.       0.       0.
 0.       0.       0.       0.       0.       0.       0.       0.
 0.       0.       0.       0.       0.       0.       0.       0.      ]

however, upon multiplying data1[conv0_lyr_otpt] with scale_firing_rates and dt, I see the actual spike amplitude, which is 1, as can be seen below (Cell 17 of the attached notebook, investigating_spike_amplitudes.ipynb (35.5 KB)):

[0.99999994 0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.        ]

My doubt is pretty much resolved; however, I would like some more clarification. I see that the raw output is 10 (or 4) when scale_firing_rates is 100 (or 250) respectively, which is (amplitude/dt) / scale_firing_rates (assuming amplitude is 1 and dt is 0.001).

Is this the “input” to further computations in the connection, i.e. is the function, transform, or synapse applied to (amplitude/dt) / scale_firing_rates (e.g. 10.0) in the nengo.Connection()? Or are those computations applied to the “spike amplitude” value, e.g. 1, obtained after multiplying (amplitude/dt) / scale_firing_rates by scale_firing_rates and dt?

Yes. That is correct. The synaptic application is a convolution operation and it doesn’t matter if you apply the convolution first, or do the connection weight matrix multiplication first.

Correct!

Yes

Yes. This is expected behaviour. Remember that the purpose of the scale_firing_rates parameter is to increase the firing rates of the neurons in the network. However, in order to keep the overall “energy” (or information) being transmitted by the network the same in both cases, the amplitudes of the spikes are divided by the scale_firing_rates value. From my other post:

The scale_firing_rate parameter is applied directly to the amplitude of the neuron’s spikes. Similarly (on the input side), the scale_firing_rate parameter is applied directly to the neuron’s gains.

Just to clarify your earlier question: I believe you are conflating two different concepts here. Your original thought was that scale_firing_rates would affect the amplitude by scaling it up. However, the purpose of the scale_firing_rates parameter is to increase (typically) the firing rates of the neurons (by adjusting the neuron gains). The reason the amplitude of the spikes is affected is that we want to keep the overall amount of information being transmitted by the spikes the same (pre- and post-scaling). Since we scaled up the firing rate, the thing to do is to scale down the amplitude, which in effect “cancels out” any additional information gain obtained by increasing the neuron gain.
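
To put rough numbers on it with your values (this is just the arithmetic, not a description of NengoDL’s internals):

scale_firing_rates = 100
amplitude = 1.0
dt = 0.001

# The spike amplitude is scaled down by scale_firing_rates, so a spike
# shows up in the probe data as (amplitude / scale_firing_rates) / dt
probed_value = (amplitude / scale_firing_rates) / dt          # -> 10.0

# Multiplying back by scale_firing_rates * dt recovers the effective
# spike amplitude, as in your notebook
effective_amplitude = probed_value * scale_firing_rates * dt  # -> 1.0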

Thanks @xchoo for your response. I see that I am circling back to age-old questions which you have already resolved… probably I am looking at them from different perspectives. I will try to be more careful.

With respect to the following,

I understand that the input to the next Ensemble is (amplitude/dt) / scale_firing_rates (e.g. 10.0); however (as you mentioned), since the gain already has scale_firing_rates multiplied into it, upon calculating the current $J = \text{gain} \times \langle e, x \rangle + \text{bias}$, the scale_firing_rates automatically gets multiplied (by virtue of the multiplication with the gain) into the input $x$, so effectively the current is calculated from the spike values (i.e. $1$ or $2$… of course depending upon the neuron type). Right?

Another follow up question on

say the function is non-trivial (i.e. not a max, exponential, sum, etc.) and instead does some calculation in a loop; will the decoders then be calculated? I am having trouble following… how could decoders be calculated that would approximate such a non-trivial function (e.g. sum of products, binary search!)? Is it the same process… that after executing the function on some random samples, the outputs are used to calculate the decoders?

On another note, I have a network with Conv -> Conv -> AvgPool -> Conv -> Dense. When I attempt to deploy it on Loihi (the emulator in NengoLoihi), I get the following error: NotImplementedError: Mergeable transforms must be Dense; set remove_passthrough=False, which I believe is due to the presence of the AvgPool layer, which is modelled as a passthrough node, and its AveragePooling transform is attempted to be merged with the convolutional transform. And this is not yet supported on Loihi… is it?

BTW, upon setting the remove_passthrough=False in the nengo_loihi.Simulator() args, it throws the following error: BuildError: Conv2D transforms not supported for off-chip to on-chip connections where pre is not a Neurons object..

Please let me know your suggestions!

That is correct. Note however that in my previous post explaining this, I specifically chose the ReLU neuron type, because the firing rate is linearly proportional to the input current (i.e., doubling the input current doubles the firing rate, which makes the explanation easier). If you use the LIF neuron however, things become more complicated as the neuron response is not linear, so the scale_firing_rates parameter doesn’t always work the way you think it does (with LIF neurons).

In such instances, Nengo will have a very hard time (if not an impossible one) trying to solve for decoders. The decoders are solved for by evaluating the desired output function at a given set of evaluation points. If the function requires some sort of recursion, then Nengo will solve the decoders for the first pass of the recursion (which is likely not what you want). In order to implement these more complex functions in Nengo, you’ll need to break the function down into simpler parts.

As an example, if you wanted to compute a sum of products, you’ll want a bunch of sub-networks to compute the products, and then another ensemble to compute the sum (note, this ensemble can be the ensemble where you want to use the sum, or it can also be a passthrough node).
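
For instance, here is a minimal sketch of the sum-of-products idea (the stimulus values and neuron counts are placeholders): each 2D ensemble decodes one pairwise product, and all of them project onto the same passthrough node, where the contributions sum.

import nengo

with nengo.Network() as model:
    stim_a = nengo.Node([0.5, -0.3])
    stim_b = nengo.Node([0.8, 0.2])

    total = nengo.Node(size_in=1)  # passthrough node collecting the sum

    for i in range(2):
        # One 2D ensemble per pairwise product
        prod_ens = nengo.Ensemble(n_neurons=200, dimensions=2, radius=1.5)
        nengo.Connection(stim_a[i], prod_ens[0])
        nengo.Connection(stim_b[i], prod_ens[1])
        # Decode the product; multiple connections into `total` sum up
        nengo.Connection(prod_ens, total, function=lambda x: x[0] * x[1])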

I’ll have to run some test code to be sure but this is likely the case. In this instance, what you’ll probably want to do is to insert a Dense layer between the AvgPool and Conv layer in order to force an ensemble to be built there. That might work, but I can’t say for sure.

Yeah, NengoLoihi pretty much requires remove_passthrough to always be True. This is because we want to minimize the number of off-to-on-chip connections (and vice versa) since all nodes have to be simulated off-chip.

Try my suggestion of inserting a Dense layer first, and if that still doesn’t work, then I’ll have to ask the NengoLoihi devs on ideas on how to get around this issue. It’ll probably involve some math to calculate the full connection weight matrix for the Conv2D transform (and I don’t know that off the top of my head).

Hello @xchoo, thanks for your response and for confirming the computation with the gain. With respect to Nengo dealing with complex functions, I was just curious about them, and your insights are very useful. Indeed, a possible way would be to break down those complex functions to give Nengo an easier time.

With respect to the AvgPooling on NengoLoihi, this is again something I was curious about. If I prioritize this in the future, I will remember to add a Dense layer between the AveragePooling and Conv layers.

I was actually working with your suggestion to configure the planner to bypass graph optimization and speed up the execution; however, it doesn’t seem to help. Following is the code, where ndl_model is obtained from nengo_dl.Converter().

with ndl_model.net:
  # Disable operator merging to improve compilation time.
  nengo_dl.configure_settings(planner=noop_planner)
.
.

My network has the custom layer as you suggested here, and I even included nengo_dl.configure_settings(planner=noop_planner) within that subnet as well, but nothing helps. Any suggestions?

Without being able to fully profile your network, it’ll be hard to say exactly where the slowdown is happening. You could also try disabling the sorting algorithm to see if that helps. Apart from that, I don’t have much advice. I haven’t built complex models in NengoDL myself, so my experience there is limited. It may also be that you are at a network size where the time it takes to optimize the network is unavoidable.
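
If you want to try disabling the sorter as well, I believe it would look something like this (double-check the function name against the graph_optimizer module in your NengoDL version):

import nengo_dl
from nengo_dl.graph_optimizer import noop_planner, noop_order_signals

with ndl_model.net:
    # Disable both operator merging and signal sorting
    nengo_dl.configure_settings(planner=noop_planner, sorter=noop_order_signals)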

Got your point, @xchoo. Thanks for the additional suggestion as well; I will try it out. However, yesterday I conducted a few more experiments with the graph optimizer disabled and found that it takes even longer to optimize the network, possibly never finishing… (even after 5 hours and 30 minutes it was still in the optimization phase). With the graph optimizer enabled, it still takes time, but a reasonable amount, and the network ends up executing fine. So I really don’t know what’s going on with the graph optimization right now, but probably my network is genuinely large.