Encoding an Image as a nengo.Ensemble

Hi,

I am trying to feed images into a nengo.Ensemble() to be represented there. I have seen the encoding example here: Encoding for image recognition — NengoExtras 0.5.1.dev0 docs, but it requires a call to nengo.utils.ensemble.tuning_curves() with a Simulator object passed in. How can I do this in NengoGUI? I wish to see my network in action dynamically, but there is no handle on the Simulator object in NengoGUI.

Hi @yedanqi,

In NengoGUI, there are a bunch of built-in functions that give you a handle to the Nengo simulator object. While there are multiple such functions (see this GitHub thread), the one most applicable to your question is on_step(sim). To use this feature, simply define an on_step function somewhere in your code (it doesn't even need to be part of the Nengo model), like so:

import nengo

with nengo.Network() as model:
    inp = nengo.Node(0)
    ens = nengo.Ensemble(10, 1)
    nengo.Connection(inp, ens)

def on_step(sim):
    # This function will be run on every step of the Nengo simulation (in the GUI)
    print(sim.model.params[ens].encoders)

In that GitHub thread, there is also a link to some example code that seems to do exactly what you are attempting (i.e., display the tuning curves changing as the simulation is running). The example code can be found here, and it includes code that turns the tuning curves into plots that can be displayed in NengoGUI. In NengoGUI, you can create custom plots using Nengo nodes, by assigning a string containing HTML code to the _nengo_html_ attribute of the node's output function, like so.
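For reference, a minimal sketch of that pattern might look like the following (the HTML content, names, and sizes here are just placeholders); the HTML string is assigned to the _nengo_html_ attribute of the node's output function, and NengoGUI renders it:

import nengo

def display_func(t, x):
    # NengoGUI renders whatever HTML string is stored on this attribute
    display_func._nengo_html_ = "<h2>value: {:.3f}</h2>".format(x[0])

with nengo.Network() as model:
    stim = nengo.Node(lambda t: [t % 1.0])
    display_node = nengo.Node(display_func, size_in=1, size_out=0)
    nengo.Connection(stim, display_node, synapse=None)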

As an aside, note that in the example you linked, the tuning curves for the neural ensembles are static (i.e., they are generated when the model is built and do not change over time).

Thanks @xchoo! I see that we do have a handle on the Simulator object through these built-in functions. These will probably help.

A follow-up question I have: since I am trying to input images to the neural network, is using nengo.utils.ensemble.tuning_curves() the only way to encode pixels as spikes? I have seen papers that do this with Poisson encoding, but I am not sure how that is achieved in Nengo.

So far I have tried feeding the images to a nengo_spa.Transcode() to turn each image into a Semantic Pointer, but I realised that without a proper nengo_spa.vocabulary.Vocabulary defined, I can't really inject any semantics into each image.

The final application that I am trying to achieve is actually an association that maps images to Semantic Pointers representing “One”, “Two”, “Three”, etc. It’s like a classification task, but I want to explore the effectiveness of learning these associations with learning rules like PES and STDP instead of solving the decoders directly with NEF or translating a rate-based model trained with backpropagation into a spike-based model.

Definitely not. If you look at the example you linked, you'll notice that the tuning curve encoding is only 1 of 4 separate encoding styles used there. When doing image processing, a typical step in the network (usually the first few layers) involves some sort of encoding. In the example you linked, 4 different encoding styles are used (the default tuning curves, sparse default tuning curves, Gabor filters that cover the whole image, and sparse Gabor filters), but there are probably other encoding styles out there (as you mentioned, Poisson encoding). However, knowing which encoding style is "best" for your application is a research problem, and definitely beyond my area of expertise.

I haven't experimented with Poisson encoding myself, so I don't have extensive knowledge of how to deploy it in a Nengo network. From the quick search I did, it looks to be another form of encoding, and if you want a quick implementation of it, your best bet is to implement the algorithm in a Nengo node (where the node contains the code that converts the input image into a spike train), and then feed the spike train to the rest of the network.

I'm not sure what the exact algorithm is for producing the spike train from the Poisson process, but Nengo does include a neuron "class" that can convert any rate neuron type into a Poisson-like spiking neuron. Using this, you could probably define a custom neuron type (or use one of the existing neuron types, if one fits the algorithm) to take the input stimuli and turn them into a Poisson spike train.
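For instance, assuming your Nengo version includes the nengo.PoissonSpiking wrapper (check your install; this is only a minimal sketch with arbitrary parameters), it could be used like this:

import nengo

with nengo.Network() as model:
    # Wrap a rate neuron model so that spikes are generated via a Poisson process
    ens = nengo.Ensemble(
        n_neurons=100,
        dimensions=1,
        neuron_type=nengo.PoissonSpiking(nengo.LIFRate()),
    )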

Yup! In terms of what you are trying to achieve, I think the semantic pointer "layer" should really be one of the last layers in your network. Given the variability of the input (per class), it's highly unlikely that using the raw input as a vector to the Transcode module will work. Additionally, the Transcode module assumes that the vectors on both sides are "proper" semantic pointers (i.e., they have an expected vector magnitude), and breaking this assumption can lead to unexpected behaviour from the Transcode output.

In the vision system of Spaun for example, the input image is “converted” into a semantic pointer as follows:

input image → 2-layer convolution network → associative memory → conceptual semantic pointer

The 2-layer convolution network is a standard MNIST convolution network (similar to this one) that was trained outside of Nengo. The convolution network itself had the structure:

input → layer 1 convolution → layer 2 convolution → 10-node output (one for each class)

In typical visual classification applications, the output of the classifier layer is used to determine which class the input image belongs to. However, since Spaun does more with this information, rather than using the output of the classifier layer, the output of the 2nd convolution layer is treated as the "visual semantic pointer" (note that some scaling is applied to get this vector within the expected assumptions of "standard" semantic pointers). This visual semantic pointer is used as the input to the visual associative memory, which maps each visual semantic pointer to the corresponding conceptual semantic pointer (i.e., the ones representing the concepts "ONE", "TWO", "THREE", etc.). This mapping is similar to what the Transcode module would do (except that the Transcode module doesn't have the cleanup-memory functionality; it's just a pure mathematical transformation).
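As a rough illustration (this is not Spaun's actual code; the dimensionality, vocabularies, threshold, and pointer names below are all made up), that kind of visual-to-conceptual mapping could be sketched with an associative memory in nengo_spa roughly like this:

import nengo
import nengo_spa as spa

D = 64  # hypothetical dimensionality

visual_vocab = spa.Vocabulary(D)
concept_vocab = spa.Vocabulary(D)
visual_vocab.populate("VIS_ONE; VIS_TWO; VIS_THREE")
concept_vocab.populate("ONE; TWO; THREE")

with spa.Network() as model:
    # Cleanup / associative memory: maps each visual pointer to its concept
    am = spa.ThresholdingAssocMem(
        threshold=0.3,
        input_vocab=visual_vocab,
        output_vocab=concept_vocab,
        mapping={"VIS_ONE": "ONE", "VIS_TWO": "TWO", "VIS_THREE": "THREE"},
    )
    # Stand-in for the visual semantic pointer coming out of the vision network
    stim = spa.Transcode("VIS_TWO", output_vocab=visual_vocab)
    stim >> am
    probe = nengo.Probe(am.output, synapse=0.03)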

I think it's going to be fairly difficult to train the entire network in one shot. It'll probably be a good idea to break the network down into smaller parts, or to set some fixed assumptions for parts of the network and only train smaller sections of it. For example, from the example you linked, if you were to assume that the encoders were sparse Gabor filters, the fact that the NEF solver is able to generate decoders for a single-layer network should mean that the PES rule (or any of the other Nengo learning rules) should be able to learn an equivalent set of decoders to do the same classification task.
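As a sketch of what that single-layer learned classifier could look like (a minimal outline, not tested code; n_vis, n_out, the ensemble size, and the learning rate are placeholder values, and the input/target nodes would be driven by your dataset):

import numpy as np
import nengo

n_vis, n_out = 784, 10  # hypothetical: flattened MNIST input, 10 classes

with nengo.Network() as model:
    inp = nengo.Node(np.zeros(n_vis))     # image input (e.g., via a presentation node)
    target = nengo.Node(np.zeros(n_out))  # one-hot target label
    a = nengo.Ensemble(1000, n_vis)       # encoding ensemble (e.g., with Gabor encoders)
    out = nengo.Node(size_in=n_out)

    nengo.Connection(inp, a, synapse=None)
    # Start from zero decoders and let PES learn the classification function
    conn = nengo.Connection(
        a, out,
        function=lambda x: np.zeros(n_out),
        learning_rule_type=nengo.PES(learning_rate=1e-4),
    )

    # Error signal: actual output minus target, fed to the learning rule
    error = nengo.Node(size_in=n_out)
    nengo.Connection(out, error)
    nengo.Connection(target, error, transform=-1)
    nengo.Connection(error, conn.learning_rule)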


Thanks @xchoo! This is very insightful. I think I might need some time to digest this, but there are still a few points that I would like to clarify:

For this part I wasn't really asking about the different filters, but rather about the different APIs available for converting MNIST image pixels into spikes. So far I have seen nengo.utils.ensemble.tuning_curves(), and I have also tried using nengo.Node(lambda t: X_train[t,:]) in this example that I made: mnist-pes.py (2.3 KB). I hope that both are valid ways to feed images into a nengo.Ensemble.

This is really interesting. It seems like I will have to return to backpropagation in order to learn more expressive neural networks, even if it is just to convert images to semantic pointers. I would also like to clarify what a "proper" semantic pointer is, as I only understood semantic pointers as high-dimensional vectors (possibly unit length) that are approximately orthogonal to each other, to represent different concepts.

I think you may be misunderstanding how the images are “converted” into firing rates in that example. First, I should note that there aren’t any spiking neurons used in that example. If you look at the ensemble parameters:

ens_params = dict(
    eval_points=X_train,
    neuron_type=nengo.LIFRate(),
    intercepts=nengo.dists.Choice([0.1]),
    max_rates=nengo.dists.Choice([100]),
)

You'll see that the ensemble is using LIF rate neurons. As with other rate neurons, you can determine the firing rate of an LIF rate neuron given the input current to the neuron and the neuron's response curve (the curve derived from the neuron's non-linear activation function, which maps some input current to some output firing rate). In typical neural network architectures, the neuron's input current is calculated by taking the input to the neuron and multiplying it by some input weight. In Nengo, these input weights are referred to as "encoders", and they are the things being changed in the example. So, to summarize, to get a rate neuron's firing rate you do:

input → multiply by input weights → response curve → output firing rate

This entire process is what the nengo.utils.ensemble.tuning_curves() function is doing.
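Roughly speaking (this is only a sketch that ignores details like the ensemble radius, and it assumes you already have a built Simulator sim, an ensemble ens, and an array of eval_points), the computation amounts to:

import numpy as np

built = sim.data[ens]  # built parameters: encoders, gain, bias
projected = np.dot(eval_points, built.encoders.T)  # multiply inputs by the encoders
rates = ens.neuron_type.rates(projected, built.gain, built.bias)  # apply the response curve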
The process by which an input signal is transformed into a spike train (using spiking neurons) is similar, but different:

input → multiply by input weights → neuron membrane dynamics → output spike train

Because the neuron membrane dynamics depend on the passage of time, Nengo doesn’t include a function that will compute the entire spike train for you given some input. Rather, getting the spike train is achieved by running the Nengo simulation (i.e., doing sim.run()). Thus, the spiking-neuron variant of the example you linked would be something like this:

# number of hidden units
# More means better performance but longer training time.
n_hid = 1000

ens_params = dict(
    eval_points=X_train,
    neuron_type=nengo.LIF(),
    intercepts=nengo.dists.Choice([0.1]),
    max_rates=nengo.dists.Choice([100]),
)

solver = nengo.solvers.LstsqL2(reg=0.01)

with nengo.Network(seed=3) as model:
    inp = nengo.Node(<function that generates the MNIST images as a vector>)
    a = nengo.Ensemble(n_hid, n_vis, **ens_params)
    v = nengo.Node(size_in=n_out)

    nengo.Connection(inp, a, synapse=None)  # Here, we set the synapse to None to prevent the input image from being filtered
    conn = nengo.Connection(
        a, v, eval_points=X_train, function=T_train, solver=solver
    )  # Here, we set the synapse to a default value to filter the outgoing spike train

    probe_out = nengo.Probe(v, synapse=0.01)
    # We have to create a probe to get at the output of `v`. We filter it some more to reduce the spikiness.

The one major difference between the rate and spike networks is that while the nengo.utils.ensemble.tuning_curves function can return the firing rates for all of the different input images in one single function call (because the computation has no time dependence: it's just one big matrix multiply followed by the static response curves), in the spiking network the inp node has to iterate through each image in the dataset and present it to the network for some amount of time. From your mnist-pes.py code, it looks like you have already implemented this.
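As a side note, Nengo has a built-in process for exactly this kind of iteration, nengo.processes.PresentInput, which presents each row of an array for a fixed amount of time. A minimal sketch (assuming X_train is the array of flattened images from the example, and an arbitrary presentation time):

import nengo

with nengo.Network() as model:
    # Present each flattened image for 0.1 s before moving on to the next one
    inp = nengo.Node(nengo.processes.PresentInput(X_train, presentation_time=0.1))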

To summarize, while in a rate network the nengo.utils.ensemble.tuning_curves function can be used to transform a set of input vectors into a corresponding set of neuron activities (firing rates), for a spiking network, the very act of feeding the input vector to an ensemble of spiking neurons does this for you (i.e., just providing the input to an ensemble transforms the input to spikes).

If you were to change the LIF neuron type to a neuron using some Poisson process to generate spikes, then, without changing anything else about the structure of the network, you’ll have achieved the Poisson spike encoding that you mentioned in your original post.

I feel like some more clarification is needed here as well. :smiley:
I did not mean to imply that backpropagation is the only way to construct expressive neural networks. Rather, for the Spaun model, the quickest way to build such a complex model was to use a combination of the backprop-trained visual system with an associative memory to map its outputs to the conceptual semantic pointers. As I mentioned in my previous reply:

it is definitely possible that the PES learning rule (or some other learning rule) is able to learn these associations. It's just that it may take longer (and require some finessing) to get a desirable output. It is also possible that the entirety of the Spaun network could have been trained using some combination of pre-defined structured networks, unsupervised learning, and supervised learning, but that process would have been super complex and very time consuming.

That is correct. A semantic pointer is generally just a high-dimensional vector. Ideally, semantic pointers representing different concepts are somewhat orthogonal to each other, and being in a high-dimensional space helps with this (the more dimensions, the more likely this is the case). Since the inner product (dot product) is often used to compare two semantic pointers, one crucial property of semantic pointers is that their vector magnitude is as close to unit length as possible. However, if you are using another type of similarity measure (e.g., cosine similarity), then the vector magnitude constraint is not as important.

Lastly, while semantic pointers are generally high dimensional vectors, they can be generated by either using some heuristic as to how to pick the vector elements, or by combining other semantic pointers together using some mathematical operation (e.g., circular convolution, superposition, etc.).
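As a small illustration of the dot product vs. cosine similarity point (the dimensionality and pointer names here are arbitrary):

import numpy as np
import nengo_spa as spa

vocab = spa.Vocabulary(64)
vocab.populate("ONE; TWO")
a = vocab["ONE"].v  # underlying numpy vector
b = vocab["TWO"].v

dot = np.dot(a, b)  # sensitive to the vectors' magnitudes
cos = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # magnitude-invariant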

@xchoo, many thanks for your patience in explaining all these concepts to me :slight_smile: . I am very new to this paradigm so I tend to have tons of questions :sweat_smile:.

Previously, I tried only converting the LIFRate() neurons to LIF() neurons in Ensemble a, and surprisingly nengo.utils.ensemble.tuning_curves(a, simulator, inputs=images) still works, with even a slight improvement in classification accuracy. Is this because Ensemble a is still being treated as using rate neurons as opposed to spiking neurons?

Also, while trying to implement inp = nengo.Node(<function that generates the MNIST images as a vector>), I started to wonder whether images must always be flattened, since a nengo.Ensemble seems to always represent a vector instead of a higher-rank tensor like those in TensorFlow. This makes me wonder how 2D convolution is actually done in Nengo, since I see a nengo.Convolution API in Nengo core.

Yup, that is correct. :smiley:
When you use the tuning_curves function, it uses the rate equivalent of the ensemble's neurons to compute the neuron activations, so it should be no different from just using the rate neurons outright. The slight improvement you saw in accuracy could be due to the randomness in generating the ensemble parameters. It is possible that for that specific run, the neuron tuning curves were spread out in just such a way as to give a slight bump to the accuracy.

For networks where the neuron parameters are randomly generated, we typically collect accuracy data over a set of runs (100+ runs) to get an idea of what the average accuracy is, and what the distribution of the accuracies look like. Alternatively, we set a seed for the network to give us the same accuracy no matter how many times the network is run.

That's right as well. In Nengo, the input to any ensemble is treated as a flattened vector, regardless of the original shape of the input. However, since matrix multiplications are composable (you can break them up into a series of vector multiplications), multiplying a 2D image by a 2D matrix is identical to multiplying a flattened vector by an equivalent matrix. This is what the 2D convolution connection does: it allows you to treat the input as a 2D matrix, but internally it constructs the equivalent matrix to be applied to a flattened input.
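For example, a rough sketch of using nengo.Convolution on a flattened image input might look like this (the image size, filter count, and kernel size are arbitrary, and whether you connect into an ensemble's neurons or elsewhere depends on your network):

import numpy as np
import nengo

with nengo.Network() as model:
    # A 28x28 single-channel image, fed in as a flattened 784-D vector
    inp = nengo.Node(np.zeros(28 * 28))

    conv = nengo.Convolution(
        n_filters=4, input_shape=(28, 28, 1), kernel_size=(3, 3), strides=(1, 1)
    )
    # The transform knows its own (flattened) output size
    layer = nengo.Ensemble(n_neurons=conv.output_shape.size, dimensions=1)
    nengo.Connection(inp, layer.neurons, transform=conv, synapse=None)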

@xchoo, thank you so much for your explanations. I really gained a better understanding on what is happening theoretically with the examples. I will take some time to digest everything now, but if I have any other questions, I will either post here again or start another topic.


Hi @xchoo, I did some work on MNIST learning using the PES rule, as also described in this thread. I would just like to ask your opinion on whether what I am witnessing is the effect of catastrophic interference/forgetting.

I also made the example into a notebook to document the experiment: catastrophic-forgetting.ipynb (20.5 KB)
It seems like the moment I inhibit the error neurons, the network stops learning and just predicts everything as the last-seen sample's label.

Looking at your notebook, I'd say yes, what you are observing is catastrophic forgetting. I think the issue stems from the learning rate of the PES rule and/or the length of time you present each stimulus for. The higher the learning rate, or the longer you present each stimulus/output pair, the more it "overwrites" what it has learned before.

If you look at this forum thread, I posted some code there which uses the PES learning rule to learn a linear matrix transformation. In that code, I'm presenting each input/output pair for only 0.2 s, whereas in your code you are doing it for 1 s.

Another thing you might want to experiment with is the number of neurons in the pre ensemble. I use a rule of thumb of at least $50 \times D$ neurons (where $D$ is the input dimension). In the code I used for the matrix transformation, I used $50 \times D^{1.5}$ (for better performance).
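For a concrete (purely illustrative) sense of scale, taking $D = 784$ (a flattened 28×28 MNIST image) as an example:

D = 784                      # hypothetical: flattened 28x28 MNIST image
n_baseline = 50 * D          # 39,200 neurons
n_better = int(50 * D**1.5)  # 1,097,600 neurons; likely impractical for a D this large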

Hi @xchoo, thanks for your insights and sorry for the delayed response. I haven’t had the time to test out what you have suggested, but I will try it out soon and post some results here.