Nengo.spa Action mapping

Hello, how should I understand the choice of spa actions? For example:

 actions = spa.Actions(
        'dot(vision, WRITE) --> verb=vision',
        'dot(vision, ONE+TWO+THREE) --> noun=vision',
        '0.5*(dot(NONE-WRITE-ONE-TWO-THREE, vision) '
        '+ dot(phrase, WRITE*VERB)) '
        '--> motor=phrase*~NOUN',
    )

Can you explain the meaning of, for example, the condition '0.5*(dot(NONE-WRITE-ONE-TWO-THREE, vision) + dot(phrase, WRITE*VERB))' in spa.Actions? This example is from Nengo core 2.8.0. Many thanks!

Using the SPA to implement control flow in your model is a little complex to describe in one forum post, but I’ll attempt to do so here. If you want to know more about the SPA architecture and how it is used, I recommend reading “How to Build a Brain”, written by @celiasmith . :smiley:

A note before getting into the meat of this matter: the discussion that follows applies to the nengo.spa library that comes included with Nengo core. That library has been deprecated, however, and will eventually be removed in favour of NengoSPA.

For comparison, the spa.Actions code from this example has been re-implemented in NengoSPA as this, using a completely different syntax (NengoSPA uses Python operations instead of strings to implement actions).

However, in both nengo.spa and NengoSPA, the SPA actions work in concert with the basal ganglia (BG) network and the Thalamus network to give Nengo models a kind of “decision”-making mechanism. In order to understand what the actions aim to accomplish, we first need to understand what the BG and Thalamus networks do.

The BG and Thalamus Network
You provide the BG network with a vector of inputs (utility values), and the output of the network is all ones (or close to 1), except for the element that corresponds to the highest value of the input vector, which is driven to (close to) 0. Basically, it performs an argmax operation on the elements of the input vector, flagging the winner with a 0. As an example, an input of:

[0.5, 0.7, 1]

would result in the BG output being:

[1, 1, 0]

Typically, the networks we design use positive numbers to indicate selections (i.e., you’d expect the “selected” output of the BG to be 1, instead of 0), and this is where the Thalamus network comes in. It inverts the BG output such that with a BG output of:

[1, 1, 0]

The Thalamus output would be:

[0, 0, 1]

The output signal of the Thalamus can then be used to drive other networks in the Nengo model.
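
To see this in action, here's a minimal sketch using the BG and Thalamus networks from Nengo core (the input values, probe filter, and run time are just illustrative choices):

import nengo

with nengo.Network() as model:
    # Constant "utility" input; the third element is the largest.
    utilities = nengo.Node([0.5, 0.7, 1.0])
    bg = nengo.networks.BasalGanglia(dimensions=3)
    thalamus = nengo.networks.Thalamus(dimensions=3)

    nengo.Connection(utilities, bg.input)
    nengo.Connection(bg.output, thalamus.input)

    probe = nengo.Probe(thalamus.output, synapse=0.01)

with nengo.Simulator(model) as sim:
    sim.run(0.5)

print(sim.data[probe][-1])  # roughly [0, 0, 1]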

The Semantic Pointer Architecture (SPA)
You can read more about the SPA in Chris’ “How to Build a Brain” book, or, if you cannot get a copy of it, there is a chapter discussing the SPA in my PhD thesis. Briefly, the SPA is an architecture in which vectors (called semantic pointers) can be used to represent concepts (like “red”, “green” or “car”, “bus”). These semantic pointers can also be combined and manipulated to form new concepts (like “red car” or “green bus”). The nengo.spa library that comes with Nengo (core) implements the SPA within the Nengo framework.

In nengo.spa, semantic pointers are written as fully uppercase strings (e.g., "RED", "GREEN", etc.), while the modules that you can connect to are lowercase strings (e.g., "vision", "phrase"). Operations that you can perform on semantic pointers are written with symbols like *, ~ or dot().
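
To get a feel for these operations outside of a neural simulation, here's a small sketch using a hypothetical 64-dimensional vocabulary (the names RED and CAR are just illustrative):

from nengo import spa

vocab = spa.Vocabulary(64)   # 64-dimensional semantic pointers
red = vocab.parse('RED')     # pointers are (near-)unit-length random vectors
car = vocab.parse('CAR')

red_car = red * car          # * binds two pointers together (circular convolution)
recovered = red_car * ~car   # ~ is the approximate inverse, so this "unbinds" CAR

Here, recovered is a noisy version of RED, which is exactly how the phrase*~NOUN effect in your example extracts the noun from the phrase.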

What are spa.Actions?
Because the BG network works with scalar values, there needs to be a way to “convert” the semantic pointer representations in a Nengo network into scalar values that the BG can work with. This is where the spa.Actions come in. Each action string consists of two parts: the condition, which appears before the -->, and the effect, which appears after the -->. As an example, for the action string "dot(vision, WRITE) --> verb=vision", the condition is dot(vision, WRITE) and the effect is verb=vision.

The spa.Actions strings convert semantic pointers into scalar values through the dot() operator. The dot operator is the dot product (a.k.a. inner product), and is used to find the similarity between two vectors. As an example, dot(ONE, TWO) computes the dot-product similarity between the semantic pointers ONE and TWO. With modules, the dot() operator computes the dot product between the value being represented by the module (i.e., what is being “stored” in the module while the simulation is running) and a semantic pointer. So, the string dot(vision, WRITE) computes the dot product between the current contents of the vision module and the WRITE semantic pointer.
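
A quick numeric sketch of this, again with a hypothetical 64-dimensional vocabulary:

import numpy as np
from nengo import spa

vocab = spa.Vocabulary(64)
one = vocab.parse('ONE')
write = vocab.parse('WRITE')

print(np.dot(one.v, one.v))    # ~1: a pointer is maximally similar to itself
print(np.dot(one.v, write.v))  # ~0: distinct random pointers are nearly orthogonal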

Putting it all Together
Right, so how does everything work together? Let’s take the example you posted:

actions = spa.Actions(
    'dot(vision, WRITE) --> verb=vision',
    'dot(vision, ONE+TWO+THREE) --> noun=vision',
    '0.5*(dot(NONE-WRITE-ONE-TWO-THREE, vision) + dot(phrase, WRITE*VERB)) --> motor=phrase*~NOUN')

This spa.Actions group contains 3 action strings. Thus, the input provided to the BG network will be a 3-element vector of scalar condition values.

Now, let’s consider a specific scenario. Let’s assume that the output of vision is WRITE; what does the scalar vector provided to the BG look like?

  1. The first action's condition is dot(vision, WRITE), and since vision is currently WRITE, this value is close to 1.
  2. The second action's condition is dot(vision, ONE+TWO+THREE). Since ONE+TWO+THREE is not similar to WRITE, this value is about 0.
  3. The third action's condition is 0.5*(dot(NONE-WRITE-ONE-TWO-THREE, vision) + dot(phrase, WRITE*VERB)). This condition consists of two parts. For the second part, I’m going to assume there is currently nothing in phrase, so dot(phrase, WRITE*VERB) is about 0. For the first part, dot(NONE-WRITE-ONE-TWO-THREE, vision) is equivalent to dot(NONE-WRITE-ONE-TWO-THREE, WRITE). If we compute all of the dot products separately, it’s dot(NONE, WRITE) - dot(WRITE, WRITE) - dot(ONE, WRITE) - dot(TWO, WRITE) - dot(THREE, WRITE), which is roughly 0 - 1 - 0 - 0 - 0 = -1. So, putting the entire third condition together, it’s 0.5 * (-1 + 0) = -0.5 (the sketch below checks these numbers).
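
As a sanity check, here's a sketch that computes those three condition values directly with a hypothetical 64-dimensional vocabulary (plain vector arithmetic, no neurons):

import numpy as np
from nengo import spa

vocab = spa.Vocabulary(64)
WRITE, ONE, TWO, THREE, NONE, VERB = (
    vocab.parse(name) for name in
    ['WRITE', 'ONE', 'TWO', 'THREE', 'NONE', 'VERB'])

vision = WRITE.v       # vision currently contains WRITE
phrase = np.zeros(64)  # nothing in phrase yet

u1 = np.dot(vision, WRITE.v)
u2 = np.dot(vision, (ONE + TWO + THREE).v)
u3 = 0.5 * (np.dot((NONE - WRITE - ONE - TWO - THREE).v, vision)
            + np.dot(phrase, (WRITE * VERB).v))

print(np.round([u1, u2, u3], 2))  # roughly [1, 0, -0.5]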

Thus, the scalar vector being provided to the BG is [1, 0, -0.5], which means that the BG-Thalamus network should choose the first action, resulting in the effect of verb=vision. This action stores the output of the vision module into the verb module.

An Entire Example
Now that we know how the spa.Actions work, let us run through an entire sequence of vision inputs to see how the network can perform the parsing task. Of note, one important connection you did not include in your post is this:

cortical_actions = spa.Actions(
    'phrase = verb*VERB + noun*NOUN')
model.cortical = spa.Cortical(cortical_actions)

What this does is create connections between the verb module, the noun module, and the phrase module such that whatever is in phrase is always verb*VERB (the contents of verb bound with the semantic pointer VERB) plus noun*NOUN.

So, let us start the simulation. At the start, there is nothing stored in any of the modules:

verb = 0
noun = 0
phrase = verb*VERB + noun*NOUN = 0
motor = 0

Now, let’s present WRITE to the vision module. When this happens, the BG input becomes [1, 0, -0.5], and this results in verb=vision being executed. Thus, whatever is in vision gets stored in verb:

verb = WRITE
noun = 0
phrase = verb*VERB + noun*NOUN = WRITE*VERB + 0
motor = 0

Next, TWO is presented to the vision module. When this happens, the BG input becomes [0, 1, 0]. To understand why:

  1. dot(vision, WRITE) = dot(TWO, WRITE) = 0
  2. dot(vision, ONE+TWO+THREE) = dot(TWO, ONE) + dot(TWO, TWO) + dot(TWO, THREE) = 0 + 1 + 0 = 1
  3. 0.5*(dot(NONE-WRITE-ONE-TWO-THREE, vision) + dot(phrase, WRITE*VERB)) = 0.5*(-1 + dot(WRITE*VERB, WRITE*VERB)) = 0.5 * (-1 + 1) = 0 (checked numerically after the state listing below)

This causes the noun=vision effect to occur, resulting in noun getting the value of vision:
verb = WRITE
noun = TWO
phrase = verb*VERB + noun*NOUN = WRITE*VERB + TWO*NOUN
motor = 0
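
To check the third condition numerically, the key new term compared to before is the phrase comparison:

import numpy as np
from nengo import spa

vocab = spa.Vocabulary(64)
WRITE = vocab.parse('WRITE')
VERB = vocab.parse('VERB')

phrase = (WRITE * VERB).v                # phrase currently holds WRITE*VERB
print(np.dot(phrase, (WRITE * VERB).v))  # ~1, cancelling the -1 term in action 3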

Lastly, NONE is presented to the vision module. When this happens, the BG input becomes [0, 0, 1]: the first two conditions are about 0, while dot(NONE-WRITE-ONE-TWO-THREE, NONE) = 1 and dot(phrase, WRITE*VERB) = 1, making the third condition 0.5*(1 + 1) = 1. This causes motor=phrase*~NOUN to happen:

verb = WRITE
noun = TWO
phrase = WRITE*VERB + TWO*NOUN
motor = phrase*~NOUN = (WRITE*VERB + TWO*NOUN)*(~NOUN) ~= TWO

And with that, the motor system now has a semantic pointer representing “TWO” (the leftover WRITE*VERB*~NOUN term from the unbinding is just noise that is not similar to anything in the vocabulary). This demonstrates that the system has the ability to remember what it is being told to do (WRITE), and what information it is supposed to use when doing it (TWO).
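
Here's that final unbinding step as plain vector arithmetic, with the same hypothetical 64-dimensional vocabulary as before:

import numpy as np
from nengo import spa

vocab = spa.Vocabulary(64)
WRITE, TWO, VERB, NOUN = (vocab.parse(name)
                          for name in ['WRITE', 'TWO', 'VERB', 'NOUN'])

phrase = WRITE * VERB + TWO * NOUN
motor = phrase * ~NOUN           # unbind NOUN from the phrase

print(np.dot(motor.v, TWO.v))    # ~1: motor is most similar to TWO
print(np.dot(motor.v, WRITE.v))  # ~0: the WRITE*VERB*~NOUN residue is just noise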

A Note
As a side note, the example you included in your original post is a little dated. A more up-to-date example can be found here. In the updated example, multiple outputs are defined (hand and speech), better illustrating how the system can not only remember information (YES, NO, HI, BYE), but also be directed to use either one of its output modalities.

I’d advise you to play around with the system by setting your own sequence of inputs to vision and seeing how the model behaves! :smiley:
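
If you want a starting point for that experimentation, here's a sketch of the full parsing model in nengo.spa. The module types, dimensionality, probe, and input timing are my assumptions (based on the standard Nengo 2.x parsing example), so adjust them to taste:

import nengo
from nengo import spa

dimensions = 64

with spa.SPA(label='Parser') as model:
    model.vision = spa.Buffer(dimensions=dimensions)
    model.phrase = spa.Buffer(dimensions=dimensions)
    model.motor = spa.Buffer(dimensions=dimensions)
    model.noun = spa.Memory(dimensions=dimensions)  # memories retain their values
    model.verb = spa.Memory(dimensions=dimensions)

    actions = spa.Actions(
        'dot(vision, WRITE) --> verb=vision',
        'dot(vision, ONE+TWO+THREE) --> noun=vision',
        '0.5*(dot(NONE-WRITE-ONE-TWO-THREE, vision) '
        '+ dot(phrase, WRITE*VERB)) '
        '--> motor=phrase*~NOUN',
    )
    model.bg = spa.BasalGanglia(actions)
    model.thalamus = spa.Thalamus(model.bg)

    cortical_actions = spa.Actions('phrase = verb*VERB + noun*NOUN')
    model.cortical = spa.Cortical(cortical_actions)

    def vision_input(t):
        # Present WRITE, then TWO, then NONE, for 0.5 s each.
        sequence = ['WRITE', 'TWO', 'NONE']
        return sequence[min(int(t / 0.5), len(sequence) - 1)]

    model.input = spa.Input(vision=vision_input)

    p_motor = nengo.Probe(model.motor.state.output, synapse=0.03)

with nengo.Simulator(model) as sim:
    sim.run(1.5)

# Compare the final motor output against the vocabulary; TWO should win.
vocab = model.get_output_vocab('motor')
print(list(zip(vocab.keys, spa.similarity(sim.data[p_motor][-1], vocab))))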

Wow, thank you very much for your explanation. It’s perfect!
Thank you so much!