Using NEF with 'exclusive OR' and some other questions

gidmeister · May 16, 2017, 10:32am

Background

I’ve been reading the “Technical Overview of the Neural Engineering Framework” by Terry Stewart, and it filled in some gaps in my understanding (I’m a newbie), but I want to make sure that I understand what he says there.

XOR and the NEF

In the paper, he mentions:

Importantly, this means that nonlinear functions can be computed with a single layer of connections –
no back-propagation of error is required. This includes not only the classic XOR problem, but also
more complex functions such as multiplication, trigonometric functions, or even circular convolution.

How do you solve the XOR problem with the NEF?

XOR has a truth table as follows:

0 0 0
0 1 1
1 0 1
1 1 0

(The first 2 columns are inputs; the last column is the output.)

To solve this, each neuron in a group of 3 could have 2 incoming weights.
So they might have:

Neuron A: 0 1
Neuron B: 1 0
Neuron C: 1 1

Combined, this three-neuron group gives us a vector that represents the output.
For instance, the input 0, 1 would produce the vector A=1, B=0, C=0,
and the input 1, 1 would produce the vector A=0, B=0, C=1.

(Am I correct so far?)

The encoding equation is:

In the article, it says this: “if the e_i values are all aligned with the standard basis vectors, then nonlinear functions of multiple variables cannot be computed (this is the case for standard connectionist models, which is why they require multiple layers).”

I’m wondering what ‘standard basis vectors’ are here. Are they 0 1 and 1 0 (the x and y axes)? If so, is that third neuron of 1, 1 an extra basis vector that gives more ability to represent non-linear functions?

Now let’s suppose we want to recreate the input that came into this neuron triplet, and pass it from the neuron triplet to another group of neurons, this group having 10 neurons in it.

Suppose none of them have an encoding vector of 1, 1 (there is no equivalent to neuron C), but some may have encoding vectors such as (0.5, 0.7), (-0.3, 0.2), etc.

Would they be able to encode XOR?

The article also says:

The method of representation used in the NEF also allows you to add values by simply feeding two inputs into the same group of neurons. If you connect group A to group C with connection weights that compute f(a) and if you connect B to C with connection weights that compute g(b), then the neural group C will end up with the activity pattern that represents f(a)+g(b). This is a consequence of the linear representation given in Equation 1, and is vital for constructing larger networks.

I’m not sure I picture this exactly.
First, should A and B have the same number of elements?
Second, is it true that every neuron in A and B sends a connection to every neuron in C?

Semantic Pointer Architecture

Finally, the article talks about the Semantic Pointer Architecture (SPA). It says that:

Furthermore, the vectors maintain similarity, so that if “pink” and “red” have similar vectors, then “pink square” and “red square” will also have similar vectors. This feature allows inductive reasoning over complex patterns. For example, our neural model of the Raven’s Progressive Matrix task (a standard intelligence test where participants are given 8 visual patterns in a 3x3 grid and are asked to determine what pattern should be placed in the missing square) works by forming the vector representation of each pattern and computing the average transformation that will take one pattern to the next

This sounds fascinating. It seems that you are creating a space where objects that have similar semantics stay together.

Would this imply that “red wood square box” is more similar to “red square” than to “pink square”? In other words, would a hierarchy of related concepts be close to each other?

Raven’s Progressive Matrices

The Raven’s Progressive Matrices task leads to another question.
Suppose you have 2 sequences that start the same way:
1, 11, 111, 1111
and
1, 11, 21, 31
The first sequence just concatenates a 1 to the end each time; the second adds 10 each time. It seems that by averaging the transformation between the consecutive patterns, you would get different predictions for the next object in each sequence.

If that is how it works, then it seems that you should be able to:

Predict time series
Predict analogies (‘a’ is to ‘b’ as ‘c’ is to ‘d’)

Has that been tried? Would it work?

Seanny123 · May 19, 2017, 5:07am

Welcome to the forum! Your question is admirably involved. I can see you’ve put a lot of thought into it! I edited to make it a bit clearer, but let me know if I lost some of the meaning in the process. I’ll do my best to answer everything I can!

Semantic Pointer Architecture

Would this imply that “red wood square box” is more similar to “red square” than to “pink square”? In other words, would a hierarchy of related concepts be close to each other?

Yes, it’s possible to build that relation into a vocabulary space using the Semantic Pointer Architecture. See this example (which is also in the Nengo GUI) for more details.

Raven’s Progressive Matrices

If that is how it works, then it seems that you should be able to:

Predict time series

Predict analogies (‘a’ is to ‘b’ as ‘c’ is to ‘d’)

By “predicting time series” what exactly do you mean? Could you give me an example?

There’s totally someone who made analogies during a Nengo Summer School, but I don’t know where the code for that went. @tcstewar do you know?

gidmeister · May 19, 2017, 10:03am

Sean,
Thanks for your reply, but you forgot to put a link to the example of the vocabulary space. What is the “GUI” that you refer to above?

A time series is any stream of data that you are trying to predict. For instance, in the stock market, you might get a steady stream of data. Or you might be trying to predict historical trends of a commodity, or of fashions, or economic cycles, or whatever else you can think of. Since finding convolution transforms between successive items in the Raven’s Progressive Matrices works, I suppose it would be worth trying to see if you could do that with some of these series, though the convolution would probably not stay constant, but would gradually change over time as well. And in reality, most time series are affected by many factors; for instance, stock prices are affected by interest rates, though I know people who try to just look at a series of financial data and find an intrinsic pattern. Some of the relations are probably nonlinear as well, and I doubt convolutional transformations can capture that.

The XOR example I gave at the top is because I’m really not 100% sure I understand the basics.

Seanny123 · May 19, 2017, 11:28am

Whoops! Here are the links:

The NEF can predict the output of time series on its own, but as for SPA… I think the reason I’m having a hard time answering your questions is that I’m not sure what the difference is between:

Inference (which the Raven’s Progressive Matrices model does)
Recall (for example, given a list of items shown over time and a prompt, ask which item came after, which Spaun is capable of doing)
Analogy (which you mentioned before)
Any other type of reasoning

Would you mind explaining further?

Given I can’t answer your question about XOR, maybe working through the tutorials in the Nengo GUI will help you out? Alternatively, maybe check out Chris’ course notes. They should help you understand how neurons represent an input signal and change it to an output. Please let us know if it doesn’t help!

jgosmann · May 19, 2017, 3:36pm

I’m wondering what ‘standard basis vectors’ are here. Are they 0 1 and 1 0 (the x and y axis)? If so, is that third neuron of 1, 1 an extra basis that gives more ability to represent non-linear functions?

Yes, the standard basis vectors are the x and y axes in the two-dimensional case. And yes, the third neuron with the [1, 1] encoder/basis vector gives the ability to represent non-linear functions of both input dimensions. Note that you can decode non-linear functions aligned with x and y even without the third neuron (for example, x^2 and/or y^2 could be decoded). The third neuron allows you to decode a one-dimensional non-linear function aligned with the [1, 1] vector. The (non-linear) functions decoded along these different vectors then get linearly combined into a single output.
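
As a minimal sketch of how a [1, 1] encoder helps with XOR (plain NumPy, not Nengo), take the three neurons from the question with encoders [0, 1], [1, 0], and [1, 1]. The rectified-linear response and the bias of -1 on the third neuron are assumptions made for illustration; real NEF populations typically use many LIF neurons with randomly chosen gains and biases. Least-squares decoders are solved over the four XOR inputs, once with only the two axis-aligned neurons and once with all three:

```python
import numpy as np

# The four XOR input points and their target outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
target = np.array([0.0, 1.0, 1.0, 0.0])


def activities(points, encoders, biases):
    """Rectified-linear rate neurons (assumed): a_i = max(0, e_i . x + b_i)."""
    return np.maximum(0.0, points @ encoders.T + biases)


def solve_decoders(encoders, biases):
    """Return least-squares decoders and the RMS error of the decoded XOR."""
    A = activities(X, encoders, biases)
    decoders, *_ = np.linalg.lstsq(A, target, rcond=None)
    rmse = np.sqrt(np.mean((A @ decoders - target) ** 2))
    return decoders, rmse


# Neurons A and B only: encoders aligned with the standard basis.
axis_encoders = np.array([[0.0, 1.0], [1.0, 0.0]])
d_axis, rmse_axis = solve_decoders(axis_encoders, np.zeros(2))

# Add neuron C with encoder [1, 1]; the bias of -1 (an assumption)
# means it is only active for the input (1, 1).
diag_encoders = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
d_diag, rmse_diag = solve_decoders(diag_encoders, np.array([0.0, 0.0, -1.0]))

print("axis-aligned only:", d_axis, rmse_axis)
print("with [1, 1] neuron:", d_diag, rmse_diag)
```

With only the axis-aligned neurons, the decoded output is a sum g(x) + h(y), which cannot fit XOR, so the error stays large. Adding the [1, 1] neuron makes the fit exact in this toy case, with decoders of about [1, 1, -2].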

Suppose none of them have an encoding vector of 1, 1 (there is no equivalent to neuron C), but some may have encoding vectors such as (0.5, 0.7), (-0.3, 0.2), etc. Would they be able to encode XOR?

Yes, because there are neurons coding for both input dimensions. It will not be a binary yes or no, though: as the encoding vectors are shifted to be more aligned with the standard basis, the approximation of XOR gets worse. So the choice of encoding vectors can influence the accuracy of the decoded function. This is something we also see for multiplication, and I explain that particular case in more detail here. In short, for multiplication it is better to choose all encoding vectors from [1, 1], [1, -1], [-1, 1], [-1, -1] instead of randomly, as is usually done with the NEF. (There is an even better method to do multiplication with the NEF explained in the link, but that is not relevant here.)
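
One way to see why diagonal encoders suit multiplication is a standard algebraic identity (stated here as background, not taken from the linked explanation). The product splits into two one-dimensional non-linear functions, one along the x + y direction and one along the x - y direction, and for inputs restricted to 0 and 1 XOR is a product plus linear terms:

```latex
xy = \frac{(x + y)^2 - (x - y)^2}{4},
\qquad
x \oplus y = x + y - 2xy \quad \text{for } x, y \in \{0, 1\}
```

Neurons with encoders [1, 1] and [-1, -1] respond to x + y, and neurons with encoders [1, -1] and [-1, 1] respond to x - y, so each squared term can be decoded from neurons along a single diagonal.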

First, should A and B have the same number of elements?

Yes.

Second, is it true that every neuron in A and B sends a connection to every neuron in C?

In the default case, yes. However, there are ways to solve for sparse decoding weights to avoid all-to-all connectivity, if desired.
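
To picture how the two connections add up in C, here is a rough NumPy sketch under simplifying assumptions: the activities and decoders are random stand-ins rather than solved values, gains and biases are left out, and all names and sizes are hypothetical. Each full weight matrix is the product of C's encoders and the source population's decoders, so every neuron in A and B connects to every neuron in C. The sketch gives A and B different neuron counts; that works because only the dimensionality of the decoded f(a) and g(b) has to match what C represents.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_b, n_c, dims = 30, 50, 40, 2  # hypothetical population sizes

# Random stand-ins for the firing rates of A and B at one moment in time.
act_a = rng.random(n_a)
act_b = rng.random(n_b)

# Random stand-ins for decoders of f(a) and g(b), and for C's encoders.
dec_a = rng.normal(size=(n_a, dims))
dec_b = rng.normal(size=(n_b, dims))
enc_c = rng.normal(size=(n_c, dims))

# Full (all-to-all) connection weights: encoders of C times decoders of source.
w_ac = enc_c @ dec_a.T  # shape (n_c, n_a)
w_bc = enc_c @ dec_b.T  # shape (n_c, n_b)

# Input to C computed neuron by neuron through the weights...
input_from_weights = w_ac @ act_a + w_bc @ act_b

# ...equals C's encoding of the sum of the two decoded values.
f_hat = dec_a.T @ act_a
g_hat = dec_b.T @ act_b
input_from_sum = enc_c @ (f_hat + g_hat)

print(np.allclose(input_from_weights, input_from_sum))
```

Because the encoding is linear in the represented value, summing the two weighted inputs at C is the same as encoding f(a) + g(b) directly.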

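Would this imply that “red wood square box” is more similar to “red square” than to “pink square”? In other words, would a hierarchy of related concepts be close to each other?

This depends a bit on how you construct and assign vectors for the different concepts, but that is certainly possible.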
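
On the “pink square” versus “red square” point, here is a small NumPy sketch of similarity surviving binding. It assumes binding is circular convolution, builds “pink” as a noisy copy of “red”, and makes “square” a unitary vector (all Fourier magnitudes equal to 1) so that binding preserves dot products exactly; with ordinary random vectors the preservation is only approximate. The vectors, dimensionality, and helper names are all hypothetical.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def bind(a, b):
    """Circular convolution, computed in the Fourier domain."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def unitary_vector(dims, rng):
    """Random real vector whose Fourier coefficients all have magnitude 1."""
    spectrum = np.exp(1j * rng.uniform(-np.pi, np.pi, dims // 2 + 1))
    spectrum[0] = 1.0   # DC term must be real
    spectrum[-1] = 1.0  # Nyquist term must be real (dims is even)
    return np.fft.irfft(spectrum, n=dims)


rng = np.random.default_rng(0)
dims = 512
red = unit(rng.normal(size=dims))
pink = unit(red + 0.5 * unit(rng.normal(size=dims)))  # deliberately close to red
blue = unit(rng.normal(size=dims))                    # unrelated to red
square = unitary_vector(dims, rng)

sim_plain = cosine(red, pink)
sim_bound = cosine(bind(red, square), bind(pink, square))
sim_other = cosine(bind(red, square), bind(blue, square))

print(sim_plain, sim_bound, sim_other)
```

The bound pair keeps the similarity of the unbound pair, while the pair built from unrelated colours stays near zero, so how close two composite concepts end up depends on how close their ingredients were made in the first place.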

gidmeister · May 19, 2017, 3:52pm

Thanks for both your answers, jgosmann and Sean. As for Sean’s question about my questions, I’m pretty vague on them myself, but from what I’ve read about how the NEF handles the Raven’s matrices, it does the following:

It is given a sequence (maybe 1, then 11, then 111).
It finds a transformation that makes ‘1’ become ‘11’.
It finds a transformation that makes ‘11’ become ‘111’.
It averages the two transformations that it just found mathematically.
Now, you can ask it to apply that resulting averaged transformation to that last number ‘111’ and it will, with luck, produce ‘1111’.
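
The steps above can be sketched in NumPy. This is only an illustration under stated assumptions, not the published Raven’s model: the patterns are abstract vectors rather than images or digits, the transformation is a binding by circular convolution, and all vectors are unitary so that the involution used as an inverse is exact. With ordinary random vectors the inverse is approximate and the prediction is noisier. All names are hypothetical.

```python
import numpy as np


def bind(a, b):
    """Circular convolution, computed in the Fourier domain."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def involution(a):
    """Index-reversed copy a[-k mod n]; the usual approximate inverse."""
    return np.concatenate(([a[0]], a[:0:-1]))


def unitary_vector(dims, rng):
    """Random real vector whose Fourier coefficients all have magnitude 1."""
    spectrum = np.exp(1j * rng.uniform(-np.pi, np.pi, dims // 2 + 1))
    spectrum[0] = 1.0   # DC term must be real
    spectrum[-1] = 1.0  # Nyquist term must be real (dims is even)
    return np.fft.irfft(spectrum, n=dims)


rng = np.random.default_rng(1)
dims = 256
hidden_rule = unitary_vector(dims, rng)  # the rule the model has to discover

# Build a four-item sequence; show three items and hold out the fourth.
sequence = [unitary_vector(dims, rng)]
for _ in range(3):
    sequence.append(bind(sequence[-1], hidden_rule))
shown, held_out = sequence[:3], sequence[3]

# Find the transformation between each consecutive pair, then average.
transforms = [
    bind(nxt, involution(cur)) for cur, nxt in zip(shown[:-1], shown[1:])
]
avg_transform = np.mean(transforms, axis=0)

# Apply the averaged transformation to the last shown item.
prediction = bind(shown[-1], avg_transform)
similarity = float(
    prediction @ held_out
    / (np.linalg.norm(prediction) * np.linalg.norm(held_out))
)
print(similarity)
```

The prediction only matches the held-out item because every step in this toy sequence applies the same binding; a sequence whose rule changes from step to step would give an averaged transformation that fits none of the steps well.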

This is interesting, because I would have thought, before reading about this, that you really have to have an understanding of the sequence to predict what comes next. Instead, it seems you just find a transform (specifically a binding) that takes you from one term in the sequence to the next. No understanding is involved at all.

So my question was whether, since life is full of sequences (in economics, the stock market, history, or even thoughts), some method of successive transforms might be applicable in many domains. I doubt that this would be a good method for such sequences, but I thought I’d ask. And in fact, after asking, I saw a mention of binding and ‘analogies’ in Chris Eliasmith’s book, which I am in the middle of (a mathematician friend of mine sent me an article about Nengo and that prompted me to buy the book).

I’ll look at your links next, and study the answers.