Background
I’ve been reading the “Technical Overview of the Neural Engineering Framework” by Terry Stewart, and it filled in some gaps in my understanding (I’m a newbie), but I want to make sure that I understand what he says there.
XOR and the NEF
In the paper, he mentions:
Importantly, this means that nonlinear functions can be computed with a single layer of connections – no back-propagation of error is required. This includes not only the classic XOR problem, but also more complex functions such as multiplication, trigonometric functions, or even circular convolution.
How do you solve the XOR problem with the NEF?
XOR has a truth table as follows:
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
(The first 2 columns are the inputs; the last column is the output.)
To solve this, each neuron in our group of 3 could have 2 incoming weights.
So they might have:
Neuron A: 0 1
Neuron B: 1 0
Neuron C: 1 1
So combined, this 3-member neural group gives us a vector that represents the output.
For instance, the input 0, 1 would produce the vector A=1, B=0, C=0, and the input 1, 1 would produce the vector A=0, B=0, C=1.
(Am I correct so far?)
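To make my mental picture concrete, here is a minimal sketch of one way three neurons with these weights could yield XOR through a single linear read-out. It assumes rectified-linear neurons, and the biases and decoder weights are hand-picked by me for illustration; none of these values come from the paper.

```python
# Hypothetical hand-built example: three rectified-linear "neurons".
ENCODERS = {"A": (0.0, 1.0), "B": (1.0, 0.0), "C": (1.0, 1.0)}
BIASES = {"A": 0.0, "B": 0.0, "C": -1.0}    # assumed: C only fires when both inputs are on
DECODERS = {"A": 1.0, "B": 1.0, "C": -2.0}  # assumed linear read-out weights


def activities(x):
    """Activity of each neuron: max(0, e . x + bias)."""
    return {
        name: max(0.0, e[0] * x[0] + e[1] * x[1] + BIASES[name])
        for name, e in ENCODERS.items()
    }


def decode_xor(x):
    """Weighted sum of the activities: one layer of weights, no training."""
    acts = activities(x)
    return sum(DECODERS[name] * acts[name] for name in acts)


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, activities(x), decode_xor(x))
```

Under these assumptions the input 1, 1 drives all three neurons (A=1, B=1, C=1) rather than C alone; it is the negative decoder on C that cancels the contributions of A and B.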
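The encoding equation is:
$a_i = G[\alpha_i e_i \cdot x + b_i]$
where $a_i$ is the activity of neuron $i$, $G$ is the neuron nonlinearity, $\alpha_i$ is a gain, $e_i$ is the encoding vector, $x$ is the represented value, and $b_i$ is a bias current.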
In the article, it says this: "if the $e_i$ values are all aligned with the standard basis vectors, then nonlinear functions of multiple variables cannot be computed (this is the case for standard connectionist models, which is why they require multiple layers)."
I’m wondering what ‘standard basis vectors’ are here. Are they 0 1 and 1 0 (the x and y axes)? If so, is that third neuron of 1, 1 an extra basis that gives more ability to represent nonlinear functions?
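One way I tried to test my reading of that sentence is the sketch below. It fits linear decoders by least squares for the product x1*x2, once with every encoder lying along an axis and once with encoders pointing in random directions. The rectified-linear neuron model, the population size of 200, the bias range, and the helper name `decode_rmse` are all my own assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points on a symmetric grid over [-1, 1] x [-1, 1].
axis = np.linspace(-1.0, 1.0, 21)
X = np.array([(x1, x2) for x1 in axis for x2 in axis])
target = X[:, 0] * X[:, 1]  # a nonlinear function of two variables


def decode_rmse(encoders, biases):
    """Fit linear decoders by least squares and return the RMS error."""
    acts = np.maximum(0.0, X @ encoders.T + biases)  # rectified-linear neurons
    decoders, *_ = np.linalg.lstsq(acts, target, rcond=None)
    return float(np.sqrt(np.mean((acts @ decoders - target) ** 2)))


n = 200
biases = rng.uniform(-1.0, 1.0, n)

# Case 1: every encoder lies along +x, -x, +y or -y.
basis = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
aligned = basis[rng.integers(0, 4, n)]

# Case 2: encoders point in random directions on the unit circle.
angles = rng.uniform(0.0, 2.0 * np.pi, n)
mixed = np.column_stack([np.cos(angles), np.sin(angles)])

rmse_aligned = decode_rmse(aligned, biases)
rmse_mixed = decode_rmse(mixed, biases)
print("axis-aligned encoders:", rmse_aligned)
print("mixed encoders:       ", rmse_mixed)
```

With axis-aligned encoders each neuron responds to x1 alone or x2 alone, so any weighted sum of activities has the form f(x1) + g(x2), and the fit to the product cannot improve no matter how many neurons are added. Neurons whose encoders mix both inputs do not have that restriction.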
Now let's suppose we want to recreate the input that came into this neuron triplet and pass it on to another group of neurons, this one with 10 neurons in it.
Suppose none of them has an encoding vector of 1,1 (there is no equivalent to neuron C), but some have encoding vectors such as (0.5, 0.7), (-0.3, 0.2), etc.
Would they be able to encode XOR?
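Here is how I currently picture the hand-off from the triplet to the group of 10, as a sketch only. It assumes rectified-linear neurons, a hand-picked bias on C, decoders I chose by hand to read the input back out, random encoders for the second group, and gains of 1; the names `enc_pre`, `dec_pre`, and `enc_post` are mine.

```python
import numpy as np

rng = np.random.default_rng(1)

# First group: neurons A, B, C, rectified-linear, with an assumed bias on C.
enc_pre = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
bias_pre = np.array([0.0, 0.0, -1.0])
# Assumed decoders that read the input back out: x1 from B, x2 from A.
dec_pre = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]])  # shape (3, 2)

# Second group: 10 neurons with arbitrary 2-D encoders, none of them special.
enc_post = rng.uniform(-1.0, 1.0, size=(10, 2))

# Connection weights: one weight from every pre neuron to every post neuron.
weights = enc_post @ dec_pre.T  # shape (10, 3)

x = np.array([1.0, 0.0])
a_pre = np.maximum(0.0, enc_pre @ x + bias_pre)  # activities of A, B, C
x_hat = dec_pre.T @ a_pre                        # decoded estimate of the input
drive = weights @ a_pre                          # input to each of the 10 neurons

print(x_hat)
print(np.allclose(drive, enc_post @ x_hat))
```

If this picture is right, the weight matrix is just the second group's encoders multiplied by the first group's decoders, so the 10 neurons receive the decoded value projected onto their own encoding vectors. The sketch does not settle whether those 10 neurons can then support an XOR read-out; it only shows the input being passed along.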
The article also says:
The method of representation used in the NEF also allows you to add values by simply feeding two inputs into the same group of neurons. If you connect group A to group C with connection weights that compute f(a) and if you connect B to C with connection weights that compute g(b), then the neural group C will end up with the activity pattern that represents f(a)+g(b). This is a consequence of the linear representation given in Equation 1, and is vital for constructing larger networks.
I’m not sure I picture this exactly.
First, should A and B have the same number of elements?
Secondly, is it true that for every neuron in C, every neuron in A and B sends a connection to it?
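My tentative understanding of the addition claim is sketched below. The activities and decoders are random stand-ins rather than fitted values, gains and biases are left out, and the group sizes are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(2)

n_a, n_b, n_c = 4, 7, 5  # the three groups need not be the same size
dim = 2                  # but f(a) and g(b) must share C's dimensionality

a_act = rng.uniform(0.0, 1.0, n_a)   # stand-in activities of group A
b_act = rng.uniform(0.0, 1.0, n_b)   # stand-in activities of group B
dec_f = rng.normal(size=(n_a, dim))  # stand-in decoders for f(a)
dec_g = rng.normal(size=(n_b, dim))  # stand-in decoders for g(b)
enc_c = rng.normal(size=(n_c, dim))  # encoders of group C

w_ac = enc_c @ dec_f.T  # shape (5, 4): every A neuron connects to every C neuron
w_bc = enc_c @ dec_g.T  # shape (5, 7): likewise for B

total_input = w_ac @ a_act + w_bc @ b_act         # what each C neuron receives
f_plus_g = dec_f.T @ a_act + dec_g.T @ b_act      # decoded f(a) + g(b)

print(np.allclose(total_input, enc_c @ f_plus_g))
```

In this sketch the summed input to C equals C's encoders applied to f(a) + g(b), which is why I think the two connections add. It also suggests that A and B can have different numbers of neurons, as long as the decoded values have the same number of dimensions.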
Semantic Pointer Architecture
Finally, the article talks about semantic pointers and the Semantic Pointer Architecture (SPA). It says that:
Furthermore, the vectors maintain similarity, so that if “pink” and “red” have similar vectors, then “pink square” and “red square” will also have similar vectors. This feature allows inductive reasoning over complex patterns. For example, our neural model of the Raven’s Progressive Matrix task (a standard intelligence test where participants are given 8 visual patterns in a 3x3 grid and are asked to determine what pattern should be placed in the missing square) works by forming the vector representation of each pattern and computing the average transformation that will take one pattern to the next.
This sounds fascinating. It seems that you are creating a space where objects with similar semantics stay close together.
Would this imply that red wood square box is more similar to red square than pink square? In other words, would a hierarchy of related concepts be close to each other?
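To check the "pink square" versus "red square" claim for myself, I put together the sketch below. It assumes that combining two concepts means binding their vectors with circular convolution (the operation the paper lists among computable functions); the dimensionality, the way `pink` is built from `red`, and all helper names are my own choices.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 1024  # assumed dimensionality


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def bind(a, b):
    """Circular convolution, computed with the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


red = unit(rng.normal(size=D))
pink = unit(0.8 * red + 0.6 * unit(rng.normal(size=D)))  # built to resemble red
blue = unit(rng.normal(size=D))                          # unrelated to red
square = unit(rng.normal(size=D))

sim_colors = cosine(pink, red)
sim_bound = cosine(bind(pink, square), bind(red, square))
sim_unrelated = cosine(bind(blue, square), bind(red, square))
print(sim_colors, sim_bound, sim_unrelated)
```

Binding with a fixed vector is a linear operation, which is why similarity between `pink` and `red` largely carries over to the bound pair while the unrelated `blue` pair stays near zero. Whether a longer combination such as red wood square box would behave the same way depends on how it is constructed, which this sketch does not address.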
Raven’s Progressive Matrices
The Raven’s Progressive Matrices example leads to another question.
Suppose you have 2 sequences that start the same way:
1, 11, 111, 1111
and
1, 11, 21, 31
The first sequence just appends a 1 to the end each time; the second adds 10 each time. It seems that by averaging the transformation between consecutive patterns, you would get a different prediction for the next item in each sequence.
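My reading of "computing the average transformation" can be sketched as follows. I am assuming that each pattern is the previous one bound with a fixed, hidden vector by circular convolution, and that all vectors are unitary so the inverse is exact; the names `step`, `random_unitary`, and `inverse` are hypothetical, and this is not taken from the actual Raven's model.

```python
import numpy as np

rng = np.random.default_rng(4)
D = 256  # assumed dimensionality


def bind(a, b):
    """Circular convolution, computed with the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def inverse(a):
    """Involution a[-i mod D]; the exact binding inverse for unitary vectors."""
    return np.concatenate(([a[0]], a[:0:-1]))


def random_unitary(d):
    """Random vector whose Fourier coefficients all have magnitude 1."""
    coeffs = np.fft.rfft(rng.normal(size=d))
    return np.fft.irfft(coeffs / np.abs(coeffs), n=d)


start = random_unitary(D)
step = random_unitary(D)  # hidden rule: next pattern = current pattern bound with step

patterns = [start]
for _ in range(3):
    patterns.append(bind(patterns[-1], step))

# Estimate the transformation from each consecutive pair, then average.
estimates = [bind(nxt, inverse(cur)) for cur, nxt in zip(patterns, patterns[1:])]
avg_transform = np.mean(estimates, axis=0)

prediction = bind(patterns[-1], avg_transform)
expected = bind(patterns[-1], step)
print(np.allclose(prediction, expected))
```

Under these assumptions, two sequences that follow different rules correspond to different `step` vectors, so the averaged transformation and the predicted next item differ. How the digits themselves would be encoded as vectors is a separate question that the sketch leaves open.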
If that is how it works, then it seems that you should be able to:
- Predict time series
- Predict analogies (‘a’ is to ‘b’ as ‘c’ is to ‘d’)
Has that been tried? Would it work?