The relation of bound/summed structures to their components

gidmeister · May 5, 2017, 2:14pm

I have finished reading an article on NEF and SPA, and since I don’t want to bombard this forum with questions without doing as much reading as possible first, this is my last set of questions for a while. Some of these questions are not precise at all, but any answer is appreciated.

Questions:

1. There seem to be 3 ways of combining vectors: summing them, binding them, and taking their dot product. Binding and summing together make a structure. You can interrogate that structure for its parts by binding it to an inverse vector. But without that inverse, is the resulting vector similar to the concepts that make it up? And if that structure has the same number of dimensions as the components that make it up, is it losing much of the components’ information?
2. Suppose we want to know if a letter is in the visual field of Spaun. According to the article, we could create a generic concept for letter by adding the vectors of all 26 letters in the English alphabet. This sum is supposedly similar to any one of those letters, so if you take that sum of vectors and then do a dot product with whatever is currently in the visual system, you will get a measure of similarity to the concept of ‘letter’.
This doesn’t seem to make sense with low dimensional vectors. For instance, a 5 bit vector could represent any one of 26 letters, but adding all of them would give a sum that was not particularly similar to any one of them. Is the high dimensional space of the vectors you use the reason why the sum is similar to each of its parts?
3. If we have a sequence where vector P1 leads to P2, and P2 leads to P3, does that mean that there always exists a transformation T such that P1 bound to T gives P2? Before this example was given in the article, ‘binding’ meant associating syntax roles with word vectors using circular convolution. But in this example, neither P1 nor T needs to be a word or a syntax role. So binding can be an abstract relation that leads from one step in a pattern to another, or it can be an arbitrary way of creating a syntax role for a concept? What else could it be used for?
4. I didn’t understand the basal ganglia example. Suppose you have 5 alternative actions you are considering. The one you selected stops inhibiting the basal ganglia. But the other four are in memory too, and they are still inhibiting the basal ganglia. What am I missing?
5. A general question: What is it about the low pass filter of the spikes that arrive at the dendrites that allows for creating oscillators and attractors?

Finally:
6. Where are the course notes? The course is not offered online, I assume…
Thanks.

jgosmann · May 5, 2017, 3:31pm

To answer some of your questions:

For a circular convolution (the operation we usually use for binding) the resulting vector will be dissimilar to both bound vectors.
For a sum the resulting vector will be somewhat similar to all the constituents (by how much depends on whether the vectors will be normalized after the sum).
The dot product is more of a comparison operation. It tells you how similar two vectors are. Because they are reduced to a single scalar, you cannot really compare that single number to either of the original vectors.
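As a rough illustration of the three operations above, here is a minimal NumPy sketch. The dimensionality (512), the random unit vectors, and the helper names `unit_vector` and `bind` are assumptions made for the example, not part of Nengo or the SPA code.

```python
import numpy as np

def unit_vector(rng, d):
    """Random vector of unit length (hypothetical helper)."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def bind(a, b):
    """Circular convolution, computed in the Fourier domain."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

rng = np.random.default_rng(0)
d = 512  # assumed dimensionality for this sketch
a, b = unit_vector(rng, d), unit_vector(rng, d)

bound = bind(a, b)
summed = a + b
summed /= np.linalg.norm(summed)  # normalize after the sum

# The dot product reduces each pair to a single similarity score.
sim_bound_a = float(np.dot(bound, a))   # expected to be close to 0
sim_sum_a = float(np.dot(summed, a))    # expected to be close to 1/sqrt(2)
sim_sum_b = float(np.dot(summed, b))

print(f"bound vs a: {sim_bound_a:.3f}")
print(f"sum vs a:   {sim_sum_a:.3f}")
print(f"sum vs b:   {sim_sum_b:.3f}")
```

With random high-dimensional vectors, the bound vector should come out nearly orthogonal to its inputs, while the normalized sum stays clearly similar to both.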

In theory, I believe (not entirely sure), no information is lost because the vector components are real numbers with infinite precision. (Though there are some specific vectors that act as an absorbing element and would destroy all information, like multiplying by 0.)

In practice, however, neurons can represent the vectors only with a limited precision. Thus, in a real system information will be lost. But this is not necessarily a problem as long as enough information is retained to recover the original vectors with a cleanup memory.
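To make the role of the cleanup memory concrete, here is a small sketch of unbinding followed by cleanup. The vocabulary names (`COLOR`, `RED`, and so on), the dimensionality, and the helper functions are all hypothetical; the approximate inverse used is the usual index-reversal (involution) of the vector, and the cleanup is modeled as a plain nearest-neighbor lookup rather than a neural implementation.

```python
import numpy as np

def bind(a, b):
    """Circular convolution, computed in the Fourier domain."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def approx_inverse(a):
    """Involution: keep element 0, reverse the rest."""
    return np.concatenate(([a[0]], a[:0:-1]))

rng = np.random.default_rng(1)
d = 512  # assumed dimensionality
names = ["COLOR", "SHAPE", "RED", "BLUE", "CIRCLE", "SQUARE"]
vocab = {}
for name in names:
    v = rng.standard_normal(d)
    vocab[name] = v / np.linalg.norm(v)

# A structure built from binding and summing.
structure = bind(vocab["COLOR"], vocab["RED"]) + bind(vocab["SHAPE"], vocab["CIRCLE"])

# Interrogate the structure: what is bound to COLOR?
noisy = bind(structure, approx_inverse(vocab["COLOR"]))

# Cleanup memory, sketched as picking the most similar known vector.
fillers = ["RED", "BLUE", "CIRCLE", "SQUARE"]
scores = {name: float(np.dot(noisy, vocab[name])) for name in fillers}
best = max(scores, key=scores.get)

print(scores)
print("cleaned up answer:", best)
```

The unbound result is only a noisy version of `RED`, but it is still far closer to `RED` than to any other vocabulary item, which is all the cleanup step needs.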

I assume you mean assigning a number to each letter and encoding that number as a binary vector? In that case many vectors are already highly similar as measured by the dot product. Adding those vectors together will give a vector similar to the summands, but it will probably also be similar to other vectors in this representation.

In the SPA we usually use vectors that are almost orthogonal. To encode the 26 letters, one could use 26-dimensional vectors with a single dimension set to 1 for each letter. In that case each vector pair is dissimilar (orthogonal in fact) and the sum is similar to all of its constituents. Note that we don’t need perfect orthogonality for this to be still approximately true. This allows us to fit more almost orthogonal vectors into the space than there are dimensions.
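The ‘letter’ concept can be sketched numerically as follows. This assumes random unit vectors in 1024 dimensions as stand-ins for almost orthogonal letter vectors; the dimensionality and variable names are choices made for the example only.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 1024        # assumed dimensionality
n_letters = 26

# Random high-dimensional unit vectors are almost orthogonal to each other.
letters = rng.standard_normal((n_letters, d))
letters /= np.linalg.norm(letters, axis=1, keepdims=True)

# Generic 'letter' concept: normalized sum of all letter vectors.
letter_concept = letters.sum(axis=0)
letter_concept /= np.linalg.norm(letter_concept)

# Similarity of each letter to the concept (roughly 1/sqrt(26) each).
sims_letters = letters @ letter_concept

# Similarity of an unrelated vector to the concept (roughly 0).
non_letter = rng.standard_normal(d)
non_letter /= np.linalg.norm(non_letter)
sim_non_letter = float(non_letter @ letter_concept)

print("mean letter similarity:", float(sims_letters.mean()))
print("non-letter similarity: ", sim_non_letter)
```

For the exactly orthogonal one-hot encoding, each letter has similarity exactly 1/sqrt(26) (about 0.2) with the normalized sum; the almost orthogonal case scatters around that value.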

Given two arbitrary vectors P1 and P2, and assuming that P1 is not the absorbing element, there should be a transformation from P1 to P2, I believe. Whether that transformation also brings you from P2 to P3 depends on how P1, P2, and P3 are constructed.
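One way to see why such a transformation should exist: circular convolution is element-wise multiplication in the Fourier domain, so T can be found by element-wise division, provided no Fourier coefficient of P1 is zero. The sketch below assumes random vectors and a hypothetical helper `exact_transform`; it is an illustration, not how the SPA models compute T.

```python
import numpy as np

def bind(a, b):
    """Circular convolution, computed in the Fourier domain."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def exact_transform(p1, p2):
    """Solve bind(p1, T) = p2 for T by dividing in the Fourier domain.

    Breaks down if any Fourier coefficient of p1 is zero.
    """
    return np.fft.irfft(np.fft.rfft(p2) / np.fft.rfft(p1), n=len(p1))

rng = np.random.default_rng(3)
d = 64  # assumed dimensionality
p1, p2, p3 = (rng.standard_normal(d) for _ in range(3))

t = exact_transform(p1, p2)
recovered = bind(p1, t)
print("P1 bound to T gives P2:", np.allclose(recovered, p2))

# The same T applied to P2 only gives P3 if the sequence was built that way.
step = bind(p2, t)
cosine = float(np.dot(step, p3) / (np.linalg.norm(step) * np.linalg.norm(p3)))
print("cosine similarity of P2 bound to T with an unrelated P3:", cosine)
```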

The course notes are here: http://compneuro.uwaterloo.ca/research/syde-750.html

arvoelke · May 5, 2017, 6:17pm

Not sure which questions are remaining, but here’s another response. Let us know what we’ve missed.

Oscillators, attractors, and their cousins, all have the same thing in common: they are described by differential equations of the form $\dot{x}(t) = f(x(t))$, where $\dot{x}(t)$ is the derivative (with respect to time) of the system’s state vector $x(t)$.

If we integrate both sides, we see that $x(t) = \int_0^t f(x(t')) \, dt'$ (plus some initial condition). And so here’s the important bit: the state of the system evolves by integrating some function over time. And a synapse is very much like an integrator over time. In fact, the lowpass filter is often referred to as a leaky integrator, because its dynamics are described by taking an integrator and including a leak term (an exponential decay). In the Laplace domain, it is the difference between $\frac{1}{s}$ and $\frac{1}{\tau s + 1}$ for the integrator and leaky integrator respectively.

In other words, the synapse lets us implement these systems because it performs the role of integration that is needed to evolve $x(t)$ over time. To compensate for the leak, Principle 3 is a proof that the synapse must be driven by $\tau f(x) + x$ instead of $f(x)$ to account for the fact that we’ve switched out the integrator for a lowpass filter.
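The effect of that compensation can be checked with a small non-spiking simulation. The sketch below assumes a 1 Hz harmonic oscillator as $f$, a synaptic time constant of 0.1 s, and simple Euler integration of the lowpass filter $\tau \dot{x} = -x + u$; the function name `simulate` and all parameter values are choices made for the example.

```python
import numpy as np

def simulate(drive, x0, tau=0.1, dt=0.001, steps=1000):
    """Euler-integrate a lowpass filter: tau * dx/dt = -x + drive(x)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        u = drive(x)
        x = x + (dt / tau) * (u - x)
    return x

omega = 2 * np.pi  # 1 Hz oscillator (assumed)
A = np.array([[0.0, -omega],
              [omega, 0.0]])

def f(x):
    """Desired dynamics: dx/dt = f(x) = A x."""
    return A @ x

tau = 0.1

# Drive the filter with tau * f(x) + x: the leak cancels and dx/dt = f(x).
compensated = simulate(lambda x: tau * f(x) + x, [1.0, 0.0], tau=tau)

# Drive the filter with f(x) alone: the leak is not compensated.
naive = simulate(f, [1.0, 0.0], tau=tau)

radius_compensated = float(np.linalg.norm(compensated))
radius_naive = float(np.linalg.norm(naive))
print("radius after 1 s, compensated:", radius_compensated)
print("radius after 1 s, naive:      ", radius_naive)
```

With the compensated drive the state keeps circling at roughly its initial radius (Euler integration adds a small drift), whereas with $f(x)$ alone the leak makes the oscillation die out.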

gidmeister · May 5, 2017, 8:34pm

That answers all but one of the questions and gives me lots of food for thought. The question that wasn’t answered was on the basal ganglia and action selection. (Spaun’s action selection is based on this) I understand that the winning selection has an inhibitory output of zero, while the other potential actions are inhibited, but I don’t understand if this is a gating mechanism - where the actions are competing to go through some kind of bottleneck, or if something else is happening. Thanks for the help.

arvoelke · May 6, 2017, 6:19pm

Someone like @tcstewar may need to correct or elaborate on my response since I’m not an expert on the BG / action selection model. This response is coming from my memory of the course / HtBaB book from a few years ago…

I think the part that you may be overlooking is that multiple populations of neurons are being inhibited. The neurons that become disinhibited are the ones that drive the response corresponding to the winning action. All other neurons remain inhibited to avoid driving any other actions.

The utilities of the potential actions compete against each other (in a sort-of bottleneck fashion) in a winner-take-all network. The one that wins will enable the flow of information via disinhibition of the appropriate neurons (in a gating-like fashion) to drive the corresponding response.

tcstewar · May 6, 2017, 6:39pm

The idea with the basal ganglia is that there are 5 inputs that indicate how good each action would be right now. The point of the basal ganglia is just to determine which of those 5 inputs is the largest. The output from the basal ganglia inhibits the actions that are not selected, leaving just the one action remaining.
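As a very idealized picture of that input/output relationship, the sketch below takes 5 utilities and returns an inhibition level per action: zero for the largest utility and positive for the rest. This is not the Gurney, Prescott and Redgrave model and involves no neurons; the function `select_action` and the example utilities are hypothetical.

```python
import numpy as np

def select_action(utilities):
    """Idealized selection: zero inhibition for the best action only."""
    utilities = np.asarray(utilities, dtype=float)
    # Inhibition grows with the distance from the largest utility.
    inhibition = utilities.max() - utilities
    # An action is let through only where inhibition is released.
    gate = (inhibition == 0).astype(float)
    return inhibition, gate

utilities = [0.2, 0.9, 0.4, 0.1, 0.6]  # how good each action is right now
inhibition, gate = select_action(utilities)
winner = int(np.argmax(gate))

print("inhibition:", inhibition)
print("gate:      ", gate)
print("selected action index:", winner)
```

All five actions are present at the input, but only the one whose output inhibition drops to zero is allowed to drive its response.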

The details of the basal ganglia model used are from (Gurney, Prescott, Redgrave, 2001):
https://www.ncbi.nlm.nih.gov/pubmed/11417052

We made a spiking LIF neuron version of that model, and that’s what’s used in these SPA models, but conceptually it’s the same idea. You might also find these two short papers useful:
http://compneuro.uwaterloo.ca/files/publications/stewart.2010a.pdf
http://compneuro.uwaterloo.ca/files/publications/stewart.2010.pdf

Terry