Basic questions on how NEF represents quantities and why it can produce arbitrary functions

I’m reading Large-Scale Synthesis of Functional Spiking Neural Circuits (by Terrence Stewart and Chris Eliasmith) and I’m getting the idea of how the NEF works, but I need some information to get past a few ambiguities.
Let’s suppose you have a group of neurons, each neuron being a direction vector in two dimensions. Some point east, some point west, some point northwest, and so on. The direction of a neuron in this case is just the vector of weights coming into that neuron (I think). Now an input vector X is applied to each neuron (each neuron gets the identical input). A neuron whose weights are similar to the input vector will fire more rapidly than one whose weights are not. As a group, you have a kind of fuzzy combination of neurons that represents the input; if we were to say it as a sentence, it might be “a little bit west, a lot northeast, and a tiny bit south”. But here is my first stumbling block.

We could take a linear combination of the outputs of those neurons. What are we accomplishing by doing that? The article seems to say that we can approximate any function by linearly recombining neural populations. (In fact, you don’t even need the sigmoid function that I’m used to from the backpropagation algorithm to produce a nonlinear result.) In the compass-directions example, suppose I tried to recreate the initial input. If the initial input was a vector pointing northeast, I suppose I could recreate it in a neuron receiving inputs from one neuron whose preferred direction was north, another whose preferred direction was east, and some other neurons with their own preferred directions. Is that the idea? In that case, though, a transform from direction to something else would not make sense (unless maybe I was combining it with some other group that represented amplitude or some other property).
If I were trying to recreate a sine wave over time, then since a sine wave exists in two dimensions, I could use the group of vectors with random preferred directions and then recombine them. But if I wanted to transform that sine wave into something else, would a linear combination of the neurons it activated let me do that? What kinds of functions could I convert it to?

I’ve gotten partway into the explanation of semantic pointers as well, and the confusion I have there is this. If the convolution of a syntax role and a word has to produce a vector as far as possible from the word (which is itself a vector) and from the role (which is also a vector), then how do you avoid the resulting pointers conflicting with some other word, or word-role? What is going to prevent collisions?

Finally, on a previous question in this forum, I got the following answer: "The interpretation of decoders as weights only holds roughly in the 1D case (after accounting for a possible change in sign for “on/off” neurons, a gain, and a bias current). In higher dimensions, the actual weight is a dot-product of a higher-dimensional decoder with a higher-dimensional encoder. This results in a conceptual distinction between the weight matrix and their factored forms used to understand the relevant computations. The course notes, and some of the Nengo notebook examples, are a good reference for understanding this in more detail."
I’ll get to the course notes, eventually, but how can a single neuron feed into a “higher dimensional decoder”? If a decoder isn’t a weight, is it a vector of weights? A matrix of weights? Where do these weights go?

We prefer to call these encoding weights or simply encoders to disambiguate them from other weights (this is something I mentioned briefly in my response to your last question).

Yes, that’s right. The way to think of it is that we are trying to approximate $y = f(x)$ in the vector space. Here, $x$ and $y$ are scalars for simplicity: $x$ is your input, $f$ is the nonlinear function we’re trying to approximate, and $y$ is the output of the function given $x$.

For example, if $f(x) = x^2$, then for an input of $0$ you get an output of $0$, and for an input of $-1$ you get an output of $1$. In two dimensions, your function could be a mapping from points in the circle to other 2D points (for example, converting rectangular coordinates to polar coordinates). In general, the desired function is some (preferably, piecewise smooth) mapping from each vector in your input space to some corresponding vector in your output space.
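As a concrete sketch of this, here is a minimal least-squares decoding example. It assumes a toy population of rectified-linear rate neurons with random encoders, gains, and biases (my own illustrative choices, not the LIF model used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 60

# Hypothetical rectified-linear rate neurons with random 1-D encoders
# (+1/-1), gains, and biases -- an illustrative assumption, not the
# LIF neurons used in the paper.
encoders = rng.choice([-1.0, 1.0], size=n_neurons)
gains = rng.uniform(0.5, 2.0, size=n_neurons)
biases = rng.uniform(-1.0, 1.0, size=n_neurons)

# Tuning curves: each row is one neuron's firing rate across inputs.
x = np.linspace(-1, 1, 200)
rates = np.maximum(0.0, gains[:, None] * encoders[:, None] * x + biases[:, None])

# Solve for decoders that approximate f(x) = x**2 by least squares.
target = x ** 2
decoders, *_ = np.linalg.lstsq(rates.T, target, rcond=None)
estimate = rates.T @ decoders

rmse = np.sqrt(np.mean((estimate - target) ** 2))
print(f"RMSE of the x^2 readout: {rmse:.4f}")
```

The decoders found here are exactly the linear recombination discussed above: the population encodes $x$ nonlinearly, and a purely linear readout of the rates recovers $x^2$.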

It is important to keep in mind that this transformation function $f$ is being applied at each moment in time. So to think about what your function is in this case, you need to say how each point along the sine wave maps onto each point on “whatever something else” is at any given moment in time.
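To illustrate the pointwise-in-time idea with a toy example (plain arrays, no neurons): feeding $\sin(t)$ through $f(x) = x^2$ just squares the signal at every moment; it does not transform the waveform as a whole.

```python
import numpy as np

# The transformation is applied at each moment in time: feeding
# sin(t) through f(x) = x**2 squares the signal sample by sample,
# rather than transforming the waveform as a whole.
t = np.linspace(0, 2 * np.pi, 1000)
x = np.sin(t)
y = x ** 2          # what a network computing f(x) = x**2 would output
print(np.allclose(y, np.sin(t) ** 2))  # True
```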

In general, with principles 1 and 2 you can get any piecewise-smooth function between two vector spaces, given “sufficiently many” neurons. Since the function $f$ is essentially a linear combination of tuning curves, the set of attainable functions can be described more precisely as ‘those that can be obtained by some linear combination of the tuning curves’. But this can be hard to visualize, even in two dimensions.
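A quick numerical sketch of the “piecewise smooth” caveat, assuming a toy rectified-linear population (my assumption, not the paper’s neuron model): the best linear readout of the tuning curves tracks a smooth target much more closely than a discontinuous one.

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons = 60

# Hypothetical rectified-linear tuning curves with random encoders,
# gains, and biases (an illustrative assumption).
encoders = rng.choice([-1.0, 1.0], size=n_neurons)
gains = rng.uniform(0.5, 2.0, size=n_neurons)
biases = rng.uniform(-1.0, 1.0, size=n_neurons)

x = np.linspace(-1, 1, 400)
rates = np.maximum(0.0, gains[:, None] * encoders[:, None] * x + biases[:, None])

def readout_rmse(target):
    """Error of the best linear readout of `target` from the tuning curves."""
    d, *_ = np.linalg.lstsq(rates.T, target, rcond=None)
    return np.sqrt(np.mean((rates.T @ d - target) ** 2))

smooth_err = readout_rmse(np.sin(np.pi * x))  # smooth target
jump_err = readout_rmse(np.sign(x))           # discontinuous target
print(f"smooth: {smooth_err:.4f}  discontinuous: {jump_err:.4f}")
```

The discontinuity is where the approximation struggles, which is why the function class is usually stated as piecewise smooth rather than fully arbitrary.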

With the inclusion of principle 3, you can get functions that compute across time, such as the integrator that we use to model working memory.
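A minimal discrete-time sketch of that idea, showing idealized dynamics only (in the NEF the recurrent transform also compensates for the synaptic filter, which I omit here): an integrator accumulates its input and then holds the value once the input is removed, which is what makes it usable as a working memory.

```python
import numpy as np

# Idealized discrete-time integrator, x' = u: drive the state for the
# first half of the run, then remove the input and watch the value
# being held. (In the NEF, the recurrent transform also has to
# compensate for the synaptic filter; that detail is omitted here.)
dt = 0.001
steps = 1000
u = np.zeros(steps)
u[:500] = 1.0                # input of 1.0 for the first 0.5 s

x = 0.0
trace = np.empty(steps)
for i in range(steps):
    x = x + dt * u[i]        # feedback holds x; the input adds to it
    trace[i] = x

print(f"held value after input removed: {trace[-1]:.3f}")
```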

High dimensions. Basically, high-dimensional spaces are very weird and unintuitive. Mathematically, two random vectors in a high-dimensional space are nearly orthogonal with overwhelmingly high probability (the chance of a large overlap shrinks exponentially as the dimensionality grows), which is what keeps bound pointers from colliding with unrelated words or roles.
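This is easy to check numerically. The sketch below draws random unit vectors (Gaussian components, my choice for illustration) and shows the average |cosine similarity| shrinking as the dimensionality grows:

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_abs_cosine(d, n_pairs=1000):
    """Average |cosine similarity| between pairs of random unit vectors in R^d."""
    a = rng.standard_normal((n_pairs, d))
    b = rng.standard_normal((n_pairs, d))
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.abs(np.sum(a * b, axis=1))))

for d in (2, 64, 512):
    print(f"d = {d:4d}: mean |cosine| = {mean_abs_cosine(d):.3f}")
```

The mean |cosine| falls off roughly as $\sqrt{2/(\pi d)}$, so at the dimensionalities typically used for semantic pointers (hundreds of dimensions), random vectors are almost orthogonal and collisions are rare.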

There are a lot of questions here, so I kind of feel like I’m trying to condense a book chapter into a single paragraph. That said: each neuron has a decoder and each neuron has an encoder. These are vectors. When you look across an entire population of neurons, all of these vectors form an encoding matrix and a decoding matrix. But none of these decoders and encoders really ‘exist’ in the biological network. They are mathematical constructs used to understand what computations are taking place in the (latent) vector space. The actual weights are a product of the decoders of the presynaptic neurons and the encoders of the postsynaptic neurons, which yields a scalar synaptic weight between each pair of these respective neurons. This equivalence is proven in the NEF text and the course notes, and is summarized as principle 2.
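A small sketch of that factorization, with made-up shapes and random values purely for illustration: the full weight matrix is the product of the postsynaptic encoders and the presynaptic decoders, and applying it is equivalent to decoding into the latent vector space and then re-encoding.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pre, n_post, dims = 50, 40, 3   # made-up sizes for illustration

# Hypothetical decoders for the presynaptic population and encoders
# for the postsynaptic population: one dims-vector per neuron.
decoders = rng.standard_normal((n_pre, dims))
encoders = rng.standard_normal((n_post, dims))

# Full weight matrix: entry (j, i) is the dot product of presynaptic
# neuron i's decoder with postsynaptic neuron j's encoder.
W = encoders @ decoders.T          # shape (n_post, n_pre)

# Applying W to the presynaptic activities is equivalent to decoding
# into the latent vector space and then re-encoding.
activities = rng.standard_normal(n_pre)
direct = W @ activities
factored = encoders @ (decoders.T @ activities)
print(np.allclose(direct, factored))  # prints True
```

This is also why the factored form is so much cheaper to store and to reason about: $n_{pre} \times d + n_{post} \times d$ numbers instead of $n_{pre} \times n_{post}$, with the $d$-dimensional latent space making the computation explicit.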