We prefer to call these encoding weights, or simply encoders, to distinguish them from the other weights in the network (this is something I mentioned briefly in my response to your last question).
Yes, that is right. The way you should think of it is that we are trying to approximate $y = f(x)$ in the vector space. Here, $x$ and $y$ are scalars for simplicity: $x$ is your input, $f$ is the nonlinear function we're trying to approximate, and $y$ is the output of the function given $x$.
For example, if $f(x) = x^2$, then for an input of $0$ you get an output of $0$, and for an input of $-1$ you get an output of $1$. In two dimensions, your function could be a mapping from points in the circle to other 2D points (for example, converting rectangular coordinates to polar coordinates). In general, the desired function is some (preferably piecewise smooth) mapping from each vector in your input space to some corresponding vector in your output space.
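To make those two examples concrete, here is a small sketch of the target functions written out in plain Python/NumPy, before any neurons are involved (the function names are just mine for illustration):

```python
import numpy as np

def square(x):
    """1D example: f(x) = x^2, so f(0) = 0 and f(-1) = 1."""
    return x ** 2

def rect_to_polar(v):
    """2D example: map a point (x, y) to polar coordinates (r, theta)."""
    x, y = v
    return np.array([np.hypot(x, y), np.arctan2(y, x)])

print(square(-1.0))               # 1.0
print(rect_to_polar([1.0, 1.0]))  # [1.414..., 0.785...]
```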
It is important to keep in mind that this transformation function $f$ is being applied at each moment in time. So to pin down what your function is in this case, you need to say how the point along the sine wave at a given moment maps onto the corresponding point of whatever that "something else" is at that same moment.
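As a toy illustration of what "applied at each moment in time" means, with a made-up sine-wave input (again just NumPy, nothing neural yet):

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 1000)  # time axis
x = np.sin(t)                        # the represented input at each moment in time
y = x ** 2                           # f is applied pointwise in time: y(t) = f(x(t))
```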
In general, with principles 1 and 2 you can get just about any piecewise smooth function between two vector spaces, given "sufficiently many" neurons. Since the approximation of $f$ is essentially a linear combination of tuning curves, the set of computable functions can be described more precisely as those that can be obtained by some linear combination of the tuning curves. But this can be hard to visualize, even in two dimensions.
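If it helps, here is a rough sketch of that "linear combination of tuning curves" idea, using simplified rectified-linear tuning curves and an ordinary least-squares solve; this is a simplification I made up for illustration, not the exact NEF solving procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 50
x = np.linspace(-1, 1, 200)  # sample points in the represented space

# Randomized encoders, gains, and biases give each neuron its own tuning curve.
encoders = rng.choice([-1.0, 1.0], size=n_neurons)
gains = rng.uniform(0.5, 2.0, size=n_neurons)
biases = rng.uniform(-1.0, 1.0, size=n_neurons)
activities = np.maximum(0, gains * encoders * x[:, None] + biases)  # shape (200, n_neurons)

# Principle 2: solve for decoders so that a linear combination of tuning curves
# approximates the target function f(x) = x^2.
target = x ** 2
decoders, *_ = np.linalg.lstsq(activities, target, rcond=None)
approx = activities @ decoders

print("max error:", np.max(np.abs(approx - target)))
```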
With the inclusion of principle 3, you can get functions that compute across time, such as the integrator that we use to model working memory.
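For reference, a minimal sketch of that integrator, assuming you are building it in Nengo (the neuron count, time constants, and input pulse here are arbitrary choices of mine):

```python
import nengo

tau = 0.1  # synaptic time constant used for the principle-3 mapping

with nengo.Network() as model:
    stim = nengo.Node(lambda t: 1.0 if t < 0.2 else 0.0)  # brief input pulse
    memory = nengo.Ensemble(n_neurons=100, dimensions=1)

    # Feedforward input scaled by tau, plus a recurrent identity connection,
    # implements dx/dt = u: an integrator that holds its value (working memory).
    nengo.Connection(stim, memory, transform=tau, synapse=tau)
    nengo.Connection(memory, memory, synapse=tau)

    probe = nengo.Probe(memory, synapse=0.01)

with nengo.Simulator(model) as sim:
    sim.run(1.0)
# sim.data[probe] should ramp up during the pulse and then hold near 0.2.
```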
High dimensions. Basically, high-dimensional spaces are very weird and unintuitive. Mathematically, two random vectors in a high-dimensional space are nearly orthogonal with overwhelmingly (exponentially) high probability.
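You can convince yourself of this with a quick throwaway NumPy check (not from the course materials, just an illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    a = rng.standard_normal(d)
    b = rng.standard_normal(d)
    cos = np.clip(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), -1, 1)
    print(d, np.degrees(np.arccos(cos)))  # the angle concentrates near 90 degrees as d grows
```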
There are a lot of questions here, so I kind of feel like I'm trying to condense a book chapter into a single paragraph. That said, each neuron has a decoder and each neuron has an encoder. These are vectors. When you look across an entire population of neurons, these vectors form a decoding matrix and an encoding matrix. But none of these decoders and encoders really 'exist' in the biological network. They are mathematical constructs used to understand what computations are taking place in the (latent) vector space. The actual connection weights are the product of the decoders of the presynaptic neurons and the encoders of the postsynaptic neurons, which gives a scalar synaptic weight between each pair of these respective neurons. This equivalence is proven in the NEF text and the course notes, and is summarized as principle 2.
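In matrix form, that product looks something like the following sketch (the shapes and numbers are made up; I am folding in the postsynaptic gains the way principle 2 does):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post, dim = 40, 30, 3

decoders = rng.standard_normal((dim, n_pre))   # one decoding vector per presynaptic neuron
encoders = rng.standard_normal((n_post, dim))  # one encoding vector per postsynaptic neuron
gains = rng.uniform(0.5, 2.0, size=n_post)     # postsynaptic gains

# Principle 2: the full connection weight matrix is the product of encoders and decoders,
# so each scalar weight is w_ji = gain_j * (e_j . d_i).
weights = gains[:, None] * (encoders @ decoders)  # shape (n_post, n_pre)
print(weights.shape)
```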