Computing the norm of a vector

I’ve had a few different situations where I’ve wanted to compute the norm of the vector stored in a spa.State component. This is handy for doing things like detecting if a spa.State is empty. But, I’ve never been quite happy with the different ways of doing so.

However, I was just chatting with Ivana and we came up with a nice way to do it that I haven’t seen before, and so I thought I’d post it here:

import nengo
import numpy as np
import nengo.spa as spa

model = spa.SPA()
with model:
    D = 256

    model.a = spa.State(D, subdimensions=4)
    
    # add an output that computes the sum of squares for each sub-ensemble
    def sumsq(x):
        return np.sum(x**2)
    model.a.state_ensembles.add_output('sumsq', sumsq)
    
    # add the sums of squares together
    norm2 = nengo.Ensemble(n_neurons=50, dimensions=1)
    nengo.Connection(model.a.state_ensembles.sumsq, norm2,
                     transform=np.ones((1, model.a.state_ensembles.sumsq.size_out)))
                     
    # take the square root (note that this is optional, as for many uses norm2 is as good as norm)
    norm = nengo.Ensemble(n_neurons=50, dimensions=1)
    nengo.Connection(norm2, norm, function=lambda x: np.sqrt(np.abs(x)))
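
To sanity-check it, you can drive the state with any unit-length pointer (DOG is just an arbitrary example) and probe norm, which should settle near 1. Something like this should work:

with model:
    model.stim = spa.Input(a='DOG')
    p_norm = nengo.Probe(norm, synapse=0.03)

with nengo.Simulator(model) as sim:
    sim.run(0.5)
# sim.data[p_norm][-1] should end up close to 1.0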

One sneaky thing in here is to set subdimensions=4. I find the accuracy degrades too much as you go above that.

Anyway, I just thought I’d post it here in case anyone needs this in the future!

Why 4? And does anything change when using nengo-spa as opposed to nengo/spa, due to @jgosmann’s optimizations surrounding the representation of subdimensions?

I didn’t really explore much to figure out why 4 seems to work well. Less than that definitely works too. I’d also guess that you could improve the performance for >4 if you oversample the evaluation points near zero, or use a function more like 0 if np.sum(x**2) < 0.1 else np.sum(x**2) to flatten out that part of the space. (Although that 0.1 threshold should probably be more like 0.1/sqrt(D/SD) or something like that…)
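
In code, that flattened version might look something like this (untested sketch; SD is just whatever subdimensions value you used, and the threshold scaling is only a guess):

SD = 4                              # subdimensions of each sub-ensemble
threshold = 0.1 / np.sqrt(D / SD)   # rough guess at how the threshold should scale

def sumsq_flat(x):
    s = np.sum(x**2)
    return 0.0 if s < threshold else s

model.a.state_ensembles.add_output('sumsq_flat', sumsq_flat)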

And that’s a good question about nengo-spa! I don’t think there’s anything that should fundamentally change this, and it seems to work in a quick test, but that’d be worth looking at more rigorously.

I used a very similar method before, with the only difference being that the state ensembles would only compute the element-wise square and all of the addition is done via the transform. Though more often I used a dot product network, as it tends to involve less boilerplate.
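
Roughly like this, if I remember correctly (a sketch, assuming the model.a and D from the first post; the second part is just my guess at a dot-product-network version using nengo.networks.Product):

with model:
    # each sub-ensemble outputs its squared elements; all of the
    # summation happens in the connection's transform
    model.a.state_ensembles.add_output('sq', np.square)
    sq_sum = nengo.Ensemble(n_neurons=50, dimensions=1)
    nengo.Connection(model.a.state_ensembles.sq, sq_sum,
                     transform=np.ones((1, D)))

    # or: a product network computing the dot product of the state with itself
    prod = nengo.networks.Product(n_neurons=100, dimensions=D)
    nengo.Connection(model.a.output, prod.A)
    nengo.Connection(model.a.output, prod.B)
    dot = nengo.Ensemble(n_neurons=50, dimensions=1)
    nengo.Connection(prod.output, dot, transform=np.ones((1, D)))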

I’d be careful with that depending on where you need accuracy. If you wanted to determine whether the norm is close to 1, this might hurt the representation of those values.

I think this poses an interesting set of theoretical questions with different methods and trade-offs depending on the situation.

For example, if one already knows the vocabulary $V$ ahead of time, and assuming that $x$ is always a positively scaled version of one of the vectors in $V$, then one could do it with only $|V|$ ReLU neurons (i.e., nengo.SpikingRectifiedLinear() or nengo.LIF(tau_rc=some_large_constant)).

The idea is to set the i'th encoder to the i'th vector in the vocabulary, $e_i = v_i$. The formula for the Euclidean dot product tells us that $e_i \cdot x = \|e_i\| \, \|x\| \cos\theta = \|x\|$ in the case of $\cos\theta = 1$, and less otherwise (e.g., $-\|x\|$ in the case of $\cos\theta = -1$). Hence the current into the i'th ReLU will be $\|x\|$ if $x$ is a positively scaled version of $v_i$, and otherwise something less. The output of the ReLU simply rectifies its current (which is fine, since norms cannot be negative). It then remains to apply a WTA mechanism, such as a first-to-spike scheme where the first neuron to become active inhibits all the others. Summing across the entire population then reads out only the winning neuron, which decodes the exact norm (plus spiking variability).

To handle $x$ that are negatively scaled versions of items in $V$, the number of neurons can be doubled with each encoder’s sign flipped.

For cases where $x$ is not a scaled version of an item in $V$, there may still be a way to leverage the near-orthogonality (over-completeness) of the basis to get an acceptable approximation.
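
To make the encoder idea a bit more concrete, here is a rough sketch (it uses rate ReLUs, and a max node stands in for the WTA circuit, so it only illustrates the encoding trick rather than the full scheme):

import nengo
import numpy as np
import nengo.spa as spa

D = 64
vocab = spa.Vocabulary(D)
for name in ['A', 'B', 'C', 'DOG', 'CAT']:
    vocab.parse(name)
V = vocab.vectors                  # |V| unit vectors, shape (|V|, D)
encoders = np.vstack([V, -V])      # doubled to handle negatively scaled items
N = len(encoders)

with nengo.Network() as net:
    stim = nengo.Node(0.7 * V[3])  # x is 0.7 times one vocab item, so ||x|| = 0.7
    relus = nengo.Ensemble(
        n_neurons=N, dimensions=D, encoders=encoders,
        gain=np.ones(N), bias=np.zeros(N),   # current into neuron i is e_i . x
        neuron_type=nengo.RectifiedLinear())
    nengo.Connection(stim, relus, synapse=None)

    # stand-in for the WTA: take the max of the rectified currents;
    # in a full model this would be the first-to-spike inhibition scheme
    norm = nengo.Node(lambda t, a: np.max(a), size_in=N)
    nengo.Connection(relus.neurons, norm, synapse=None)
    p = nengo.Probe(norm)

with nengo.Simulator(net) as sim:
    sim.run(0.05)
# sim.data[p][-1] should be very close to 0.7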

Just more food for thought / discussion.