# Large numbers of sample points use a *lot* of memory

#1

I’m working on a project with Florian that involves using nengo solvers and a lot of sample points. When we start getting up around a million sample points, memory usage seems to explode. For example:

```python
import nengo
import numpy as np

N = 750
D1 = 16
D2 = 2
S = 1000000

pts = np.zeros((S, D1))
target = np.zeros((S, D2))
pts[:, 0] = np.sin(np.arange(S) * 0.001)
target[:, 0] = np.sin(np.arange(S) * 0.001)

model = nengo.Network()
with model:
    ens = nengo.Ensemble(n_neurons=N, dimensions=D1)
    result = nengo.Node(None, size_in=D2)
    c = nengo.Connection(ens, result, eval_points=pts, function=target)
sim = nengo.Simulator(model)
```

Any suggestions? Is there an alternate solver that would be better in this situation?

#2

I believe this was the motivation for RandomizedSVD, which you can use courtesy of @Eric. I’ve tried it out with 10,000–20,000 points but never a million. I don’t think you can get around the memory requirement for the A matrix, which is still substantial in your case.

#3

Hmm, is that a limitation of the current way Nengo is organized? We don’t actually need the full A matrix – we could build up Gamma (i.e. np.dot(A.T, A)) from a series of slices through the A matrix.
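To illustrate the idea in isolation (a hypothetical numpy sketch, independent of Nengo): Gamma can be accumulated slice-by-slice, so only `block_size` rows of A need to exist at any one time. The full A is materialized below only to verify the equivalence.

```python
import numpy as np

rng = np.random.RandomState(0)
S, N = 30000, 50       # sample points, neurons
block_size = 5000

# Full A matrix -- what we want to avoid holding for very large S.
A = rng.rand(S, N)
gamma_full = A.T @ A

# Accumulate Gamma one slice at a time; peak memory is O(block_size * N).
gamma = np.zeros((N, N))
for offset in range(0, S, block_size):
    A_block = A[offset:offset + block_size]
    gamma += A_block.T @ A_block
```

Each slice contributes its own outer-product sum, so the accumulated Gamma is identical (up to floating-point rounding) to computing `A.T @ A` on the full matrix.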

#4

Well, Nengo constructs the A matrix before it has even invoked the solver, so you will need to do some low-level builder hacking to modify this. Alternatively, if you have your own way of solving for the decoders, you can use the new NoSolver solver to embed those decoders directly. NoSolver constructs a dummy A matrix that is sized only according to the number of neurons, so your bottleneck becomes however you choose to compute your own decoders.

This may also be relevant: https://arvoelke.github.io/nengolib-docs/notebooks.research.geometric_decoders.html

I gave a talk on this notebook a year or two ago. It shows how to effectively get an infinite number of evaluation points while paying only a fixed cost that depends on the complexity of your tuning curves and the desired function. You may be able to make a similar cost trade-off if you can approximate the required integrals. You can think of this as building up np.dot(A.T, A) et al. directly.
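To make the integral view concrete, here is a hypothetical numpy sketch (the rectified-linear tuning-curve model and uniform sampling are placeholders, not the notebook’s method): each entry of Gamma is S times a Monte Carlo estimate of the integral E[a_i(x) a_j(x)], so any way of approximating that integral plays the same role as adding more evaluation points.

```python
import numpy as np

rng = np.random.RandomState(0)
N = 20    # neurons
D = 1     # input dimension

# Placeholder rectified-linear tuning curves: a_i(x) = max(0, e_i . x + b_i)
encoders = rng.uniform(-1, 1, size=(N, D))
biases = rng.uniform(-0.5, 0.5, size=N)

def activities(x):
    """Evaluate all N tuning curves at the sample points x (shape (S, D))."""
    return np.maximum(0, x @ encoders.T + biases)

# Monte Carlo estimate of E[a(x) a(x)^T] over x ~ Uniform(-1, 1);
# Gamma for S evaluation points is S times this quantity.
S = 100000
x = rng.uniform(-1, 1, size=(S, D))
A = activities(x)
gamma_per_point = (A.T @ A) / S
```

As S grows, `gamma_per_point` converges to the underlying integral, which is why a closed-form or quadrature approximation of that integral can replace an arbitrarily large set of evaluation points.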

#5

Oh, nice – I didn’t realize that NoSolver bypasses creating A… that makes a lot of sense, and means I think we can solve this issue this way:

```python
import nengo
import numpy as np
from nengo.utils.ensemble import tuning_curves

N = 750
D1 = 16
D2 = 2
S = 1000000

pts = np.zeros((S, D1))
target = np.zeros((S, D2))
pts[:, 0] = np.sin(np.arange(S) * 0.001)
target[:, 0] = np.sin(np.arange(S) * 0.001)

def compute_decoder_for_large_S(ens, pts, targets, block_size=10000):
    # The ensemble must be seeded so the throwaway simulator below
    # generates the same tuning curves as the real model.
    assert ens.seed is not None

    # Build a throwaway network containing only the ensemble, so we can
    # query its tuning curves without building the full model.
    submodel = nengo.Network(add_to_container=False)
    submodel.ensembles.append(ens)
    sim = nengo.Simulator(submodel)

    N = ens.n_neurons
    gamma = np.zeros((N, N))
    upsilon = np.zeros((N, targets.shape[1]))

    # Accumulate Gamma = A.T A and Upsilon = A.T Y one block at a time,
    # so only block_size rows of A are ever in memory.
    offset = 0
    while offset < len(pts):
        sub_pts = pts[offset:offset + block_size]
        sub_targets = targets[offset:offset + block_size]
        _, A = tuning_curves(ens, sim, inputs=sub_pts)
        offset += block_size
        gamma += np.dot(A.T, A)
        upsilon += np.dot(A.T, sub_targets)

    dec = np.dot(np.linalg.pinv(gamma), upsilon)
    return dec

model = nengo.Network()
with model:
    ens = nengo.Ensemble(n_neurons=N, dimensions=D1, seed=1)
    result = nengo.Node(None, size_in=D2)
    decoder = compute_decoder_for_large_S(ens, pts, target)
    c = nengo.Connection(ens, result, function=lambda x: np.zeros(D2),
                         solver=nengo.solvers.NoSolver(decoder))
sim = nengo.Simulator(model)
```

I’ll see whether that resolves this for the case we’re looking at… Thank you!

#6

Nice. I hadn’t thought of that, but this is a really great way to save memory ($\mathcal{O}(nb)$ rather than $\mathcal{O}(ns)$, where $n$ is the number of neurons, $b$ is the constant block size, and $s$ is the number of evaluation points). It should be mathematically equivalent. The only real drawback is that numpy can no longer batch all of these vector operations together, so it may be a bit slower. Most of that time should be reclaimed if you are no longer paging RAM to disk.

In general, I can see this being a very useful addition to Nengo for really complicated high-dimensional functions that need lots of sample points and neurons. You could hook it in the same way that NoSolver taps into the builder: https://github.com/nengo/nengo/blob/93c7b6175121118243e9b1e9aac5cd534d95fa09/nengo/builder/connection.py#L167.

#7

I did something like this here:

#8

Note that the above doesn’t compute the gamma and upsilon matrices in blocks the way @tcstewar did. Still, it’s a similar idea, and useful when the neuron model itself is memory-intensive, as you say.

Relatedly, I finally got around to implementing FORCE in Nengo. It is interesting that the suggested learning rule (recursive least-squares) actually computes the gamma matrix online, one evaluation point at a time. This is conceptually very similar to the above compute_decoder_for_large_S function, but with a block_size of 1, and with the computations done online as the sample points arrive. From a user perspective, it’s like using PES, but it arrives at the same L2-optimal solution as Nengo. You can follow my progress here: https://github.com/arvoelke/nengolib/pull/133.
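That connection can be sketched directly (hypothetical numpy code, not the nengolib implementation): processing one sample at a time with the Sherman–Morrison update maintains the inverse of a regularized Gamma, and arrives at the same solution as solving the batch normal equations.

```python
import numpy as np

rng = np.random.RandomState(2)
S, N, D = 2000, 10, 1
A = rng.rand(S, N)
Y = rng.rand(S, D)

alpha = 0.1                   # regularization (the RLS initialization)
P = np.eye(N) / alpha         # P tracks inv(alpha*I + A_t.T A_t)
dec = np.zeros((N, D))

for a, y in zip(A, Y):        # one evaluation point at a time
    a = a[:, None]            # column vector, shape (N, 1)
    Pa = P @ a
    k = Pa / (1.0 + a.T @ Pa)            # gain vector (Sherman-Morrison)
    dec += k @ (y[None, :] - a.T @ dec)  # error-driven decoder update
    P -= k @ Pa.T                        # rank-1 downdate of the inverse

# Batch solution of the same regularized normal equations.
dec_batch = np.linalg.solve(alpha * np.eye(N) + A.T @ A, A.T @ Y)
```

Each iteration is exactly a `block_size=1` update of gamma and upsilon, just expressed on the inverse so no linear solve is needed at the end, which is what makes it usable online as an error-driven learning rule.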