# Large numbers of sample points use a *lot* of memory

#1

I’m working on a project with Florian that involves using nengo solvers and a lot of sample points. When we start getting up around a million sample points, memory usage seems to explode. For example:

```python
import nengo
import numpy as np

N = 750
D1 = 16
D2 = 2
S = 1000000

pts = np.zeros((S, D1))
target = np.zeros((S, D2))
pts[:, 0] = np.sin(np.arange(S) * 0.001)
target[:, 0] = np.sin(np.arange(S) * 0.001)

model = nengo.Network()
with model:
    ens = nengo.Ensemble(n_neurons=N, dimensions=D1)
    result = nengo.Node(None, size_in=D2)
    c = nengo.Connection(ens, result, eval_points=pts, function=target)
sim = nengo.Simulator(model)
```

Any suggestions? Is there an alternate solver that would be better in this situation?

#2

I believe this was the motivation for RandomizedSVD, which you can use courtesy of @Eric. I’ve tried it out with 10,000–20,000 points but never a million. I don’t think you can get around the memory requirement for the A matrix, which is still substantial in your case.

#3

Hmm, is that a limitation of the current way Nengo is organized? We don’t actually need the full A matrix – we could build up Gamma (i.e. np.dot(A.T, A)) from a series of slices through the A matrix.
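To illustrate the idea in isolation (a hypothetical numpy sketch, independent of Nengo): Gamma can be accumulated slice-by-slice, so only `block_size` rows of A need to exist at any one time. The full A is materialized below only to verify the equivalence.

```python
import numpy as np

rng = np.random.RandomState(0)
S, N = 30000, 50       # sample points, neurons
block_size = 5000

# Full A matrix -- what we want to avoid holding for very large S.
A = rng.rand(S, N)
gamma_full = A.T @ A

# Accumulate Gamma one slice at a time; peak memory is O(block_size * N).
gamma = np.zeros((N, N))
for offset in range(0, S, block_size):
    A_block = A[offset:offset + block_size]
    gamma += A_block.T @ A_block
```

Each slice contributes its own outer-product sum, so the accumulated Gamma is identical (up to floating-point rounding) to computing `A.T @ A` on the full matrix.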

#4

Well, Nengo constructs the A matrix before it has even invoked the solver, so you will need to do some low-level builder hacking to modify this. Alternatively, if you have your own way of solving for the decoders, you can use the new NoSolver solver to embed those decoders directly. NoSolver constructs a dummy A matrix that is sized only according to the number of neurons, so your bottleneck becomes however you choose to compute your own decoders.

This may also be relevant: https://arvoelke.github.io/nengolib-docs/notebooks.research.geometric_decoders.html

I gave a talk on this notebook a year or two ago. It shows how to effectively get an infinite number of evaluation points while paying only a fixed cost that depends on the complexity of your tuning curves and the desired function. You may be able to make a similar cost trade-off if you can approximate the required integrals. You can think of this as building up np.dot(A.T, A) et al. directly.
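To make the integral view concrete, here is a hypothetical numpy sketch (the rectified-linear tuning-curve model and uniform sampling are placeholders, not the notebook’s method): each entry of Gamma is S times a Monte Carlo estimate of the integral E[a_i(x) a_j(x)], so any way of approximating that integral plays the same role as adding more evaluation points.

```python
import numpy as np

rng = np.random.RandomState(0)
N = 20    # neurons
D = 1     # input dimension

# Placeholder rectified-linear tuning curves: a_i(x) = max(0, e_i . x + b_i)
encoders = rng.uniform(-1, 1, size=(N, D))
biases = rng.uniform(-0.5, 0.5, size=N)

def activities(x):
    """Evaluate all N tuning curves at the sample points x (shape (S, D))."""
    return np.maximum(0, x @ encoders.T + biases)

# Monte Carlo estimate of E[a(x) a(x)^T] over x ~ Uniform(-1, 1);
# Gamma for S evaluation points is S times this quantity.
S = 100000
x = rng.uniform(-1, 1, size=(S, D))
A = activities(x)
gamma_per_point = (A.T @ A) / S
```

As S grows, `gamma_per_point` converges to the underlying integral, which is why a closed-form or quadrature approximation of that integral can replace an arbitrarily large set of evaluation points.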

#5

Oh, nice – I didn’t realize that NoSolver bypasses creating A… that makes a lot of sense, and means I think we can solve this issue this way:

```python
import nengo
import numpy as np
from nengo.utils.ensemble import tuning_curves

N = 750
D1 = 16
D2 = 2
S = 1000000

pts = np.zeros((S, D1))
target = np.zeros((S, D2))
pts[:, 0] = np.sin(np.arange(S) * 0.001)
target[:, 0] = np.sin(np.arange(S) * 0.001)

def compute_decoder_for_large_S(ens, pts, targets, block_size=10000):
    # The ensemble must be seeded so the throwaway simulator below
    # generates the same tuning curves as the real model.
    assert ens.seed is not None

    # Build a throwaway network containing only the ensemble, so we can
    # query its tuning curves without building the full model.
    submodel = nengo.Network(add_to_container=False)
    submodel.ensembles.append(ens)
    sim = nengo.Simulator(submodel)

    N = ens.n_neurons
    gamma = np.zeros((N, N))
    upsilon = np.zeros((N, targets.shape[1]))

    # Accumulate Gamma = A.T A and Upsilon = A.T Y one block at a time,
    # so only block_size rows of A are ever in memory.
    offset = 0
    while offset < len(pts):
        sub_pts = pts[offset:offset + block_size]
        sub_targets = targets[offset:offset + block_size]
        _, A = tuning_curves(ens, sim, inputs=sub_pts)
        offset += block_size
        gamma += np.dot(A.T, A)
        upsilon += np.dot(A.T, sub_targets)

    dec = np.dot(np.linalg.pinv(gamma), upsilon)
    return dec

model = nengo.Network()
with model:
    ens = nengo.Ensemble(n_neurons=N, dimensions=D1, seed=1)
    result = nengo.Node(None, size_in=D2)
    decoder = compute_decoder_for_large_S(ens, pts, target)
    c = nengo.Connection(ens, result, function=lambda x: np.zeros(D2),
                         solver=nengo.solvers.NoSolver(decoder))
sim = nengo.Simulator(model)
```

I’ll see whether that resolves this for the case we’re looking at… Thank you!

#6

Nice. I hadn’t thought of that, but this is a really great way to save memory ($\mathcal{O}(nb)$ rather than $\mathcal{O}(ns)$, where $n$ is the number of neurons, $b$ is the constant block size, and $s$ is the number of evaluation points). It should be mathematically equivalent. The only real drawback is that numpy can no longer batch all of these vector operations together, so it may be a bit slower. Most of that time should be reclaimed if you are no longer paging RAM to disk.

In general, I can see this being a very useful addition to Nengo for really complicated high-dimensional functions that need lots of sample points and neurons. You could hook it in the same way that NoSolver taps into the builder: https://github.com/nengo/nengo/blob/93c7b6175121118243e9b1e9aac5cd534d95fa09/nengo/builder/connection.py#L167.

#7

I did something like this here:

#8

Note that the above doesn’t compute the gamma and upsilon matrices in blocks the way @tcstewar did. Still, it’s a similar idea, and useful when the neuron model itself is memory-intensive, as you say.

Relatedly, I finally got around to implementing FORCE in Nengo. It is interesting that the suggested learning rule (recursive least-squares) actually computes the gamma matrix online, one evaluation point at a time. This is conceptually very similar to the above compute_decoder_for_large_S function, but with a block_size of 1, and with the computations done online as the sample points arrive. From a user perspective, it’s like using PES, but it arrives at the same L2-optimal solution as Nengo. You can follow my progress here: https://github.com/arvoelke/nengolib/pull/133.
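That connection can be sketched directly (hypothetical numpy code, not the nengolib implementation): processing one sample at a time with the Sherman–Morrison update maintains the inverse of a regularized Gamma, and arrives at the same solution as solving the batch normal equations.

```python
import numpy as np

rng = np.random.RandomState(2)
S, N, D = 2000, 10, 1
A = rng.rand(S, N)
Y = rng.rand(S, D)

alpha = 0.1                   # regularization (the RLS initialization)
P = np.eye(N) / alpha         # P tracks inv(alpha*I + A_t.T A_t)
dec = np.zeros((N, D))

for a, y in zip(A, Y):        # one evaluation point at a time
    a = a[:, None]            # column vector, shape (N, 1)
    Pa = P @ a
    k = Pa / (1.0 + a.T @ Pa)            # gain vector (Sherman-Morrison)
    dec += k @ (y[None, :] - a.T @ dec)  # error-driven decoder update
    P -= k @ Pa.T                        # rank-1 downdate of the inverse

# Batch solution of the same regularized normal equations.
dec_batch = np.linalg.solve(alpha * np.eye(N) + A.T @ A, A.T @ Y)
```

Each iteration is exactly a `block_size=1` update of gamma and upsilon, just expressed on the inverse so no linear solve is needed at the end, which is what makes it usable online as an error-driven learning rule.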