Gyrus - running time

Hi !
I’m trying to use gyrus to calculate inverse kinematics using the Jacobian matrix.
In order to use the Jacobian, I need (among other things) np.linalg.pinv, but gyrus doesn’t support that, so I used the example from here in my program.

I don’t think my program is too complicated, but it is taking FOREVER to run a 1-second simulation (around 40 minutes).

I’ve tried reducing the number of neurons from 1000, but it’s far less accurate and still takes a long time.

A bit of the code:
I define the end position as A, and the angles needed to reach that point as q:

A = np.array([0.47833606, 0.46394772, 0.39643308], dtype='float')
q = np.array([-0.69066467, -0.20034368,  0.28437363,  0.00342465,  0.10304996], dtype='float') 

Then, I use the function gyrus_calc to calculate q_hat from A alone, using the Jacobian matrix:

def gyrus_calc(A, q_hat, dt, synapse=None):
    """Compute q according to the Jacobian matrix."""
    J = gyrus.stimuli(calc_J(q_hat)).configure(n_neurons=1000)

    J_pinv_hat = gyrus.stimuli(np.zeros_like(calc_J(q_hat)).T)

    # approximates np.linalg.pinv(calc_J(q_hat))
    J_pinv = gyrus_inverse(J, J_pinv_hat, dt)

    return q_hat.integrate_fold(
        # drive q_hat toward A through the pseudoinverse
        # (the original integrand was cut off in this post)
        integrand=lambda q_hat: dt * J_pinv @ (A - calc_T(q_hat)) / 1e-3,
        synapse=synapse,
    )
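
The update being integrated here is essentially q̇ = J⁺ (A − T(q)). As a sanity check of that rule outside of gyrus, here is a plain-NumPy sketch; a hypothetical two-link planar arm stands in for the real calc_T / calc_J, which aren’t shown in this post:

```python
import numpy as np

# Hypothetical 2-link planar arm standing in for calc_T / calc_J
L1, L2 = 1.0, 1.0

def calc_T(q):
    """Forward kinematics: end-effector (x, y) position."""
    return np.array([
        L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
        L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]),
    ])

def calc_J(q):
    """Jacobian of calc_T with respect to the joint angles q."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-L1 * s1 - L2 * s12, -L2 * s12],
        [ L1 * c1 + L2 * c12,  L2 * c12],
    ])

A = np.array([1.2, 0.8])   # target end-effector position (made up)
q = np.array([0.1, 0.5])   # initial joint angles (made up)
dt = 0.05

# Euler-integrate q' = pinv(J(q)) @ (A - T(q)), the same flow that
# gyrus_calc integrates with neurons.
for _ in range(2000):
    q = q + dt * np.linalg.pinv(calc_J(q)) @ (A - calc_T(q))

print(calc_T(q))  # converges to the target A
```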

The gyrus_inverse function uses the gradient function to implement np.linalg.pinv by gradient descent:

def gradient(A, M):
    """Compute the gradient of M approximating inv(A)."""
    I = np.eye(A.shape[1])
    return 2 * (M @ A - I) @ A.T

def gyrus_inverse(J, J_pinv_hat, dt, synapse=None):
    """Compute the (pseudo)inverse of J by gradient descent from J_pinv_hat."""
    return J_pinv_hat.integrate_fold(
        integrand=lambda J_pinv_hat: -dt * gradient(J, J_pinv_hat) / 1e-3,
        synapse=synapse,
    )
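
The same iteration can be checked in plain NumPy: minimizing ||M·A − I||² by gradient descent converges to the pseudoinverse when A has full row rank. A toy wide matrix below stands in for the real Jacobian:

```python
import numpy as np

def gradient(A, M):
    """Gradient of ||M @ A - I||_F^2 with respect to M."""
    I = np.eye(A.shape[1])
    return 2 * (M @ A - I) @ A.T

# Toy wide matrix standing in for the 3x5 Jacobian
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
M = np.zeros_like(A).T  # start from zeros, like J_pinv_hat
step = 0.1              # plays the role of dt / 1e-3

for _ in range(500):
    M = M - step * gradient(A, M)

print(np.allclose(M, np.linalg.pinv(A)))  # → True
```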

In order to run the simulation I run op, which I defined as:

op = gyrus_calc(A, q_hat, dt)

Does that justify a 40-minute run?

Thanks a lot,

Could you share calc_J and calc_T, or some suitable stand-in for those functions? I’m having trouble trying to run this code.

One thing to note is that it’s using 1000 neurons per nonlinearity, and so the total number of neurons is going to be much (much) higher. You can see how many neurons there are by doing:

with nengo.Network() as model:
    op.make()  # build the gyrus op into the current network
print(model.n_neurons)
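
As a hypothetical back-of-the-envelope illustration of why the count explodes (the exact numbers depend on how gyrus decomposes the operations): each scalar multiply inside the two matrix products of the gradient step is one nonlinearity, so a 3×5 Jacobian at 1000 neurons per nonlinearity already gives:

```python
m, n = 3, 5   # assumed Jacobian shape: end-effector dims x joint angles
neurons_per_nonlinearity = 1000

# gradient(J, M) = 2 * (M @ J - I) @ J.T has two matrix products;
# an (a x b) @ (b x c) product contains a*b*c scalar multiplies.
multiplies = (n * m * n) + (n * n * m)  # M @ J, then (...) @ J.T
total = multiplies * neurons_per_nonlinearity
print(multiplies, total)  # → 150 150000
```

And that is just one gradient step, before counting the forward-kinematics nonlinearities.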


You can likely get a speed-up by running it on a GPU, by installing nengo_dl and passing simulator=nengo_dl.Simulator to the run method.

There is also an optimized version of this network at the bottom of the example you linked, where you can get away with using just 1 parabolic spiking neuron per nonlinearity, but this may not be biologically plausible if you wish to model some particular response curves.

Yes, the total number of neurons really was much (much) higher…
For some reason the GPU gave me some problems, but I saw the network at the bottom of the example, which really did help cut down the run time!

BTW - it looks like you guys are trying to:

from IPython import get_ipython
from IPython.display import display, HTML 

but the module is practically empty, which makes it a problem when trying to use InlineGUI(model) after from nengo_gui.ipython import InlineGUI.

I really appreciate the help and the quick response,

What error did you see when trying to use the GPU?

Here are some speed differences I get for the final network in the spiking matrix inversion example:

  • 52 seconds for the CPU
  • 38 seconds for the GPU (replace nengo.Simulator(model, optimize=False) with nengo_dl.Simulator(model))
  • 32 seconds for the GPU without the Adam synapse (set synapse=None in the go call, noting that other parameters like dt should probably be adjusted as well to prevent the dynamics from becoming unstable with Euler’s method)
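
To illustrate the instability warning in the last point with a sketch (toy matrix and made-up step sizes): without the synapse this is a plain Euler iteration, which diverges once the effective step exceeds the stability limit set by the gradient’s curvature:

```python
import numpy as np

def gradient(A, M):
    """Gradient of ||M @ A - I||_F^2 with respect to M."""
    I = np.eye(A.shape[1])
    return 2 * (M @ A - I) @ A.T

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])  # toy stand-in for the Jacobian

def residual_after(step, iters=200):
    """Euler-iterate the gradient flow and return the final residual."""
    M = np.zeros_like(A).T
    for _ in range(iters):
        M = M - step * gradient(A, M)
    return np.linalg.norm(M @ A - np.eye(A.shape[1]))

print(residual_after(0.1))  # small step: settles to a bounded residual
print(residual_after(0.5))  # step too large: the iteration blows up
```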

So not the biggest improvement, but it’s something. If I could see more of the code I might be able to help point out ways to reduce the number of neurons or time-steps or similar. As it stands, if you have millions of neurons being simulated for a large number of time-steps, then there is going to be a lot of data being generated and collected. Typically you want to start small and idealized (for example, you can use neuron_type=nengo.Direct() to avoid using neurons and just compute the nonlinearities ideally) and then scale things up when you’re ready to run your ‘final’ experiments.

What error do you see when you try to do this import? This is deprecated and so technically I should have been doing from nengo_gui.jupyter import InlineGUI (filed an issue). Nonetheless it is still supposed to work, as one is just a redirect to the other.

A quick update about the speed differences I was seeing above. One thing that I hadn’t tried before was to switch nengo.Simulator(model, optimize=False) to nengo.Simulator(model, optimize=True). I disabled the graph optimization because that step was taking a long time, but if you reenable it I’m getting 27 seconds now for the CPU (without the Adam synapse) – so that might actually be the fastest option for now in this case. I’ve filed another issue here to investigate this a bit more generally, as I agree this does still seem much slower than it could be.

So my Colab notebook does not seem able to detect a GPU, but I’m sure this is not a nengo issue…
I tried several solutions, but so far no success. Anyway, as I said, this is not a nengo issue, so you don’t have to concern yourself with it :slight_smile:

Apparently, the InlineGUI issue was local, so consider it a non-issue on my part.

Regarding the running time: when I implement the network as shown in the last part of the example we talked about, it takes only a few minutes (2-3) to run. I guess that’s because of the small number of neurons (right?)

When using neuron_type=nengo.Direct() my program works well, but when using LIF() or RegularSpiking() (as in the example we love) the results are very inaccurate.
I’m still trying different configurations for the ensembles that might help (encoders, intercepts, etc.).

If you want, you can take a look. It has the GPU-not-detected note, and individual examples for using the different types of neurons.

Thanks a lot,

Hi, an update about this question:

I saw that even if I use 1000 neurons (and not 1, as in the example), the simulation takes about the same time (a couple of minutes). So I’m not sure I understand how it works :confused:

Also, I managed to find a configuration that almost implements the inverse kinematics well enough (close to the Direct version), but I’m still missing something because:

  1. I need A LOT of neurons for this model, even though I’m using the “Optimizing Spikes” model.
  2. The error is still pretty big, the curve is not smooth enough, and lowering the learning rate (dt) costs a lot of time.

This is the last version of the code,
Appreciate the help!