Discrete Delay in Nengo DL

Hi everyone,

In my research, I am trying to use SNNs for model predictive control (MPC). Currently, I am trying to wrap my head around Nengo and Nengo DL to see if they are suitable for this task.

In my prediction network, I would like to predict a future system state that lies Δt seconds in the future. Note that Δt may be much larger than the dt parameter of the Nengo simulation (e.g. 0.02 vs. 0.001). As the model is meant to be autoregressive, the prediction should be used as the input of another node after Δt. As I understand it, this requires the output to be “stored” in a buffer in some node and released later. I have come across some implementations for Nengo that allow such a “discrete delay”. I think the delay is called “discrete” because in such cases the signal is held for int(dt_delay / dt_sim) steps. However, neither of the implementations I found works “out of the box” with Nengo DL, and I have no idea (but would like to know) whether any of this can also be done on a Loihi chip.
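
To make the “discrete” part concrete: a delay of Δt seconds at simulation step dt amounts to a first-in-first-out buffer with round(Δt / dt) slots. The sketch below is a plain-Python illustration of that idea, not a Nengo API; `steps_for_delay` and `DelayLine` are hypothetical names. It uses `round` rather than `int` as a design choice, because truncation can silently lose a step to floating-point error (for example, 0.3 / 0.1 evaluates to 2.9999999999999996).

```python
from collections import deque


def steps_for_delay(delay, dt):
    """Whole simulation steps in a delay of `delay` seconds (hypothetical helper)."""
    # round() rather than int(): int(0.3 / 0.1) truncates 2.9999999999999996 to 2
    return int(round(delay / dt))


class DelayLine:
    """FIFO that returns the input it received `steps` calls earlier."""

    def __init__(self, steps, dim):
        self.steps = steps
        self.dim = dim
        self.reset()

    def reset(self):
        # pre-fill with zeros so the first `steps` outputs are zero
        self.fifo = deque([[0.0] * self.dim for _ in range(self.steps)])

    def step(self, x):
        # push the newest input and pop the one from `steps` calls ago;
        # with steps == 0 the deque is empty, so x itself comes straight back
        self.fifo.append(list(x))
        return self.fifo.popleft()


# Example: dt = 0.001 and a 0.02 s delay give a 20-slot buffer
line = DelayLine(steps_for_delay(0.02, 0.001), dim=2)
```

Both implementations quoted below follow this pattern; they differ in where the buffer lives (a synapse versus a node function).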

The implementations I came across are this one by @tcstewar:

import numpy as np
import nengo

class DiscreteDelay(nengo.synapses.Synapse):
    def __init__(self, delay, size_in=1):
        self.delay = delay
        super().__init__(default_size_in=size_in, default_size_out=size_in)

    def make_state(self, shape_in, shape_out, dt, dtype=None, y0=None):
        return {}

    def make_step(self, shape_in, shape_out, dt, rng, state=None):
        steps = int(self.delay/dt)
        if steps == 0:
            def step_delay(t, x):
                return x
            return step_delay
        assert steps > 0

        state = np.zeros((steps, shape_in[0]))
        state_index = np.array([0])

        def step_delay(t, x, state=state, state_index=state_index):
            result = state[state_index]
            state[state_index] = x
            state_index[:] = (state_index + 1) % state.shape[0]
            return result

        return step_delay

...

nengo.Connection(predicted_future_state, predicted_current_state,
                 synapse=DiscreteDelay(self.t_delay, size_in=self.state_dim))

I also tried this implementation from nengolib (taken from the nengo3 branch): arvoelke.github.io/nengolib-docs/nengolib.synapses.DiscreteDelay.html
which is used in a similar way:

nengo.Connection(predicted_future_state, predicted_current_state,
                 synapse=DiscreteDelay(int(self.t_delay / self.dt)))

Anyway, both give me error messages that, I must admit, I do not fully understand. I have come up with my own working version that I currently use in my project with Nengo DL, but it is VERY hacky and relies on some assumptions that I am not sure (will) always hold.

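import numpy as np

class DelayNode:
    def __init__(self, steps):
        assert steps >= 0
        self.steps = steps
        self.hist = [[]]  # one list of inputs per time step seen so far
        self.last_t = 0.0

    def reset(self):
        self.hist = [[]]
        self.last_t = 0.0

    def step(self, t, x):
        if self.last_t > t:
            # time went backwards: assume a new batch/run has started
            self.reset()
        elif self.last_t < t:
            # time advanced: start collecting inputs for the new time step
            self.last_t = t
            self.hist.append([])
        if len(self.hist[0]) == 0 and len(self.hist) > 1:
            # the oldest time step has been fully consumed; drop it
            self.hist.pop(0)

        self.hist[-1].append(x)
        if len(self.hist) < self.steps:
            # not enough history yet: output zeros
            return np.zeros(x.shape)

        # oldest stored input (inputs within a time step come out in call order)
        return self.hist[0].pop(0)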

...

model.delay = DelayNode(steps=int(self.t_delay / self.dt))
delaynode = nengo.Node(model.delay.step, size_in=self.state_dim)
nengo.Connection(predicted_future_state, delaynode)
nengo.Connection(delaynode, predicted_current_state)

First, by simply printing some toy inputs to the network, I observed that the node input is not processed in batches but sequentially, in an order that (I think) always follows this pattern: e1_t0, e2_t0, ... en_t0, e1_t1, e2_t1, ... en_t1, ...
This node then infers the batch size n from the point at which t increases to some t > last_t. However, assuming that the model “knows” the batch size in advance is not great either…
Finally, the node automatically resets its buffer when it observes a t < last_t, as this seems to indicate that a new batch of data is being processed. I would have liked to use t == 0.0 instead, but the simulation always seems to start at t = dt (not sure why?).

My key question is whether there are other solutions to my problem that I am unaware of, but that some of you may have worked with in the past, in particular in a Nengo DL context. In any case, any input here is very welcome, as I am convinced that this is not the best solution to my problem.

Thank you for your kind assistance and best wishes,
Justus

For those interested, here are the error messages I get when using the above-mentioned solutions in my code:

Terry’s implementation in a Nengo DL context results in this error stack.

The nengolib implementation in the same context results in this error stack.

Hi @jhuebotter,

In regards to your questions:

I can confirm that the errors you are encountering are reproducible on my system. Having played with it for a while, I can provide some insight into what is happening. In the case of @tcstewar’s code, I believe that to get it to work, you’ll need to write a custom synapse object for NengoDL that supports the delay operation. The basics of this are outlined in this code. However, as I explain further down, I don’t think this is the approach you will want to take. As for @arvoelke’s NengoLib library, the code seems to be a little outdated and needs to be updated to support the latest version of Nengo. I spent some time trying to do this, but the library itself is quite complex, and fully updating (and testing) it for the latest version of Nengo was too time-consuming.

To be frank, the simplest approach for your use case is to do exactly what you have done, which is to create a nengo.Node with a custom function to perform the delay. This approach is fairly simple and works in Nengo, NengoDL, and NengoLoihi with only minimal (if any) changes to the model.

I’m not entirely sure what you mean by this. Can you provide some example code that demonstrates this?

Once again, I am not quite sure what you mean here. The way Nengo nodes work is that the function passed to the node is called at every timestep. If by “batch” you mean creating or running another NengoDL simulator object, then what you are observing is correct.

This is indeed the case for Nengo. When a Nengo model is created (i.e., after the with model: context is closed), all of the objects in the model are evaluated at t == 0 to introspect things like input and output shapes. For this reason, the simulation skips the t == 0 timestep when it starts (to avoid residual data or effects from having two t = 0 timesteps) and begins at t == dt.

In the standard Nengo simulator, it should be possible to create a Nengo process that performs the delay for you. A Nengo process is a bit like writing your own class, but processes have some special operators assigned to them that can speed up the simulation. In fact, a nengo.Process very similar to @tcstewar’s code can be created and used:

import numpy as np
import nengo

class DiscreteDelayProcess(nengo.processes.Process):
    def __init__(self, delay, size_in=1):
        self.delay = delay
        super().__init__(default_size_in=size_in, default_size_out=size_in)

    def make_step(self, shape_in, shape_out, dt, rng, state=None):
        steps = int(self.delay / dt)
        if steps == 0:

            def step_delay(t, x):
                return x

            return step_delay
        assert steps > 0

        state = np.zeros((steps, shape_in[0]))
        state_index = np.array([0])

        def step_delay(t, x, state=state, state_index=state_index):
            result = state[state_index]
            state[state_index] = x
            state_index[:] = (state_index + 1) % state.shape[0]
            return result

        return step_delay

However, when I tested this with NengoDL, I ran into some operator issues. I have contacted the NengoDL devs and am waiting to see if they can figure out what is going wrong.

If you want to implement this delay on the Loihi chip, it’s also another reason to go with the nengo.Node approach. As far as I know, the Loihi hardware (i.e., if you want to run it on the chip itself) does not support custom synapses. This means that to get the delay functionality, you pretty much have to use a nengo.Node to implement it. Just take note that nengo.Node objects are not run on the Loihi chip, but on your computer instead, so using these nodes may incur communication penalties (i.e., they slow down the overall simulation speed). Every time a signal has to go from off-chip to on-chip (or vice versa) extra time is needed to do that data transfer, so your model should be designed in a way to minimize the amount of communication on/off the chip.

As an example, if your network is laid out like this:

Node → Ensemble → Node → Ensemble → Node

it would have a larger communication penalty than something like this:

Node → Ensemble → Ensemble → Ensemble → Node


Hi @xchoo,
Thanks once again for your time and kind assistance.

Sure can:

# import the necessary packages
import nengo
import nengo_dl
import tensorflow as tf
import numpy as np

class DoSomethingNode:
    def __init__(self):
        self.reset()

    def reset(self):
        self.last_t = 0.0
        self.batch_size_counter = 0

    def print_and_pass(self, t, x):
        # only prints given arguments
        print()
        if self.last_t > t:
            print("new batch assumed")
            print("detected batch size:", self.batch_size_counter)
            self.reset()
        if self.last_t < t:
            print("advanced 1 time step")
            self.last_t = t
        print("time:", t)
        print("input / output:", x)
        # optionally: do something interesting here such as the delay node
        self.batch_size_counter += 1

        return x

# define the model
def make_model(state_dim):

    model = nengo.Network()
    with model:
        model.input_node = nengo.Node(np.zeros(state_dim))
        model.Printer = DoSomethingNode()
        print_node = nengo.Node(model.Printer.print_and_pass, size_in=state_dim)
        nengo.Connection(model.input_node, print_node, synapse=None)

        model.output = nengo.Probe(print_node)  # nengo_dl insists on at least one probe

    return model

# let's create a single batch of fake data that remains
# interpretable to us once passed through the network

batch_size = 5
time_points = 3
state_dim = 2

data = np.ones((batch_size, time_points, state_dim))
for i in range(data.shape[0]):
    data[i] *= i

# now the first example in the batch has [0. 0.] for every time step
# and the second example has [1. 1.] and so forth...

dt = 0.001
t = dt * time_points

model = make_model(state_dim)
inputs = {model.input_node: data}
targets = {model.output: data}

sim = nengo_dl.Simulator(model, minibatch_size=int(np.floor(batch_size/2.)), dt=dt)
with sim:
    sim.compile(
        optimizer=tf.optimizers.SGD(0.0),
        loss={model.output: nengo_dl.losses.nan_mse}
    )
    sim.evaluate(inputs, {})

Running this should show what I meant by “sequentially processing each batch”. It goes something like:

for t in T:
    for x in batch:
        process_example(t, x)

where the deep learning packages I am used to using would go more like:

for t in T:
    process_batch(batch)  # batch being a tensor with dim [batch_size, ... ]

I hope the example above also shows what I mean here.
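
For comparison, a delay that fits the “process_batch” pattern keeps a single ring buffer with an explicit batch axis and advances all examples in one call. A minimal NumPy sketch under that assumption; `BatchedDelay` is a hypothetical helper, not part of Nengo or NengoDL.

```python
import numpy as np


class BatchedDelay:
    """Ring buffer of shape (steps, batch, dim); delays a whole batch per call."""

    def __init__(self, steps, batch_size, dim):
        self.buffer = np.zeros((steps, batch_size, dim))
        self.index = 0  # slot holding the oldest input

    def step(self, x):
        x = np.asarray(x, dtype=float)
        if self.buffer.shape[0] == 0:
            return x.copy()  # zero delay: pass through
        out = self.buffer[self.index].copy()  # copy: this slot is overwritten next
        self.buffer[self.index] = x
        self.index = (self.index + 1) % self.buffer.shape[0]
        return out


# Usage: 2-step delay on a batch of 3 examples with 2 dimensions each
bdelay = BatchedDelay(steps=2, batch_size=3, dim=2)
outputs = [bdelay.step(np.full((3, 2), float(k))) for k in range(1, 5)]
```

Keeping the batch axis inside the state is what lets every example carry its own history without relying on the order in which examples are visited.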

Great, I will make sure to give this a try in the coming days. :blush:

This is exactly what I thought, and it might become a problem later on… A silly “solution” that comes to mind would be to not use a node at all, but instead pass the “to-be-delayed” vector x through a series of int(t_delay/dt) ensembles. The trick would be that, in total, no transformation or filtering is applied to the data along the way (synapse=0.0 and transform=1??). Anyway, this is not a problem for now. But in the end, I am trying to “predict the future” with Nengo networks and then feed their own predictions back into the model as inputs in an autoregressive manner.

Dear @xchoo,

I have just implemented your solution in my code, and it works just fine. It removes the need to assume the batch size, as explained above, and as a bonus, your code runs slightly faster too. However, I had to make a slight adjustment by indexing result at 0, as I would otherwise get an index error further down the line:

class DiscreteDelayProcess(nengo.processes.Process):
    def __init__(self, delay, size_in=1):
        self.delay = delay
        super().__init__(default_size_in=size_in, default_size_out=size_in)

    def make_step(self, shape_in, shape_out, dt, rng, state=None):
        steps = int(self.delay / dt)
        if steps == 0:

            def step_delay(t, x):
                return x

            return step_delay
        assert steps > 0

        state = np.zeros((steps, shape_in[0]))
        state_index = np.array([0])

        def step_delay(t, x, state=state, state_index=state_index):
            result = state[state_index]
            state[state_index] = x
            state_index[:] = (state_index + 1) % state.shape[0]
            return result[0]

        return step_delay
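
Why the `result[0]` adjustment is needed: `state_index` is a one-element array, so `state[state_index]` uses NumPy's advanced (fancy) indexing and returns a copy of shape (1, size_in) rather than (size_in,). The standalone check below (no Nengo required) shows this, and also why a plain-int alternative would need an explicit `.copy()`: basic indexing returns a view, which the very next write in step_delay would overwrite.

```python
import numpy as np

steps, size_in = 4, 3
state = np.zeros((steps, size_in))
state_index = np.array([0])  # one-element array, as in DiscreteDelayProcess

result = state[state_index]  # fancy indexing: a copy of shape (1, 3)
print(result.shape)          # (1, 3)
print(result[0].shape)       # (3,)

# Alternative without the extra axis: index with a plain int. Basic indexing
# returns a *view*, so copy it before the slot is overwritten.
i = int(state_index[0])
view = state[i]
safe = state[i].copy()
state[i] = 7.0               # what step_delay does right after reading
print(view)                  # [7. 7. 7.]  (the view sees the new value)
print(safe)                  # [0. 0. 0.]  (the copy keeps the delayed value)
```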

Thank you again for your help. I have marked the issue as solved! :slight_smile:


If I understand correctly, this is because of the different natures of the simulations in NengoDL and other ML applications. ML networks are typically non-temporal. The network is evaluated as a whole, from input to output in one “timestep”. However, in NengoDL, the models have a temporal nature where it takes time for information to propagate through the network.

The issue with this approach is that, typically, a long chain of ensembles introduces a lot of extra noise into the system (because of the spiking nature, if you are using spiking neurons). It also becomes an extreme waste of resources, depending on how long the delay is. As an example, a 0.5 s delay with a 0.001 s timestep means 500 extra ensembles (which, if you allocated ~30 neurons to each, is an extra 15,000 neurons).
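
The resource estimate above is plain arithmetic, assuming (as in the chain-of-ensembles idea) that each ensemble contributes exactly one timestep of delay. `delay_chain_cost` is a hypothetical helper used only to show how the cost scales with the delay length.

```python
def delay_chain_cost(delay, dt, neurons_per_ensemble):
    """Ensembles and neurons for a chain that delays by one timestep per ensemble.

    ASSUMPTION: each ensemble in the chain contributes exactly one dt of delay.
    """
    n_ensembles = int(round(delay / dt))
    return n_ensembles, n_ensembles * neurons_per_ensemble


print(delay_chain_cost(0.5, 0.001, 30))  # (500, 15000)
```

The cost grows linearly with the delay, whereas a node- or process-based buffer only stores round(delay / dt) vectors.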

Yes and no. I am also thinking of RNNs with LSTM or GRU units. While it is true that they do not relate to physical units of time in seconds, they operate on sequential data by processing it in discrete steps, not unlike Nengo. I agree, however, that there is a difference in that Nengo does not immediately pass on computed information if the synapse connecting two nodes or ensembles is not None. This “buffer”, or slight delay, is a key difference from other approaches in the DL domain.

I agree, this sounds like a bad idea. Perhaps people with more experience of the memory and communication bottlenecks of neuromorphic hardware have a better idea of how to briefly delay neuronal output by anywhere from a few to a few hundred milliseconds? Some kind of “axon time delay” parameter that does not introduce a lot of extra noise or require large computational resources?

That is correct, yes. :smiley:

Specifically for the Loihi hardware, @Eric might have some suggestions. But, the primary issue is that the hardware itself has to support such an operation. As far as I know, the Loihi hardware is designed with a fixed set of supported synapses (I believe it’s the exponential and alpha synapse?? although, it could just be the exponential synapse), and custom synapses are just not possible to do on the hardware itself. This means that such functionality will have to be implemented off-chip, and from there, it’s just an issue of mitigating the communication delay between the hardware and the host PC.

Loihi does have support for synaptic delays, but it’s not something that we currently support in NengoLoihi. So if you want to do this in hardware on Loihi, you’ll either have to add this functionality to NengoLoihi, or look at doing it purely through the Intel APIs (NxSDK, NxNet, Lava, etc.). Even if you hope to add this to NengoLoihi, I would do a basic implementation in NxSDK first, just to make sure you have the basic commands down (this should also help give you some idea where you need to make modifications in NengoLoihi). I guess the other sort of hybrid is to build your network with NengoLoihi, then look at taking the nxsdk_board object and adding in the delays after the fact.


Thank you for your insight, @Eric. I have no experience with NxSDK, but I might (have to) take a look at it later. We were hoping to use Loihi 2 in the years to come, but it seems that Lava and a potential higher-level Nengo connection are in the very early stages of development (if they exist at all?). So far, I am not exactly sure what hardware and software we can / will have to work with here in the future.

I do have another question regarding the DiscreteDelayProcess by @xchoo:
As mentioned before, I am trying to use the predictive network that the delay node is part of for planning in model predictive control. What we would like to do goes something like this:

  1. Initialize the model object and a nengo_dl.Simulator object and load pre-trained parameters
  2. Provide a true action and state to the model to process with sim.predict(x, stateful=True)
  3. Provide a batch x of size n (hypothetical) action sequences to the model to predict a set of (hypothetical) state trajectories with sim.predict(x, stateful=False)
  4. A different method will evaluate the predictions and choose an action to perform in some environment which yields a new state
  5. Fill a whole batch x of size n with this action state pair (just repeat n times)
  6. Repeat steps 1-4 until some end criterion is met

I hope this is somewhat understandable. The idea is that the model should predict several state trajectories into the future while iteratively updating its internal state only with information that actually occurred. This works as expected in a network in which no custom processes have an internal state. However, our network contains the delay node as well as an LDN process, both of which are stateful. Here, stateful=False does not reset these states to the last known state after calling sim.predict(). I have tried to work around this issue by making state and state_index attributes of the delay process (self.state = ...) inside the make_step function, so that I could retrieve them from the “outside”. However, when I do so, I obtain what I believe is the state of a single example in the batch, not the state of all examples in the batch. The same is the case for the LDN.
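
The save/restore pattern that steps 2-5 require can be illustrated on a plain buffer outside NengoDL. This is a hypothetical sketch of the idea only: it does not touch NengoDL's `sim.tensor_graph.saved_state`, and `SnapshotDelay`, `snapshot` and `restore` are illustrative names. The two details that matter are that the snapshot copies the history of every example in the batch (not a single row), and that `restore` copies again so later steps cannot mutate the saved snapshot.

```python
import numpy as np


class SnapshotDelay:
    """Batched ring-buffer delay whose full state can be saved and restored.

    Hypothetical helper: it does not hook into NengoDL's saved_state.
    """

    def __init__(self, steps, batch_size, dim):
        assert steps > 0
        self.buffer = np.zeros((steps, batch_size, dim))
        self.index = 0  # slot holding the oldest input

    def step(self, x):
        out = self.buffer[self.index].copy()
        self.buffer[self.index] = x
        self.index = (self.index + 1) % len(self.buffer)
        return out

    def snapshot(self):
        # copy the history of *every* example in the batch, not one row
        return self.buffer.copy(), self.index

    def restore(self, snap):
        buffer, index = snap
        self.buffer = buffer.copy()  # copy again so the snapshot stays reusable
        self.index = index


# Loop sketch: advance on real data, save, roll out hypotheses, restore.
delay = SnapshotDelay(steps=2, batch_size=4, dim=1)
delay.step(np.full((4, 1), 1.0))     # what actually happened (same for all rows)
saved = delay.snapshot()
for k in range(5):                   # hypothetical rollout, discarded afterwards
    delay.step(np.full((4, 1), 100.0 + k))
delay.restore(saved)                 # back to the last real state
```

The same idea would apply to the LDN state: whatever holds the state needs a batch axis, and the snapshot has to capture that whole array.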

A conceivable workaround would, of course, be to keep a history of the states and actions that occurred and, instead of returning to the last known internal network state, rerun the model on the entire history before every prediction. However, as some of these sequences are many seconds long and the processing time grows with every additional step taken, this may render the entire approach somewhat pointless.

I have also tried to have one sim object that only advances in a stateful manner and, at each step, makes a temporary copy of itself and the contained model (state) on which to do the predictions into the future. However, I did not succeed with this approach either. I must admit that in the process, I did not fully understand what the model parameter (and later attribute) of the simulator object is and what its intended use is…

Long story short, I was wondering whether you know of a working method that still lets me use the stateful functionality correctly with objects that are stateful but are not neurons. I believe the solution is to get the desired states added to sim.tensor_graph.saved_state, and I suspect that this should be possible, but unfortunately, how to do so is a bit beyond me and I do not understand the details of the TensorGraph class defined here.

Once again I thank you very much for your time and kind assistance. Any hints are much appreciated.