Tips on implementing Eligibility Traces in Nengo

I want to implement something similar to PES with eligibility traces, where the update looks like: dw = \alpha * td_error * e, where e is the eligibility trace of the synapse: e = e + \lambda \gamma a * global_error, where a is the activity of the presynaptic neuron. Here td_error would be the temporal difference error (reward). So the learning rule works on the weights directly.

The problem is that, as far as I can tell, the td_error and the global_error (characteristic eligibility) need to be broadcast to the synapses directly, and I am having trouble understanding the multi-dimensional error signals, which are implemented separately from the master branch of Nengo.

Some pointers/advice would be appreciated, thanks in advance.


If you check out the pes-synapse branch of nengo from the git repo and have nengolib installed, you can implement eligibility-trace-like functionality using the following:

#NOTE: To run this file you must be on the pes-synapse branch of nengo!! <-----

import numpy as np

import nengo
from nengolib.signal import z

dt = 0.001
model = nengo.Network()
model.config[nengo.Ensemble].neuron_type = nengo.LIFRate()
with model:

    state = nengo.Node(np.cos)
    target = nengo.Node(lambda t: -np.cos(t))

    ensA = nengo.Ensemble(n_neurons=100, dimensions=1)
    ensB = nengo.Ensemble(n_neurons=100, dimensions=1)

    outA = nengo.Node(size_in=1)
    outB = nengo.Node(size_in=1)

    nengo.Connection(state, ensA)
    nengo.Connection(state, ensB)

    # need to use a relatively low learning rate, because the delay means
    # that any learning effects won't be noticed for a while
    learning_rate = 1e-5
    num_timesteps_delayed = 1000  # default sim timestep is 0.001, so 1 s delay

    # set up learning without the pre_synaptic filtering
    learn_connA = nengo.Connection(
        ensA, outA,
        learning_rule_type=nengo.PES(learning_rate=learning_rate))
    # connect in training signal with num_timesteps_delay
    nengo.Connection(outA, learn_connA.learning_rule, transform=1,
                     synapse=z**(-num_timesteps_delayed))
    nengo.Connection(target, learn_connA.learning_rule, transform=-1,
                     synapse=z**(-num_timesteps_delayed))

    # set up learning with the pre_synaptic filtering
    learn_connB = nengo.Connection(
        ensB, outB,
        learning_rule_type=nengo.PES(pre_synapse=z**(-num_timesteps_delayed-1),
                                     learning_rate=learning_rate))
    # connect in training signal with num_timesteps_delay
    nengo.Connection(outB, learn_connB.learning_rule, transform=1,
                     synapse=z**(-num_timesteps_delayed))
    nengo.Connection(target, learn_connB.learning_rule, transform=-1,
                     synapse=z**(-num_timesteps_delayed))

    compare = nengo.Node(size_in=5)
    nengo.Connection(state, compare[0])
    nengo.Connection(state, compare[1], synapse=z**(-num_timesteps_delayed))
    nengo.Connection(outA, compare[1], synapse=z**(-num_timesteps_delayed))
    nengo.Connection(target, compare[2])
    nengo.Connection(outA, compare[3])
    nengo.Connection(outB, compare[4])

    probe_target = nengo.Probe(target)
    probeA = nengo.Probe(outA)
    probeB = nengo.Probe(outB)

if __name__ == '__main__':

    num_seconds = 100
    # index of the halfway timestep (probe data is indexed by timestep, not seconds)
    half = int(round(num_seconds / dt / 2))
    sim = nengo.Simulator(model)
    sim.run(num_seconds)

    target = sim.data[probe_target]
    outA = sim.data[probeA]
    outB = sim.data[probeB]

    print('RMSE during second half of trial')
    print('RMSE A: ', np.sqrt(np.mean((target[half:] - outA[half:])**2)))
    print('RMSE B: ', np.sqrt(np.mean((target[half:] - outB[half:])**2)))

    import matplotlib.pyplot as plt
    plt.subplot(2, 1, 1)
    plt.title('Learning with a delayed error signal')
    plt.plot(sim.trange(), sim.data[probe_target], 'r--', lw=3)
    plt.plot(sim.trange(), sim.data[probeA], 'b')
    plt.legend(['target', 'without delay learning'])
    plt.subplot(2, 1, 2)
    plt.plot(sim.trange(), sim.data[probe_target], 'r--', lw=3)
    plt.plot(sim.trange(), sim.data[probeB], 'g')
    plt.legend(['target', 'with delay learning'])
    plt.show()

Note: the simulation runs for 100 s of simulated time (which takes a little while), so that you can compare the stability of learning on the delayed neural activity with learning on the non-delayed neural activity.

To elaborate a bit on the above, you can think of the PSC from the synapse (applied to the presynaptic activities and usually the error signal as well) as an eligibility trace. By default this PSC is an exponential decay (a lowpass filter), but as shown above you can make this filter whatever you want (see the pre_synapse parameter above, which can accept arbitrary nengo.LinearFilter objects). The above example uses a pure delay to create an eligibility trace that is a delta placed at a relative point in time. In general you could take any weighted combination of delays (for a finite eligibility trace) and/or higher-order filters (for continuous traces).
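
As a rough illustration outside Nengo, the three kinds of trace mentioned above can be written as convolution kernels applied to a presynaptic activity signal. This is only a NumPy sketch: the kernel weights, lags, and tau are made-up values, and the lowpass kernel uses a simple exponential impulse response rather than Nengo's own discretization.

```python
import numpy as np

dt = 0.001                      # simulation timestep (s)
n_steps = 2000
activity = np.zeros(n_steps)
activity[100] = 1.0             # a single presynaptic event at t = 0.1 s

# Pure delay of 1000 steps: the trace is a delta shifted in time.
delay_kernel = np.zeros(1001)
delay_kernel[1000] = 1.0

# Weighted combination of delays: a finite trace spread over three lags.
combo_kernel = np.zeros(1001)
combo_kernel[[250, 500, 1000]] = [0.5, 0.3, 0.2]

# First-order lowpass: a continuous, exponentially decaying trace.
tau = 0.2                       # hypothetical time constant (s)
decay = np.exp(-dt / tau)
lowpass_kernel = (1 - decay) * decay ** np.arange(n_steps)


def trace(kernel):
    # Causal convolution, truncated to the simulation length.
    return np.convolve(activity, kernel)[:n_steps]


delay_trace = trace(delay_kernel)
combo_trace = trace(combo_kernel)
lowpass_trace = trace(lowpass_kernel)
```

The delayed trace is nonzero at exactly one timestep, the combined trace at three, and the lowpass trace decays smoothly from the moment of the event.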

For simplicity's sake I wrote the rule down as an operator for now. I hope you can tell me if this makes sense:

import numpy as np
from nengo.builder.operator import Operator


class EPESSim(Operator):
    """Implements a learning rule that supports eligibility traces with TD error."""

    def __init__(self, weights, activations, global_grad, delta,
                 eligibility_trace, td_error, rule, tag=None):
        super(EPESSim, self).__init__(tag=tag)
        self.sets = []
        self.incs = []
        self.reads = [global_grad, activations, td_error]
        self.updates = [delta, eligibility_trace, weights]
        self.rule = rule

    @property
    def global_grad(self):
        # global_grad is the first entry of self.reads
        return self.reads[0]

    @property
    def activations(self):
        return self.reads[1]

    @property
    def delta(self):
        return self.updates[0]

    @property
    def eligibility_trace(self):
        return self.updates[1]

    @property
    def td_error(self):
        return self.reads[-1]

    @property
    def weights(self):
        return self.updates[-1]

    def make_step(self, signals, dt, rng):
        delta = signals[self.delta]
        pre_activations = signals[self.activations]
        global_grad = signals[self.global_grad]
        weights = signals[self.weights]
        eligibility_trace = signals[self.eligibility_trace]
        td_error = signals[self.td_error]

        rule = self.rule

        learning_rate = rule.learning_rate


        def step_epes():
            # Broadcast the presynaptic activities across the rows of the
            # weight matrix (assumes weights has shape (n_post, n_pre))
            char_eligibility = pre_activations[np.newaxis, :] * global_grad

            # Decay the eligibility trace and add the new contribution
            eligibility_trace[...] = (
                np.exp(-dt / rule.tau_e) * eligibility_trace
                + char_eligibility)

            # The weight change
            delta[...] = learning_rate * td_error * eligibility_trace
            weights[...] = delta + weights

        return step_epes
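
As a sanity check outside Nengo, the same kind of update can be run on plain NumPy arrays. This is only a sketch: epes_step and its arguments are hypothetical names, and the weight matrix is assumed to have shape (n_post, n_pre). It shows the property the trace is meant to provide: a TD error that arrives after the presynaptic activity has stopped still changes the weights of the synapses that were active.

```python
import numpy as np


def epes_step(weights, trace, pre_activations, global_grad, td_error,
              learning_rate, tau_e, dt=0.001):
    """One trace-based update, applied in place; returns the weight change."""
    # (n_pre,) broadcasts across the rows of an (n_post, n_pre) matrix.
    char_eligibility = pre_activations[np.newaxis, :] * global_grad
    trace *= np.exp(-dt / tau_e)
    trace += char_eligibility
    delta = learning_rate * td_error * trace
    weights += delta
    return delta


n_post, n_pre = 3, 4
weights = np.zeros((n_post, n_pre))
trace = np.zeros((n_post, n_pre))
pre = np.array([0.0, 1.0, 2.0, 3.0])

# Step 1: presynaptic activity but no TD error, so only the trace changes.
epes_step(weights, trace, pre, global_grad=1.0, td_error=0.0,
          learning_rate=1e-3, tau_e=0.1)
weights_after_step1 = weights.copy()

# Step 2: no presynaptic activity, but the TD error arrives.
delta = epes_step(weights, trace, np.zeros(n_pre), global_grad=1.0,
                  td_error=1.0, learning_rate=1e-3, tau_e=0.1)
```

After the second step only the columns of presynaptic neurons that were active in the first step have nonzero weights.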

The builder looks like this:

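from nengo.builder import Builder, Signal
from nengo.builder.operator import Reset
from nengo.synapses import Lowpass


@Builder.register(EPES)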
def build_epes(model, epes, rule):

    conn = rule.connection

    # Create input error signal
    error = Signal(np.zeros(2), name="EPES:error")
    model.add_op(Reset(error))


    weights = model.sig[conn]['weights']

    # The eligibility trace has the shape of the weights
    eligibility_trace = Signal(np.zeros(weights.shape), name='EPES:eligibility_trace')



    model.sig[rule]['in'] = error  # error connection will attach here
    model.sig[rule]['eligibility_trace'] = eligibility_trace


    # The TD error
    td_error = error[0]

    # The characteristic eligibility
    global_grad = error[1]

    # Presynaptic activations
    acts = model.build(Lowpass(epes.pre_tau), model.sig[conn.pre_obj]['out'])

    # The delta change in the synaptic weights
    delta = Signal(np.zeros(weights.shape), name="EPES:delta")

    model.add_op(EPESSim(rule=epes,
                         activations=acts,
                         global_grad=global_grad,
                         eligibility_trace=eligibility_trace,
                         td_error=td_error,
                         delta=delta,
                         weights=weights
                         ))




    # eligibility_trace is already registered in model.sig[rule] above
    model.sig[rule]['activations'] = acts
    model.sig[rule]['delta'] = delta
    model.sig[rule]['td_error'] = td_error

Corrections and suggestions are welcome. As I said, the idea is to have a global TD error for the reward and local eligibility traces. The global gradient is the gradient of the policy’s log-likelihood. The rule receives two incoming connections (td_error and global_gradient). Thank you in advance for the feedback.

Edit: I guess I can use Lowpass instead of the exponential in the operator to update the eligibility trace. The filtered activities on their own are not, per se, an eligibility trace, because they don’t keep any information about the gradient.
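
A quick way to see how the two options relate, as a sketch only: it assumes a discrete first-order lowpass of the form y = d*y + (1 - d)*x with d = exp(-dt/tau_e), which may not match Nengo's Lowpass step for step. Here x stands for the product of presynaptic activity and global gradient, and the input values are arbitrary.

```python
import math

dt, tau_e = 0.001, 0.05          # hypothetical values
d = math.exp(-dt / tau_e)

inputs = [1.0, 0.0, 0.5, 0.0, 0.0, 2.0]   # activity * gradient, per timestep
e = 0.0   # trace updated as in the operator: decay, then add the input
y = 0.0   # the same input passed through a unit-gain lowpass
pairs = []
for x in inputs:
    e = d * e + x
    y = d * y + (1 - d) * x
    pairs.append((e, y))

# The two traces differ only by the constant gain (1 - d).
gain = 1 - d
```

Under that assumption, switching to a lowpass filter changes the trace only by the factor (1 - d), roughly dt/tau_e, which can be absorbed into the learning rate.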

The code seems to make sense, but I didn’t test it and I might have overlooked things (and it’s been a few years since I did anything with reinforcement learning). Right now your TD error and the global gradient are one-dimensional. I assume that is what you want (as opposed to an n-dimensional TD error and gradient)?

Yes, the TD error should be one-dimensional. The global gradient need not be, but for my case I think a one-dimensional global gradient is enough. Ideally a multi-dimensional global gradient could be multiplied with a random matrix that is specific to the postsynaptic neuron, i.e. one random matrix of dims (dim_gradient, 1) per neuron, but I am not sure how to implement that yet… I guess the connection can hold a constant signal (let’s name it random_matrix) of dimensions (dim_gradient, n_postneurons)? Then the eligibility trace becomes Lowpass( pre_activations X global_gradient X random_matrix ), where X is matrix multiplication.
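
In terms of array shapes, that idea could look like the NumPy sketch below. All names and sizes are hypothetical, the weight matrix is assumed to be (n_post, n_pre), and the lowpass is written as a simple first-order update rather than Nengo's Lowpass.

```python
import numpy as np

rng = np.random.RandomState(0)
n_pre, n_post, dim_gradient = 5, 3, 2
dt, tau_e = 0.001, 0.05          # hypothetical values
decay = np.exp(-dt / tau_e)

# Fixed random projection: one column per postsynaptic neuron.
random_matrix = rng.randn(dim_gradient, n_post)


def update_trace(trace, pre_activations, global_gradient):
    # Project the gradient to one scalar per postsynaptic neuron.
    post_factor = global_gradient @ random_matrix              # (n_post,)
    # Combine with presynaptic activity: one value per synapse.
    char_eligibility = np.outer(post_factor, pre_activations)  # (n_post, n_pre)
    # First-order lowpass of the per-synapse eligibility.
    return decay * trace + (1 - decay) * char_eligibility


trace = np.zeros((n_post, n_pre))
pre_activations = rng.rand(n_pre)
global_gradient = rng.randn(dim_gradient)
trace = update_trace(trace, pre_activations, global_gradient)
```

The outer product gives the trace the same shape as the weight matrix, so it can be used directly in the weight update.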

Another thing that would be useful is a voltage-based approximation of the derivative of the postsynaptic neuron’s transfer function. This is a bit unclear to me with Nengo: since the voltages are between 0 and 1, if I understand it correctly, could one use Lowpass( pre_activations X (global_gradient X random_matrix X post_voltage)) for the eligibility trace in the end?
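
Read literally, with post_voltage applied element-wise per postsynaptic neuron, the unfiltered term would look like this NumPy sketch. The numbers are made up, post_factor stands in for global_gradient X random_matrix, and whether the raw voltage is a good stand-in for the derivative is exactly the open question here.

```python
import numpy as np

pre_activations = np.array([10.0, 0.0, 5.0])   # n_pre = 3
post_factor = np.array([0.2, -0.4])            # n_post = 2
post_voltage = np.array([0.9, 0.0])            # normalized voltages in [0, 1]

# Scale each postsynaptic neuron's factor by its current voltage.
modulated = post_factor * post_voltage                    # (n_post,)
# One eligibility value per synapse, shape (n_post, n_pre).
char_eligibility = np.outer(modulated, pre_activations)
```

One consequence of this form is that a neuron sitting at voltage 0, for example right after a reset, contributes nothing to the trace on that timestep.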

Just to clarify, I am looking at this from an ML perspective, but that is probably clear already.

I can think of two places to store such a random matrix: either in the transform on the connection to the error input, or on the learning rule object itself.

I can’t answer your other question at the moment. I would have to familiarize myself with the math first. But you are correct that the voltages in Nengo (for LIF neurons) are normalized to the range [0, 1] and they will spike (and be reset) when they reach a voltage of 1.
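
To make the voltage range concrete, here is a minimal normalized LIF neuron in plain Python. It is a sketch of the general scheme (threshold at 1, reset to 0, refractory period), not Nengo's exact implementation, and the parameter values are only illustrative.

```python
import math


def simulate_lif(current, n_steps=1000, dt=0.001, tau_rc=0.02, tau_ref=0.002):
    """Simulate one normalized LIF neuron; returns (voltages, spike count)."""
    voltage, refractory = 0.0, 0.0
    voltages, spikes = [], 0
    for _ in range(n_steps):
        if refractory > 0:
            # Hold the voltage at its reset value while refractory.
            refractory -= dt
        else:
            # Exact solution of dV/dt = (J - V) / tau_rc over one timestep.
            voltage += (current - voltage) * (-math.expm1(-dt / tau_rc))
            voltage = max(voltage, 0.0)
            if voltage >= 1.0:
                spikes += 1
                voltage = 0.0          # reset after the spike
                refractory = tau_ref
        voltages.append(voltage)
    return voltages, spikes


strong_voltages, strong_spikes = simulate_lif(current=2.0)
weak_voltages, weak_spikes = simulate_lif(current=0.5)
```

With an input current above 1 the neuron spikes repeatedly and the recorded voltage stays below 1. With a current below 1 the voltage settles near the current and the neuron never spikes.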