The agent’s learning rule is very simplistic, so its behaviour can be a bit strange. What it is learning to do is to avoid hitting the walls, so it’s natural for it to spin in a circle, or just go back and forth between two relatively open spots. I discuss some of that behaviour in this forum post as well.
This is a side-effect of how the agent is coded. If you probe the output of the BG, you’ll see that it is quite noisy. Since the output of the BG inhibits the errors network, the noisiness of the BG output tends to keep the errors signals inhibited quite often (unless it really gets stuck). Thus, the error seems to hover around 0.
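If you want to see this for yourself, here is a minimal sketch of how you might probe the two signals (assuming the agent’s script defines model, bg, and errors as in the code below; the probe names are just placeholders):
# Add inside the agent's "with model:" block
bg_probe = nengo.Probe(bg.output, synapse=0.01)  # noisy BG output
error_probe = nengo.Probe(errors.output, synapse=0.01)  # mostly-inhibited error signal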
You can try to modify the agent to use the “cleaner” signal from the thalamus output by commenting out this code:
nengo.Connection(bg.output[0], errors.ensembles[0].neurons, transform=np.ones((50, 1)) * 4)
nengo.Connection(bg.output[1], errors.ensembles[1].neurons, transform=np.ones((50, 1)) * 4)
nengo.Connection(bg.output[2], errors.ensembles[2].neurons, transform=np.ones((50, 1)) * 4)
and replacing it with this code:
# Action 0 inhibits errors for actions 1 and 2
nengo.Connection(thal.output[0], errors.ensembles[1].neurons, transform=np.ones((50, 1)) * -4)
nengo.Connection(thal.output[0], errors.ensembles[2].neurons, transform=np.ones((50, 1)) * -4)
# Action 1 inhibits errors for actions 0 and 2
nengo.Connection(thal.output[1], errors.ensembles[0].neurons, transform=np.ones((50, 1)) * -4)
nengo.Connection(thal.output[1], errors.ensembles[2].neurons, transform=np.ones((50, 1)) * -4)
# Action 2 inhibits errors for actions 0 and 1
nengo.Connection(thal.output[2], errors.ensembles[0].neurons, transform=np.ones((50, 1)) * -4)
nengo.Connection(thal.output[2], errors.ensembles[1].neurons, transform=np.ones((50, 1)) * -4)
An explanation of the code change:
- The output of the Thalamus network is cleaner than the output of the BG network. In the Thalamus output, only the “chosen” action is active, whereas the output of the BG network can be quite noisy.
- The output of the BG network is a negative value, so to inhibit the errors ensembles, the transform is positive (i.e., transform=np.ones((50, 1)) * 4). Conversely, the output of the Thalamus network is a positive value, so to inhibit the errors ensembles, the transform is negative (i.e., transform=np.ones((50, 1)) * -4).
- The desired inhibition is (as stated by the comment in the code):
inhibit learning for actions not currently chosen (recall BG is high for non-chosen actions)
- For the output of the BG, in the ideal scenario, the output of the chosen action is 0, and the non-chosen actions are non-zero negative numbers (e.g., -0.8). Thus, to inhibit the non-chosen actions, all you need to do is feed the BG output to the respective (same action) error ensemble. The output of the Thalamus, however, is inverted: the output of the chosen action is 1, and the non-chosen actions are 0. So, to inhibit the non-chosen actions, you need to connect the output of the chosen action to the error ensembles of the non-chosen actions (i.e., connect action 0 to errors 1 and 2). The sketch after this list illustrates the difference between the two outputs.
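If you want to convince yourself of these values, here is a small self-contained sketch (not the agent’s actual model; the utilities and network sizes are just placeholders) that feeds fixed utilities into a BG + Thalamus pair and prints both outputs:
import nengo

model = nengo.Network()
with model:
    # Action 1 has the highest utility, so it should be the "chosen" action
    utility = nengo.Node([0.3, 0.8, 0.4])
    bg = nengo.networks.BasalGanglia(dimensions=3)
    thal = nengo.networks.Thalamus(dimensions=3)
    nengo.Connection(utility, bg.input)
    nengo.Connection(bg.output, thal.input)
    p_bg = nengo.Probe(bg.output, synapse=0.01)
    p_thal = nengo.Probe(thal.output, synapse=0.01)

with nengo.Simulator(model) as sim:
    sim.run(0.5)

# BG output: chosen action near 0, non-chosen actions negative
# Thalamus output: chosen action near 1, non-chosen actions near 0
print("BG output:  ", sim.data[p_bg][-1])
print("Thal output:", sim.data[p_thal][-1])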
If you run the code, the agent does sometimes do a little more moving about, but it can also quite quickly get stuck in a pattern of turning in a circle. As I mentioned in my other post, this is because there is nothing that biases forward movement to be favoured. Rather, turning (left or right) and not hitting anything is just as good as going forward and not hitting anything.
No, they are not. If you look at this connection statement:
nengo.Connection(bg.output[0], errors.ensembles[0].neurons, transform=np.ones((50,1))*4)
You’ll notice that the connection is made to a .neurons object. This means that the output of the BG is being connected directly to the neurons of errors.ensembles[0]. This is how we implement neural inhibition within Nengo. The direct connection to the neurons of the errors ensembles is also the reason why the transform parameter is a matrix rather than a single scalar value.
If you go back to lectures 2 & 3, recall that there are two different ways of connecting to Nengo ensembles. The first method connects directly to the neurons of an ensemble:
nengo.Connection(input, ens.neurons)
The second method connects to the ensemble itself; the connection is made through the encoders of the ensemble. This is the “NEF” method for connecting to an ensemble (as opposed to the “direct” method, which bypasses the ensemble’s encoders):
nengo.Connection(input, ens)
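To make the difference concrete, here is a small self-contained sketch (the names and numbers are just placeholders) showing both connection styles at once: an NEF connection provides the represented value, and a direct connection to .neurons with a large negative transform shuts the ensemble off:
import numpy as np
import nengo

model = nengo.Network()
with model:
    stim = nengo.Node(0.5)  # value for the ensemble to represent
    inhibit = nengo.Node(lambda t: 1.0 if t > 0.5 else 0.0)  # inhibition switches on at t = 0.5 s
    ens = nengo.Ensemble(n_neurons=50, dimensions=1)

    # NEF method: goes through the encoders, so the transform can be a scalar
    nengo.Connection(stim, ens)

    # Direct method: injects current into every neuron, so the transform
    # must be an (n_neurons x 1) matrix; the negative weights inhibit them
    nengo.Connection(inhibit, ens.neurons, transform=np.ones((50, 1)) * -4)

    p = nengo.Probe(ens, synapse=0.01)

with nengo.Simulator(model) as sim:
    sim.run(1.0)
# Before t = 0.5 s the decoded value is ~0.5; afterwards it collapses to ~0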
Just to summarize, this code:
nengo.Connection(bg.output, errors.input, transform=4)
would be equivalent to the NEF method for connecting to the ensembles. I.e.,
nengo.Connection(bg.output[0], errors.ensembles[0], transform=4)
nengo.Connection(bg.output[1], errors.ensembles[1], transform=4)
nengo.Connection(bg.output[2], errors.ensembles[2], transform=4)
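For reference, errors is presumably an EnsembleArray (it has .ensembles and .input), so its .input node simply fans out to the individual ensembles through their encoders. A minimal sketch of that structure, with placeholder values, looks like this:
import nengo

model = nengo.Network()
with model:
    errors = nengo.networks.EnsembleArray(n_neurons=50, n_ensembles=3)
    source = nengo.Node([0.1, -0.2, 0.3])
    # One connection to errors.input ...
    nengo.Connection(source, errors.input, transform=4)
    # ... behaves like three per-ensemble (NEF-style) connections:
    # nengo.Connection(source[0], errors.ensembles[0], transform=4)
    # nengo.Connection(source[1], errors.ensembles[1], transform=4)
    # nengo.Connection(source[2], errors.ensembles[2], transform=4)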