Yeah, the example you linked is somewhat complex and does some strange things, but those things are explainable.
- About the agent constantly rotating: When the agent is constantly rotating, it is actually in an ideal state. Because the agent is not penalized for constantly turning, it has no "incentive" to do anything else. To fix this, you'd need to penalize (very slightly) both the left and right turn actions when the agent is not moving forward and there's nothing detected on the forward radar.
- About the agent constantly getting stuck on a wall: This is also a consequence of the simplistic error rules used for this agent. If you look at the error signal, you'll notice that it is always being generated, so while the agent is moving forward, it keeps learning that moving forward is really good, and it forgets that bumping into walls is bad. By the time it reaches a wall, it has already forgotten that crashing into one is bad, and so it does. You can address this by decreasing the map size or increasing the agent's move speed (so the agent encounters a wall before it has completely forgotten that crashing into one is bad). You can also reduce the forgetting by decreasing the learning rate on the "go forward" action (I used 5e-5 on conn_fwd and it seemed to perform a bit better):

```python
conn_fwd = nengo.Connection(
    radar, bg.input, function=u_fwd,
    learning_rule_type=nengo.PES(learning_rate=5e-5))
```
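To make the first point concrete, here is a hypothetical sketch (not from the original example; the function name, the [forward, left, right] action ordering, the radar threshold, and the penalty magnitude are all assumptions) of an extra error term that nudges the turn actions' utilities down when the path ahead is clear. Since PES drives the decoded output opposite the error signal, a small positive error on a turn action lowers its learned utility, so spinning in place is no longer a zero-error state:

```python
import numpy as np

TURN_PENALTY = 0.05  # assumed magnitude; would need tuning

def turn_error(actions, forward_radar):
    """Hypothetical extra error term.

    actions: [forward, left, right] activations from the action selector.
    forward_radar: reading in [0, 1], where ~1 means the path ahead is clear.
    Returns a per-action error; a positive entry pushes that action's
    learned utility down under the PES sign convention.
    """
    errors = np.zeros(3)
    chosen = np.argmax(actions)
    # If the agent chose to turn while the way ahead is clear,
    # slightly penalize the chosen turn action.
    if chosen != 0 and forward_radar > 0.9:
        errors[chosen] = TURN_PENALTY
    return errors

# Turning left with a clear path ahead gets penalized...
spin = turn_error(np.array([0.1, 0.8, 0.2]), 1.0)
# ...while moving forward generates no penalty.
fwd = turn_error(np.array([0.9, 0.1, 0.2]), 1.0)
```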
I should note that this example is very simplistic, and the agent does do the wrong thing most of the time. But as the simulation progresses (after 30-40 s of simulation time), you will start to see slight hints that the agent is learning to avoid walls (remember that spinning in a circle is also the best way to avoid all walls).
There are ways to improve the agent's performance, mostly by altering the error signals being generated. There is a modified version of the code in the NengoFPGA examples where I updated the error signals to make the agent slightly "smarter". Note that the NengoFPGA example uses a special FpgaPesEnsembleNetwork, but that's essentially a nengo.Ensemble and a nengo.PES learning rule combined into one network. The relevant modifications to the original example are:
```python
# Generate the training (error) signal
def error_func(t, x):
    actions = np.array(x[:3])
    utils = np.array(x[3:6])
    r = x[6]
    activate = x[7]

    max_action = max(actions)
    actions[actions < action_threshold] = 0
    actions[actions != max_action] = 0
    actions[actions == max_action] = 1

    return activate * (
        np.multiply(actions, (utils - r) * (1 - r) ** 5)
        + np.multiply((1 - actions), (utils - 1) * (1 - r) ** 5)
    )

errors = nengo.Node(error_func, size_in=8, size_out=3)

# Connect output of error computation to learning rules
```
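To see what this error function actually computes, here is a quick standalone check in plain NumPy (action_threshold is assumed to be 0.1 purely for this check, and the function body is repeated so the snippet runs on its own). The selected action's utility is pulled toward the reward r, the non-selected utilities are pulled toward 1, and everything is scaled by (1 - r) ** 5 and gated by activate:

```python
import numpy as np

action_threshold = 0.1  # assumed value, just for this check

def error_func(t, x):
    actions = np.array(x[:3])
    utils = np.array(x[3:6])
    r = x[6]
    activate = x[7]

    max_action = max(actions)
    actions[actions < action_threshold] = 0
    actions[actions != max_action] = 0
    actions[actions == max_action] = 1

    return activate * (
        np.multiply(actions, (utils - r) * (1 - r) ** 5)
        + np.multiply((1 - actions), (utils - 1) * (1 - r) ** 5)
    )

x = [0.1, 0.9, 0.2,  # action selector outputs: action 1 wins
     0.5, 0.8, 0.3,  # current learned utilities
     0.0,            # reward r (e.g. 0 = no reward right now)
     1.0]            # activate = 1: learning enabled
err = error_func(0.0, x)
# Action 1 (selected): error = utils[1] - r = 0.8
# Actions 0 and 2: error = utils - 1, i.e. -0.5 and -0.7
```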
Unfortunately, learning to do reinforcement learning (in general) is a slow process that requires a lot of trial and error. For learning how to do it in Nengo, you'll want to first familiarize yourself with exactly how the nengo.PES learning rule works, since this is the most common online learning rule used in Nengo. I'd advise you to create a simple network and play around with it. Observe what happens when you change the inputs in a simple learned communication channel network, and think about how that would affect something like the critter model.
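If it helps, the core of the PES rule can be sketched in plain NumPy, stripped of neural dynamics and filtering (this is a simplified illustration, not the actual nengo implementation): the decoders are updated as delta_d = -kappa * error * activities / n_neurons, where error = decoded output - target. Run on a learned communication channel (target = identity), the decoded estimate converges toward the input:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 50

# Toy rectified-linear "tuning curves" (random encoders and gains)
encoders = rng.uniform(-1.0, 1.0, n_neurons)
gains = rng.uniform(0.5, 2.0, n_neurons)

def rates(x):
    return np.maximum(0.0, gains * (encoders * x + 0.5))

decoders = np.zeros(n_neurons)
kappa = 1e-2
target = lambda x: x  # identity: a communication channel

# PES-style updates on randomly sampled inputs
for x in rng.uniform(-1.0, 1.0, 20000):
    a = rates(x)
    error = decoders @ a - target(x)
    decoders -= kappa * error * a / n_neurons

# The decoded estimate now tracks the input closely
xs = np.linspace(-1, 1, 21)
est = np.array([decoders @ rates(x) for x in xs])
```

Playing with the target function or the learning rate here mirrors what you'd observe when probing a learned communication channel in Nengo itself.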
Also, since you are playing around with the critter model, you’ll want to familiarize yourself with how the basal ganglia network works, since it forms the basis of how the critter makes “decisions”.
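Functionally, the basal ganglia network takes a vector of action utilities and selects the highest-utility action (its actual output works by inhibition and is usually paired with a thalamus network, but the net effect is winner-take-all). A non-neural sketch of that selection, using the critter's assumed [forward, left, right] ordering:

```python
import numpy as np

def select_action(utilities):
    # Winner-take-all: 1 for the highest-utility action, 0 elsewhere
    selected = np.zeros(len(utilities))
    selected[np.argmax(utilities)] = 1.0
    return selected

# Utilities for [forward, turn left, turn right]: "turn left" wins
chosen = select_action(np.array([0.3, 0.7, 0.4]))
```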
Finally, if you search the CNRG (Computational Neuroscience Research Group – the folks who work with the NEF and Nengo for research) publications page for "reinforcement", you'll find several theses and papers exploring how to use Nengo to do reinforcement learning.