Reinforcement learning in Nengo for continuous action space and continuous state space

I was wondering if there are any RL algorithms that were designed for use with spiking neurons/Nengo and for a continuous action space and a continuous state space?

Essentially, I’m looking for algorithms analogous to DDPG or TD3, but ones designed specifically for Nengo. I don’t want to just implement DDPG with spiking neurons in NengoDL, since I suspect the algorithm’s performance would be worse and the biological plausibility would not really change.

I have gone through this paper:

But from my understanding, NHRL was designed for a continuous state space but NOT a continuous action space. The only approach I could think of was to discretize my action space, but I’m worried about the very high dimensionality and about whether the agent would actually be able to learn anything.
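To make the dimensionality worry concrete, here is a minimal sketch (plain Python, not Nengo) of discretizing a box-shaped continuous action space into a grid of candidate actions. The function name and binning scheme are my own illustration; the point is that the number of discrete actions grows as `bins ** dims`:

```python
import itertools

def discretize_action_space(low, high, bins_per_dim):
    # Build bin centers per dimension, then take the Cartesian product
    # to enumerate every discrete candidate action.
    axes = []
    for lo, hi in zip(low, high):
        step = (hi - lo) / bins_per_dim
        axes.append([lo + step * (i + 0.5) for i in range(bins_per_dim)])
    return list(itertools.product(*axes))

# A 2-DOF action space with 5 bins per dimension stays manageable:
actions_2d = discretize_action_space([-1, -1], [1, 1], 5)
print(len(actions_2d))  # 25 discrete actions

# ...but the count explodes combinatorially, e.g. for a 7-DOF arm:
actions_7d = discretize_action_space([-1] * 7, [1] * 7, 5)
print(len(actions_7d))  # 78125 discrete actions
```

With tens of thousands of actions, a discrete selector like the BG/Thal loop would have to learn utilities for each one separately, which is exactly the learning-efficiency concern above.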

Can the Basal Ganglia/Thalamus units be used for continuous action selection? As far as I can tell, they simply argmax over the provided inputs.

Good question! This is actually something that I just secured a grant for: looking into doing reinforcement learning in continuous action spaces (and continuous state space, and in continuous time). So I’d say this is a topic that I’m looking forward to digging into in more detail over the next year or so, but right now we don’t have any strong examples.

And I also agree with your thoughts on the BG/Thal system, which really is set up for discrete action spaces. That said, it is worth noting that the BG itself does provide a much more continuous output – the current Thalamus model does a little bit of extra winner-take-all on top of what the BG does in order to make the output very discrete. So that could be relaxed a bit by adjusting the Thalamus model. But, that then runs into the question of how to rethink the learning itself, which as I said is a bit of a research topic itself.
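To illustrate the distinction being made here (this is a conceptual sketch in plain Python, not the actual Nengo BG/Thalamus implementation): the Thalamus effectively turns the BG's graded utilities into a one-hot selection, and "relaxing" that winner-take-all would mean producing a graded mixture over actions instead. The softmax used for the relaxed version is purely illustrative:

```python
import math

def hard_selection(utilities):
    # Thalamus-style winner-take-all: one-hot on the highest utility,
    # which is what makes the BG/Thal loop a discrete action selector.
    best = max(range(len(utilities)), key=lambda i: utilities[i])
    return [1.0 if i == best else 0.0 for i in range(len(utilities))]

def relaxed_selection(utilities, temperature=0.5):
    # A relaxed selector yields graded weights over actions; here a
    # softmax stands in for weakened mutual inhibition, for illustration only.
    exps = [math.exp(u / temperature) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

utilities = [0.2, 0.9, 0.5]
print(hard_selection(utilities))     # [0.0, 1.0, 0.0]
print(relaxed_selection(utilities))  # graded weights, still favouring action 1
```

A graded output like this could, in principle, be used to blend continuous actions rather than pick one, but as noted above, how learning should work on top of that is the open research question.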

For the grant that I mentioned, the particular emphasis will be on modelling how biological systems do this sort of reinforcement learning, so that’s exactly why I wasn’t planning on just implementing DDPG in NengoDL – I think you’re exactly right that the algorithm performance would be worse (although the energy consumption, given the right hardware, might be much improved!) and it wouldn’t really change the biological plausibility. That said, I do think some of the ideas in DDPG are pretty relevant, and there are some interesting similarities to ideas in the adaptive control literature that I’m still trying to sort out. But I’m still working out exactly what that research path will be.

Could you give a bit more detail about what sort of project you’re thinking of? Or tasks? It sounds like you’re more interested in the biological plausibility aspect, but I’d like to get a better sense of what direction you’re coming at this from. It would be good to share ideas, in any case!


Thank you for the explanations!

As of right now I’m just investigating different reinforcement learning algorithms/strategies using spiking neural networks and comparing them to some more traditional algorithms. It makes sense that this is a research question in and of itself, at least for continuous action spaces. I was hoping to ultimately apply NHRL or some Nengo reinforcement learning technique to robotic control in a continuous state/action/time space. Before that, I’m trying to benchmark those algorithms on OpenAI Gym environments.
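For benchmarking, the episode loop is the same regardless of which algorithm drives the policy. Here is a minimal sketch of a Gym-style benchmarking harness; `StubContinuousEnv` is a hypothetical stand-in (not a real Gym environment) with a 1-D continuous action in [-1, 1], so the loop itself is what matters:

```python
import random

class StubContinuousEnv:
    # Hypothetical stand-in for a gym.Env: reward is higher the closer
    # the continuous action is to a hidden target value.
    def __init__(self, target=0.3, horizon=10):
        self.target, self.horizon, self.t = target, horizon, 0

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        reward = -abs(action[0] - self.target)
        return [action[0]], reward, self.t >= self.horizon, {}

def run_episode(env, policy):
    # The usual Gym-style loop: reset, act until done, accumulate reward.
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

random.seed(0)
env = StubContinuousEnv()
returns = [run_episode(env, lambda obs: [random.uniform(-1, 1)])
           for _ in range(100)]
print(sum(returns) / len(returns))  # average return of a random-policy baseline
```

A random-policy baseline like this is a useful sanity check before comparing spiking and traditional agents on the same environments.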

Terry, congratulations on the grant! This topic – using a spiking neural net simulator like Nengo for continuous control – is also a current research interest of mine. So it’s good to see that at least one other person (kshivvy) is interested as well. I’ve had some very promising results using Proximal Policy Optimization for landing a 2D quadcopter, and am currently working on extending my OpenAI Gym environment for 3D landing, and eventually predator/prey modeling. I also have an undergraduate student doing an honors thesis with me this coming year on the topic, after having some exposure to Nengo in my neural nets course.

If there’s any way I can be of use with the grant-funded project, I’ll be delighted to help out.