Reinforcement learning in Nengo for continuous action space and continuous state space

Good question! This is actually something that I just secured a grant for: looking into doing reinforcement learning in continuous action spaces (and continuous state space, and in continuous time). So I'd say this is a topic that I'm looking forward to digging into in more detail over the next year or so, but right now we don't have any strong examples.

And I also agree with your thoughts on the BG/Thal system, which really is set up for discrete action spaces. That said, it is worth noting that the BG itself does provide a much more continuous output: the current Thalamus model applies a bit of extra winner-take-all on top of what the BG produces in order to make the output very discrete. So that could be relaxed a bit by adjusting the Thalamus model (see the sketch below). But that then runs into the question of how to rethink the learning itself, which, as I said, is a research topic in its own right.
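To make that a bit more concrete, here's a rough, untested sketch of what I mean. If I remember the network signatures correctly, `nengo.networks.Thalamus` takes a `mutual_inhib` parameter that controls the extra winner-take-all stage, so turning it down (or off) leaves you with something much closer to the raw BG utilities:

```python
import nengo

# Rough sketch: relax the Thalamus winner-take-all so the output stays
# closer to the (more continuous) utilities computed by the BG.
# (mutual_inhib=0.0 is the assumption here: it should remove the mutual
# inhibition between the thalamic channels.)
dimensions = 3  # number of action channels

with nengo.Network() as model:
    # Example utility inputs for three "actions"
    utilities = nengo.Node([0.8, 0.6, 0.4])

    bg = nengo.networks.BasalGanglia(dimensions=dimensions)
    thal = nengo.networks.Thalamus(dimensions=dimensions, mutual_inhib=0.0)

    nengo.Connection(utilities, bg.input)
    nengo.Connection(bg.output, thal.input)

    # Compare the raw BG output to the (now less discrete) Thalamus output
    p_bg = nengo.Probe(bg.output, synapse=0.01)
    p_thal = nengo.Probe(thal.output, synapse=0.01)
```

Of course, that only gives you a more graded output; you'd still have to work out what learning on top of that should look like, which is the research question.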

For the grant that I mentioned, the particular emphasis will be on modelling how biological systems do this sort of reinforcement learning, so that's exactly why I wasn't planning on just implementing DDPG in NengoDL. I think you're exactly right that the algorithm performance would be worse (although the energy consumption, given the right hardware, might be much improved!), and it wouldn't really change the biological plausibility. But I do think that some of the ideas in DDPG are pretty relevant. And I'd also say there are some interesting similarities to things in the adaptive control literature that I'm still trying to sort out. But I'm still working out exactly what that research path will be.

Could you give a bit more detail about what sort of project you’re thinking of? Or tasks? It sounds like you’re more interested in the biological plausibility aspect, but I’d like to get a better sense of what direction you’re coming at this from. It would be good to share ideas, in any case!
