I was wondering whether there are any RL algorithms designed for use with spiking neurons/Nengo that handle both a continuous state space and a continuous action space?
Essentially, I’m looking for algorithms analogous to DDPG or TD3, but designed specifically for Nengo. I don’t want to just implement DDPG with spiking neurons in NengoDL, because I suspect the algorithm would perform worse and because doing so would not really make it any more biologically plausible.
I have gone through this paper: http://compneuro.uwaterloo.ca/files/publications/rasmussen.2017.pdf
But from my understanding, NHRL was designed for a continuous state space but NOT a continuous action space. The only approach I could think of was to discretize my action space, but I’m worried that the resulting number of discrete actions would be very large and that the agent would struggle to learn anything.
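To make the dimensionality worry concrete, here is a rough sketch of what I mean by discretizing. The function name `discretize_actions`, the 3-dimensional action range, and the choice of 5 bins per dimension are made up for illustration and are not taken from the paper or from Nengo. The number of discrete actions grows as bins to the power of the number of action dimensions:

```python
from itertools import product


def discretize_actions(low, high, bins_per_dim):
    """Return every joint action on a uniform grid over [low, high] per dimension.

    Hypothetical helper for illustration only.
    """
    if bins_per_dim < 2:
        raise ValueError("need at least 2 bins per dimension")
    grids = []
    for lo, hi in zip(low, high):
        step = (hi - lo) / (bins_per_dim - 1)
        grids.append([lo + i * step for i in range(bins_per_dim)])
    # Cartesian product: one tuple per combination of per-dimension values
    return list(product(*grids))


# Assumed example: 3 action dimensions, each in [-1, 1], 5 bins each
actions = discretize_actions(
    low=[-1.0, -1.0, -1.0], high=[1.0, 1.0, 1.0], bins_per_dim=5
)
print(len(actions))  # 5 ** 3 = 125 discrete actions
```

With 6 action dimensions the same 5 bins already give 5 ** 6 = 15625 actions, each of which would need its own utility estimate.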
Can the Basal Ganglia/Thalamus units be used for continuous action selection? As far as I can tell, they simply argmax over the provided inputs.
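In plain Python, the behaviour I think I am seeing looks roughly like the sketch below. This is an idealized winner-take-all and not Nengo's actual Basal Ganglia/Thalamus dynamics; `winner_take_all`, `bin_centers`, and the utility values are all hypothetical. The point is that the selected action can only ever be one of the bin centres, never a value in between:

```python
def winner_take_all(utilities):
    """Idealized selection: 1.0 for the highest-utility action, 0.0 for the rest."""
    best = max(range(len(utilities)), key=lambda i: utilities[i])
    return [1.0 if i == best else 0.0 for i in range(len(utilities))]


# Assumed example: one action dimension discretized into 5 bins
bin_centers = [-1.0, -0.5, 0.0, 0.5, 1.0]
utilities = [0.1, 0.3, 0.7, 0.8, 0.2]

gate = winner_take_all(utilities)
# The gated output picks exactly one bin centre, even though the
# utilities for 0.0 and 0.5 are nearly tied
action = sum(g * c for g, c in zip(gate, bin_centers))
print(gate, action)
```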