NengoDL with Reinforcement Learning

I’ve been looking for a while for a deep SNN (using NengoDL) reinforcement learning example, but I haven’t found any.

A lot of TensorFlow/Keras DNN reinforcement learning models use features that NengoDL doesn’t support, such as GradientTape.

Is there any example of using NengoDL for reinforcement learning models, especially the Soft Actor-Critic (SAC) model?

Hi @nrofis,

In regards to your questions:

You are correct in the observation that we do not have any examples of reinforcement learning implemented in NengoDL. For reinforcement learning, we tend to use regular Nengo to construct our models.

This is correct. NengoDL was designed to run TensorFlow (or Keras) models in the Nengo ecosystem, and the simulation environment of Nengo differs greatly from TensorFlow’s. In Nengo, neural network simulations (note that simulation differs from training) proceed in discrete timesteps, with neuron currents simulated at each step.
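To make this concrete, here is a minimal sketch of a Nengo simulation (the stimulus value, ensemble size, and probe filter are arbitrary placeholders):

```python
import nengo

with nengo.Network() as net:
    stim = nengo.Node(0.5)  # constant scalar input
    ens = nengo.Ensemble(n_neurons=100, dimensions=1)
    nengo.Connection(stim, ens)
    probe = nengo.Probe(ens, synapse=0.01)  # filtered spiking output

# The simulator advances the network in discrete steps of size dt,
# updating neuron currents and membrane states at every step.
with nengo.Simulator(net, dt=0.001) as sim:
    sim.run(1.0)  # 1 second of simulated time = 1000 timesteps
```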

As mentioned above, we don’t have any examples of reinforcement learning in NengoDL specifically. However, we do have some examples of reinforcement learning models constructed for Nengo itself.

If you have a TensorFlow/Keras RL neural network that you want to simulate in NengoDL, it is possible to do so, but it will require some work.

To do RL in Nengo, you’ll first need a Nengo-compatible neural network. The first approach you can take is to re-implement the TF/Keras model in native Nengo code. The NengoDL documentation provides a good comparison of the two modelling languages, which you can use as a starting point for rewriting your TF model as a Nengo model. Alternatively, you can use the NengoDL converter to convert your TF model to a Nengo network, as sketched below.
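As a rough sketch of the converter route, assuming a small stand-in Keras model (the layer sizes here are placeholders, not your SAC network):

```python
import nengo
import nengo_dl
import tensorflow as tf

# A small stand-in Keras model
inp = tf.keras.Input(shape=(8,))
hidden = tf.keras.layers.Dense(64, activation=tf.nn.relu)(inp)
out = tf.keras.layers.Dense(2)(hidden)
model = tf.keras.Model(inp, out)

# Convert to a Nengo network, swapping the rate ReLUs for spiking neurons
converter = nengo_dl.Converter(
    model, swap_activations={tf.nn.relu: nengo.SpikingRectifiedLinear()}
)
net = converter.net  # a regular nengo.Network you can extend in Nengo code
```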

Once you have a Nengo network, you’ll then need to figure out what kind of learning rule you want to use to perform the learning. Typically, we use the PES learning rule, and that seems to suit most of our needs (a typical setup is sketched below). You can, however, also implement your own learning rule, and attempt to replicate the functionality of TF’s GradientTape that way (i.e., write a learning rule that computes the gradients of the inputs and uses them to compute some sort of error signal).
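For reference, a typical PES setup looks something like this sketch, where the error population computes (actual - target) and modulates the learned connection:

```python
import nengo

with nengo.Network() as net:
    stim = nengo.Node(0.5)
    pre = nengo.Ensemble(100, dimensions=1)
    post = nengo.Ensemble(100, dimensions=1)
    nengo.Connection(stim, pre)

    # Learned connection; starts out computing the zero function
    conn = nengo.Connection(
        pre, post,
        function=lambda x: 0,
        learning_rule_type=nengo.PES(learning_rate=1e-4),
    )

    # Error population computes (actual - target); PES minimizes it
    error = nengo.Ensemble(100, dimensions=1)
    nengo.Connection(post, error)
    nengo.Connection(stim, error, transform=-1)
    nengo.Connection(error, conn.learning_rule)
```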

Thank you for your detailed answer. This is exactly what I was worried about :). I have a paper on a deep reinforcement learning model for a robotics project, and I wanted to try to convert that paper’s model to an SNN (with Nengo).

In that paper, they use a SAC (Soft Actor-Critic) network, implemented as a DNN, which is a relatively complex model to convert. I have already tried, without success, to convert a much simpler model here: Converting a simple ANN to SNN - General Discussion - Nengo forum.

I know PES, but I don’t think it is as powerful as a deep model. PES can only learn one connection, while deep learning updates all the connections in the network at every step. So I believe that if I compare the results of the deep SAC model against models with PES, the deep SAC will perform much better. (Maybe I’m wrong, but that is my feeling.) Here is another question that I’ve asked about that dilemma: Hierarchy reinforcement learning vs NengoDL - General Discussion - Nengo forum

So here are my questions:

1. Do you think that PES can perform as well as a DNN?
2. Can PES replace any RL model out there (DQN, DDQN, SAC, etc.)?
3. Is there any example of how to implement “big” RL models (DQN, DDQN, SAC, etc.) directly in Nengo (if that is even necessary)?

I think you are confusing two different concepts with respect to DNNs and the PES. A “deep neural network” describes a network architecture (typically, a DNN is a network that consists of many stacked layers of neurons), while the PES is a learning rule (which defines how weights are changed in a network). While it is true that the PES works on one connection, you can implement a network where the error signal is projected to all of the connections in the network, so that all of them are trained at the same time (see the sketch below). How to implement such a model to perform the task that you want is a research question, and is outside the area of my speciality.
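Here is a sketch of that idea. Note that naively feeding the same output error into every learned connection sidesteps the credit assignment problem, so treat this as illustrative rather than as a recipe:

```python
import nengo

with nengo.Network() as net:
    stim = nengo.Node(0.5)
    layer1 = nengo.Ensemble(100, dimensions=1)
    layer2 = nengo.Ensemble(100, dimensions=1)
    layer3 = nengo.Ensemble(100, dimensions=1)
    nengo.Connection(stim, layer1)

    # Each inter-layer connection gets its own PES rule
    conn12 = nengo.Connection(layer1, layer2, learning_rule_type=nengo.PES())
    conn23 = nengo.Connection(layer2, layer3, learning_rule_type=nengo.PES())

    # One global error signal (output - target), projected to every rule
    error = nengo.Node(size_in=1)
    nengo.Connection(layer3, error)
    nengo.Connection(stim, error, transform=-1)
    nengo.Connection(error, conn12.learning_rule)
    nengo.Connection(error, conn23.learning_rule)
```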

I can’t answer these with any certainty, as they are open research questions (i.e., how would you implement these DNNs in a spiking neural network with PES connections?). I would recommend checking out @drasmuss’s 2014 PhD thesis, where he describes PES implementations of several RL architectures. @drasmuss also has several other publications that may interest you.

With respect to motor control, @travis.dewolf’s 2014 PhD thesis and other publications may also interest you.

With respect to deep neural networks, @Eric’s 2018 PhD thesis may be of interest. His work focuses primarily on the visual areas of the brain, but the techniques he develops are applicable to other brain areas as well. Here’s a poster where he demonstrates feedback alignment being implemented in a spiking DNN. His other publications are available here.

Other thoughts
In my experience, designing models / networks in Nengo requires a shift in thinking on how the networks are constructed. One of the original design philosophies of Nengo is to keep things as biologically plausible as possible. This means that most learning rules in Nengo (the PES especially) are “local” in the way they obtain their information (the PES learning rule only has access to information locally available – in the connection – rather than globally available in the whole network). This does make it challenging to convert other neural network architectures to Nengo, but it is doable. Often what is required is to break the source neural network down into smaller components that perform some of the computations (e.g., derivatives, or gradients) that TF generates “for free”.

Additionally, I’d like to stress that while the components available in the vanilla installation of Nengo are designed with biological plausibility in mind, Nengo (the simulation platform) itself is general enough to allow you to implement and use your own custom learning rules (e.g., one that computes gradients and transmits them to all connections in a network).
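To give a sense of the plumbing involved, below is a minimal sketch of a custom learning rule for recent versions of Nengo core. The Hebbian rule here is a toy invented for illustration; a GradientTape-style rule would compute its `delta` from gradient information instead:

```python
import numpy as np
import nengo
from nengo.builder import Builder, Operator
from nengo.builder.learning_rules import build_or_passthrough, get_post_ens, get_pre_ens


class Hebbian(nengo.learning_rules.LearningRuleType):
    """Toy rule for illustration: delta_W = lr * dt * outer(post, pre)."""

    modifies = "weights"  # this rule updates the full weight matrix
    probeable = ("delta",)

    def __init__(self, learning_rate=1e-9):
        super().__init__(learning_rate, size_in=0)


class SimHebbian(Operator):
    """Backend operator computing the weight update at each timestep."""

    def __init__(self, pre_filtered, post_filtered, delta, learning_rate, tag=None):
        super().__init__(tag=tag)
        self.learning_rate = learning_rate
        # Declare all signals so Nengo can order operators correctly
        self.sets = []
        self.incs = []
        self.reads = [pre_filtered, post_filtered]
        self.updates = [delta]

    @property
    def pre_filtered(self):
        return self.reads[0]

    @property
    def post_filtered(self):
        return self.reads[1]

    @property
    def delta(self):
        return self.updates[0]

    def make_step(self, signals, dt, rng):
        pre = signals[self.pre_filtered]
        post = signals[self.post_filtered]
        delta = signals[self.delta]
        alpha = self.learning_rate * dt

        def step_simhebbian():
            # Nengo adds `delta` onto the connection's weights each step
            delta[...] = alpha * np.outer(post, pre)

        return step_simhebbian


@Builder.register(Hebbian)
def build_hebbian(model, hebbian, rule):
    conn = rule.connection
    # Low-pass filtered pre- and post-synaptic spike activities
    pre_activities = model.sig[get_pre_ens(conn).neurons]["out"]
    post_activities = model.sig[get_post_ens(conn).neurons]["out"]
    pre_filtered = build_or_passthrough(model, nengo.Lowpass(0.005), pre_activities)
    post_filtered = build_or_passthrough(model, nengo.Lowpass(0.005), post_activities)

    model.add_op(
        SimHebbian(pre_filtered, post_filtered,
                   model.sig[rule]["delta"], hebbian.learning_rate)
    )
```

Since this rule modifies the full weight matrix, it would need to be attached to a connection that actually has one, e.g. `nengo.Connection(a, b, solver=nengo.solvers.LstsqL2(weights=True), learning_rule_type=Hebbian())`.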

Addendum:
If you want to convert your DNN into a spiking network you can run natively in TF/Keras (no Nengo required), I recommend checking out our KerasSpiking package. We haven’t tried it with GradientTape, but it may just work. :smiley:
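For instance, a minimal KerasSpiking sketch might look like this (layer sizes are placeholders); inputs gain a time dimension, and SpikingActivation swaps a rate nonlinearity for spikes:

```python
import tensorflow as tf
import keras_spiking

# Inputs are sequences: (batch, timesteps, features)
inp = tf.keras.Input((None, 8))
dense = tf.keras.layers.Dense(64)(inp)
# Spiking version of ReLU; during training, gradients flow through
# the underlying rate activation (spiking-aware training)
spikes = keras_spiking.SpikingActivation("relu")(dense)
out = tf.keras.layers.Dense(2)(spikes)
model = tf.keras.Model(inp, out)
```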

Thank you for your detailed answer, @xchoo .

designing models / networks in Nengo requires a shift in thinking on how the networks are constructed

I think this is absolutely the key. I’m coming from the DNN world, so intuitively, learning a few connections at a time with PES feels very weak compared to networks that learn 500K+ parameters… For example, deep learning models can learn complex features in the data and achieve good accuracy and loss metrics. And I don’t know if models with PES can be good competitors against large deep learning models. But maybe they shouldn’t be :slight_smile:

I think that I will try to use Nengo with PES and see how it goes, and then move to a more advanced implementation and try KerasSpiking. Actually, I’ve already looked at KerasSpiking (briefly) and didn’t find a way to use GradientTape, but maybe it requires more investigation.

Thank you very much!

I would be careful about how you define “parameters” here. With respect to neural network training, “parameters” often equates to (or is proportional to) the number of connection weights being tuned in the network. By that metric, the PES learning rule can still learn hundreds of thousands of parameters, since it still changes all of the connection weights between the neural layers it is applied to. The difference with the PES learning rule is that it does so in a systematic way, by working in the “decoded” space of the representation, rather than in the neuron activity space that most other training methods use.
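A quick way to see the distinction, sketched below: for a decoded connection between two 1000-neuron ensembles representing a 2D value, PES adjusts a 2 x 1000 decoder matrix, but combined with the post ensemble’s encoders this implicitly defines the full 1000 x 1000 neuron-to-neuron weight matrix:

```python
import nengo

with nengo.Network() as net:
    pre = nengo.Ensemble(1000, dimensions=2)
    post = nengo.Ensemble(1000, dimensions=2)
    conn = nengo.Connection(pre, post, learning_rule_type=nengo.PES())

with nengo.Simulator(net) as sim:
    # PES updates the decoders of the learned connection...
    print(sim.data[conn].weights.shape)  # (2, 1000)
    # ...which, together with post's encoders, determine the
    # effective 1000 x 1000 neuron-to-neuron weights.
```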

I would say that there’s definitely “the right tool for the job”. It may very well be the case that the PES is not suitable for some applications. In fact, if you look at @Eric’s work, he uses traditional machine learning / deep learning techniques to train the vision systems he is working with. But, on the other hand, @travis.dewolf has managed to use the PES successfully to model a motor system that is able to learn not only the dynamics of the plant (arm) it is controlling, but also the dynamics of the environment the plant is operating in.

The PES learning rule in particular works in the “decoded” space of the neural ensembles, so if your problem can be defined in this space, the PES learning rule will probably work well for you. Going back to the motor control example, the dynamics of the arm and environment can be defined as a set of equations, and all the PES rule is doing is learning the parameters of those equations that optimize control of the arm. Compare that to the vision networks, where there isn’t really a “standard” equation you can use to define the outputs of each layer. Rather, through the learning process, each layer sort of “self-organizes” into feature extractors and the like. In this case, the PES rule wouldn’t work as well, because there is no well-defined “function” to learn.

Details like this make using the PES (or other learning rules) an open research question with respect to what works best for the specific problem you are trying to solve. :smiley:

Thank you again @xchoo , a lot of stuff to consider. I think I need to change my mindset when thinking about spiking networks.