Hierarchical reinforcement learning vs NengoDL

I’m new to Nengo and the neuromorphic field. I want to implement a model that controls a robot with reinforcement learning in Nengo.

I’ve read a lot before getting started, but I’m confused. I saw the hierarchical reinforcement learning (HRL) implementation, which builds the entire Q-learning model directly with spiking neurons.

On the other hand, I saw that you have NengoDL, which can “convert” regular non-spiking Keras models into spiking ones.

I don’t know which way to go or what the benefits of each choice are. The easiest option for people with Keras experience is probably NengoDL. But if converting non-spiking models to spiking ones were that easy, a thesis like the HRL one would not have needed so much additional effort…

Any suggestion would be highly appreciated.

Hello @nrofis, welcome to our community! I am not at all versed in Reinforcement Learning (RL), but I do have some experience with NengoDL, and I hope the following explanation will help you judge how well NengoDL suits your RL task.

NengoDL is built on top of TF with support for Nengo-based spiking networks, which makes it easy to train spiking networks or convert existing models to them, but with certain limitations. It cannot convert “any” TF/Keras model to a completely spiking model, only TF models built from a fixed set of TF attributes/layers. What I mean is, a TF model built from the layers mentioned here can be converted into an entirely spiking network, but layers which aren’t mentioned here are executed as TensorNodes, i.e. the original non-spiking TF functions.

As I am not experienced with RL, I don’t know whether you will find any RL-relevant programming constructs in the link I mentioned. If you don’t find them there, then RL models built with those constructs cannot be converted to spiking ones using NengoDL. Hence, you would need to implement the spiking equivalents of those RL models separately, which I believe is why the thesis was needed. For example, the Q-value function or the policy gradient methods are built from basic functions that could be implemented using the NEF, and thus with spiking neurons.
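For readers less familiar with RL, the Q-value function mentioned above is typically learned with a temporal-difference update. The following is a minimal non-spiking sketch in plain Python, not the HRL implementation; the function name `q_update`, the learning rate `alpha`, the discount `gamma`, and the toy table are all illustrative assumptions.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Value of the best action available in the next state.
    best_next = max(q[next_state])
    # Temporal-difference error: how far the current estimate is from the target.
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


# Hypothetical toy problem: 2 states, 2 actions, all Q-values start at zero.
q_table = [[0.0, 0.0], [0.0, 0.0]]
td = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
```

A spiking implementation would have to realise the same three ingredients (a value estimate, a maximum over actions, and an error-driven update) with neural populations, which is presumably where most of the extra effort goes.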

Other experts are welcome to correct me if I am wrong or add to it!

Thank you for your reply, @zerone. I’m aware that it cannot convert any model. Here are more details about what I’m trying to do.

I saw a paper that controls a robot with “classical” non-spiking neural networks. The paper uses four simple feed-forward DNNs. Each one has input and output layers based on the robot specs and two fully connected hidden layers with a ReLU activation function.

I’m not sure how to convert those “simple” networks into spiking ones (I don’t have the source code; by conversion I mean building them directly with Nengo or using Keras and NengoDL). Although they are simple, they are deep, and I’m not sure how Ensembles and PES can replace such a deep network.
In all the examples I saw in Nengo, the networks were pretty simple, and most of them had no learning at all. It seems like Nengo is used to build a “bio-inspired programming language” (functions, memory, etc.). PES appeared in a few examples to learn some connections, but only in examples so simple that they are not close to real-life problems. I didn’t see any example that could replace a deep network.
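To make the comparison with deep networks concrete, here is a rough sketch of what a PES-style rule does: it adjusts only the linear readout (decoders) of a single population, driven by an error signal. This is a rate-based toy in plain Python, not Nengo's implementation; the tuning-curve shapes, the value of `kappa`, and the target function are assumptions chosen for illustration.

```python
import random

random.seed(0)
n_neurons = 50

# Hypothetical rectified-linear tuning curves: a_i(x) = max(0, gain_i * e_i * x + bias_i)
encoders = [random.choice([-1.0, 1.0]) for _ in range(n_neurons)]
gains = [random.uniform(0.5, 2.0) for _ in range(n_neurons)]
biases = [random.uniform(-1.0, 1.0) for _ in range(n_neurons)]


def activities(x):
    """Return the rate of every neuron for a scalar input x."""
    return [max(0.0, g * e * x + b) for g, e, b in zip(gains, encoders, biases)]


def decode(decoders, acts):
    """Linear readout: weighted sum of neuron activities."""
    return sum(d * a for d, a in zip(decoders, acts))


def target(x):
    return x * x


def mean_abs_error(decoders):
    """Average |decoded - target| on a fixed grid over [-1, 1]."""
    grid = [i / 10.0 - 1.0 for i in range(21)]
    return sum(abs(decode(decoders, activities(x)) - target(x)) for x in grid) / len(grid)


decoders = [0.0] * n_neurons
error_before = mean_abs_error(decoders)

kappa = 0.2  # learning rate (assumed value)
for _ in range(5000):
    x = random.uniform(-1.0, 1.0)
    acts = activities(x)
    error = decode(decoders, acts) - target(x)
    # PES-style update: each decoder moves against the error, scaled by its activity.
    for i, a in enumerate(acts):
        decoders[i] -= kappa / n_neurons * error * a

error_after = mean_abs_error(decoders)
```

Because only one layer of weights is trained here, a rule like this is closer to online linear regression on fixed features than to backpropagation through several hidden layers, which may be why the examples look simple.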

The same goes for HRL: it took time to implement that network, but I’m not sure it can fully replace even such a simple deep network, which is also used for Q-learning.

So, this is my dilemma.

Hello @nrofis, NengoDL works in the following way. If your network has Dense/Conv/AveragePooling layers, you can convert it to an entirely spiking one, no matter how deep it is. However, depending on the depth of the network, you might need to scale the firing rates using the scale_firing_rates parameter in nengo_dl.Converter(), lower the synapse value, and present your inputs for a longer duration. The following short snippet shows what it should look like:

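    import nengo_dl
    import tensorflow as tf

    # Convert the trained Keras model to a Nengo network.
    # `model`, `activation`, `scale_firing_rates`, and `synapse` must be defined beforehand.
    nengo_converter = nengo_dl.Converter(
        model,
        swap_activations={tf.nn.relu: activation},
        scale_firing_rates=scale_firing_rates,
        synapse=synapse,
    )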

where model is your trained TF model, and activation is your spiking neuron model (e.g. nengo.SpikingRectifiedLinear()). So you first need to train your TF network with ReLU neurons (as is classically done) and then feed the trained network to nengo_dl.Converter() to get a spiking equivalent. Here’s an example; note that the example is not limited to simple 2D CNNs, but can be applied to any network built from the supported layers (as mentioned at the start of this comment).
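To give some intuition for why scaling the firing rates and presenting inputs for longer can matter, here is a small plain-Python sketch of a single integrate-and-fire neuron standing in for a ReLU unit. It is not NengoDL code; the function name `spiking_relu_estimate` and all numeric values are assumptions, and the `scale` argument only mimics the idea of multiplying the neuron input and dividing the spike amplitude by the same factor.

```python
def spiking_relu_estimate(activation, scale=1.0, sim_time=0.1, dt=0.001):
    """Estimate a ReLU activation from the spike count of one integrate-and-fire neuron.

    `scale` multiplies the neuron input and divides the spike amplitude,
    which is the idea behind a firing-rate scaling factor.
    """
    rate = max(0.0, activation) * scale  # firing rate in Hz
    voltage = 0.0
    n_spikes = 0
    for _ in range(int(round(sim_time / dt))):
        voltage += rate * dt
        emitted = int(voltage)  # number of threshold crossings in this step
        voltage -= emitted
        n_spikes += emitted
    # Undo the scaling and average over the presentation window.
    return n_spikes / scale / sim_time


# Hypothetical ReLU output of 2.5, observed for 100 ms.
low = spiking_relu_estimate(2.5, scale=1.0)     # too few spikes to carry the value
high = spiking_relu_estimate(2.5, scale=100.0)  # many spikes, estimate close to 2.5
```

With a low rate the neuron does not spike at all within the window, so downstream layers see nothing; with a higher rate the spike count resolves the value. The trade-off is that more spikes generally erode the sparsity that makes spiking hardware attractive, so the scale is something to tune rather than maximise.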

If this doesn’t help, then let’s wait for what other experts have to say about it.

@zerone Thanks for the explanation. I’ve already read that. My question is whether I should use NengoDL or implement HRL directly. What are the pros and cons of each method? (Assuming that the model can be converted via NengoDL.)

Spiking models are a new area for me, so I can’t answer this question.

Hello @nrofis, since you have already read about and are aware of the methods for building a spiking network, it should be apparent that the most common method is to first build a traditional network (i.e. implement it directly with TF), then train it using TF’s model.fit(), and then use NengoDL to convert it.

If the examples you have seen use NengoDL (and not TF) to build the network to be trained later, then I guess you might be in a bit of a fix over which framework to use (i.e. NengoDL or TF?). Actually, even if you use NengoDL to build the network and train it, behind the scenes it is trained with the TF API and non-spiking neurons. E.g. if you use nengo.LIF() neurons to build your NengoDL network, then during training NengoDL replaces nengo.LIF() with nengo.LIFRate() neurons and uses the TF API to train the network. One benefit of using NengoDL to build and train the network is that you can incorporate Nengo-specific features; as mentioned above, you can train your network with nengo.LIF(), so that the trained weights already account for the LIF neuron used in the spiking network. There might be other benefits as well, e.g. using the scale_firing_rates parameter while training, which further tunes your model for spiking operation. At the end of the day, NengoDL just adds a Nengo-specific wrapper, and training is still done with TF behind the scenes.
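The swap between a spiking LIF neuron and its rate version works because the rate version is the steady-state firing rate of the spiking one. The sketch below compares the standard LIF rate formula with a crude spiking simulation in plain Python. It is not Nengo's code; the function names, the Euler integration, and the input current of 2.0 are assumptions, and the time constants are simply typical values.

```python
import math


def lif_rate(current, tau_rc=0.02, tau_ref=0.002):
    """Steady-state LIF firing rate (Hz) for a constant input current."""
    if current <= 1.0:
        return 0.0  # below threshold the neuron never fires
    return 1.0 / (tau_ref + tau_rc * math.log1p(1.0 / (current - 1.0)))


def lif_spike_rate(current, tau_rc=0.02, tau_ref=0.002, sim_time=2.0, dt=1e-4):
    """Count spikes of a simple Euler-integrated LIF neuron and return spikes per second."""
    refractory_steps = int(round(tau_ref / dt))
    voltage = 0.0
    wait = 0  # remaining refractory steps
    n_spikes = 0
    for _ in range(int(round(sim_time / dt))):
        if wait > 0:
            wait -= 1
            continue
        # Leaky integration towards the input current.
        voltage += dt / tau_rc * (current - voltage)
        if voltage >= 1.0:
            n_spikes += 1
            voltage = 0.0
            wait = refractory_steps
    return n_spikes / sim_time


analytic = lif_rate(2.0)
simulated = lif_spike_rate(2.0)
```

The two numbers should agree closely for a constant input, which is the property that lets a network be trained on the smooth rate curve and then run with spikes; with time-varying inputs and short presentation windows the match is looser.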

My suggestion would be to first stick with TF to build and train the network, and then use NengoDL to convert it appropriately. Later you can add bells and whistles to further tune your network by using NengoDL to build and train it. All the above explanation is assuming that your HRL problem has the training data available for offline training.

That’s exactly the point. If I need to implement a simple Q-learning network, should I do it with NengoDL or implement the function directly in Nengo (as explained in HRL)?

I believe that NengoDL is “easier” for those who are familiar with Keras, but what is the benefit of implementing the HRL work? Do they do the exact same thing by implementing Q-learning functionality?

“All the above explanation is assuming that your HRL problem has the training data available for offline training.”

Not really. I have a simulation of an online environment, but I record all the data for offline learning…

Hello @nrofis, based on your comment above, I don’t think I can offer useful suggestions, as I’m not familiar with HRL or the possible benefits of implementing it in Nengo from scratch. Let’s wait and see what the experts have to say about it.
