Hierarchical Reinforcement Learning in Nengo

sizeras · December 4, 2020, 3:06pm

Hello to everyone,

I would like to try implementing this thesis about hierarchical reinforcement learning (HRL). I have set almost everything up, but I have a problem with the ensemble representing the Q-values. I am working in a 2D environment, so the State Memory is a 2D ensemble representing the x and y axes. I have 4 basic actions, so the Q-value ensemble’s output must be a 4D vector. I would really appreciate it if anyone could help me connect these two ensembles so that I can apply the PES rule with the TD error, etc.

Thanks in advance!

drasmuss · December 4, 2020, 9:09pm

When connecting two ensembles with different dimensionality, you need to define the function you would like to use to connect them (since the default mapping, an identity transform, doesn’t work if the dimensions don’t match). I’d recommend using a function, as it lets you control what the initial action values are. E.g., if you wanted to initialize them all to zero, you could do something like this:

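import nengo

with nengo.Network() as net:
    a = nengo.Ensemble(100, 2)  # 2D state (x, y)
    b = nengo.Ensemble(100, 4)  # 4D Q-values, one per action
    # The function sets the initial decoded output (all Q-values start at zero);
    # PES adjusts the decoders once an error signal is connected.
    nengo.Connection(
        a, b, learning_rule_type=nengo.PES(0.1), function=lambda x: [0, 0, 0, 0]
    )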

sizeras · December 22, 2020, 4:28pm

Hello again, and thank you so much for your previous response. Until now, I was running the model through nengo-gui and adjusting the values using the Node sliders. Now I have created a Python script that represents my 2D grid using pygame. Should I use some kind of inter-process communication (like pipes) between my Agent (perhaps running inside an Anaconda notebook) and my Environment so that they can interact? Sorry for the naive question, I am new to this.

xchoo · December 23, 2020, 2:17am

Hi @sizeras,

Nengo’s flexibility with Python means that you can get your Nengo model to interact with a grid-type environment in a number of ways. I’ll elaborate on some of these options below!

Using Python from Command Line
Perhaps the simplest way to integrate your pygame environment is to put the pygame execution code into a nengo.Node function within a single script. Here’s an example script with a simple pygame window that I’ve created to test this method: test_pygame.py (2.6 KB)

To run the Nengo model with the pygame environment, all you need to do is to call the python script from the command line, like so:

python test_pygame.py

With this method, you basically put the pygame processing function into a Nengo node. You can then connect whatever data you want to display on the pygame screen to that Nengo node. Note that, unlike code run in NengoGUI, you need to create a nengo.Simulator object to run the Nengo model (NengoGUI does this for you in the background, which is why it is not needed in GUI code).

Using NengoGUI and GridWorld
If you want to integrate your pygame environment with NengoGUI, it is, unfortunately, a little more complicated. Putting the pygame execution function in a Nengo node doesn’t work with NengoGUI because the server instance doesn’t properly start the pygame window.

One way to get a grid world in NengoGUI is to follow this example. While the example is for NengoFPGA, it is very straightforward to replace the FPGA-specific code with just a regular Nengo model. The grid world itself is implemented in this file, and it provides a World object (which defines the grid) and an Agent equipped with sensors and movements.

This grid world code can be used within NengoGUI because it has been written to render the environment to an SVG image. NengoGUI can then take this SVG image and render that to the screen. This SVG is updated every simulation timestep, which is what creates a moving image.
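To give a feel for the SVG approach, here is a small sketch that is independent of the grid world code mentioned above. `render_grid_svg` and `grid_display` are hypothetical helpers. The sketch assumes NengoGUI's mechanism of reading a `_nengo_html_` attribute from a node's function to display HTML/SVG; please check that against the HTML examples shipped with NengoGUI before relying on it.

```python
SIZE = 5  # hypothetical grid size


def render_grid_svg(size, agent_pos, cell=20):
    """Return an SVG string with a size x size grid and a circle for the agent."""
    extent = size * cell
    parts = [f'<svg width="100%" height="100%" viewBox="0 0 {extent} {extent}">']
    for row in range(size):
        for col in range(size):
            parts.append(
                f'<rect x="{col * cell}" y="{row * cell}" '
                f'width="{cell}" height="{cell}" fill="white" stroke="black"/>'
            )
    col, row = agent_pos
    parts.append(
        f'<circle cx="{(col + 0.5) * cell}" cy="{(row + 0.5) * cell}" '
        f'r="{cell * 0.4}" fill="blue"/>'
    )
    parts.append("</svg>")
    return "".join(parts)


def grid_display(t, x):
    """Node function: x is the agent position scaled to [0, 1] on each axis."""
    col = min(int(x[0] * SIZE), SIZE - 1)
    row = min(int(x[1] * SIZE), SIZE - 1)
    # Assumption: NengoGUI renders this attribute when the node is shown
    # with its HTML view.
    grid_display._nengo_html_ = render_grid_svg(SIZE, (col, row))


svg = render_grid_svg(3, (1, 2))
```

In a NengoGUI script, `grid_display` would be used as the output of a node with `size_in=2`, so the image is regenerated on every timestep.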

Using NengoGUI and Pygame
Integrating NengoGUI with your own custom pygame environment is more complex still. To use this approach, I’d recommend creating a separate Python script to run your pygame environment, and then using Python sockets to send information to and from this environment.

You can find an example of a Python socket function to put in a Nengo node in the NengoFPGA codebase. In NengoFPGA, the socket communication is used to communicate with another physical device (an FPGA board). To use this socket function, you’ll need to make a Nengo node with that function as the node’s output, and then have equivalent socket code in your pygame script.
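The general shape of such a setup might look like the sketch below. This is a generic UDP example using only the Python standard library, not the NengoFPGA socket code; `SocketNode`, `serve_one_request`, and the packet format (little-endian doubles) are all assumptions made for illustration.

```python
import socket
import struct


def pack_vector(values):
    """Encode a sequence of floats as little-endian doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack_vector(data):
    """Decode a packet produced by pack_vector."""
    return list(struct.unpack(f"<{len(data) // 8}d", data))


class SocketNode:
    """Callable for a Nengo node: sends actions, returns the latest state."""

    def __init__(self, env_addr, size_out, timeout=1.0):
        self.env_addr = env_addr
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.settimeout(timeout)
        self.last_state = [0.0] * size_out

    def __call__(self, t, x):
        self.sock.sendto(pack_vector(x), self.env_addr)
        try:
            data, _ = self.sock.recvfrom(4096)
            self.last_state = unpack_vector(data)
        except socket.timeout:
            pass  # keep the previous state if the environment is slow
        return self.last_state


def serve_one_request(env_sock, state):
    """Environment side: receive one action packet and reply with the state."""
    data, addr = env_sock.recvfrom(4096)
    action = unpack_vector(data)
    # A pygame script would update and redraw the environment here.
    env_sock.sendto(pack_vector(state), addr)
    return action
```

On the Nengo side, an instance of `SocketNode` would be passed as the node's output, e.g. `nengo.Node(SocketNode(("127.0.0.1", port), size_out=2), size_in=4)`, while the pygame script would call something like `serve_one_request` inside its main loop. Waiting for a reply on every timestep slows the simulation down, so a real setup may prefer to exchange data less often.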

sizeras · December 23, 2020, 12:14pm

Thank you so much for this extensive answer! Last night I used multithreading and managed to integrate my pygame environment inside NengoGUI.

I am currently working with Nengo Node objects, etc., in order to establish the connection between the Agent and the Environment.

I’ll let you know!

sizeras · January 3, 2021, 4:22pm

I think I am close to finishing it, but I am facing the following problem. Given the network topology in the first picture, you can see from the second one that while my Qcurrent_representation outputs the vector [-0.37, 0.05, 0.07, -0.03], the thalamus argmax value is on the first dimension, which is wrong.

Any thoughts on this?
Thank you

xchoo · January 3, 2021, 10:53pm

I put together a quick model with just the BG and thalamus networks here: test_bg.py (759 Bytes)

In my attempt to replicate the input values you are providing to your BG network, it seems that the issue is that these inputs might be too low for the BG-thalamus network to correctly pick out the “winner”. Part of the thalamus network is a recurrent mutual inhibitory connection.

For inputs that the BG can distinguish, the purpose of this connection is to pick out the one “winner”. However, this connection has the side effect of choosing a random “winner” if none of the BG inputs is strong enough to be a clear winner. You should see this effect if you re-run the network multiple times: the chosen action should vary randomly from run to run. Note that inputs below 0 are treated as equivalent to 0 by the BG network.

To improve the behaviour of the BG network, you can try boosting the input to the BG such that it is always between 0.5 and 1.5. You can do so by providing a non-zero value to the input_bias parameter. Refer to the code I’ve attached to see how I’ve done it for my example network.

nrofis · June 4, 2021, 10:19pm

@sizeras do you have any code you can share?