I’m not sure how relevant this post is to Nengo, but since there have been some recent discussions on using Nengo to solve OpenAI Gym environments like CartPole, I thought I’d mention some related results I recently obtained in a novel way.

In his excellent technical summary of the Neural Engineering Framework (NEF), Terry says: *the math provided in this article is sufficient for implementing the Neural Engineering Framework on your own*. So that’s what I did: I wrote a bit of numpy code to build an encoder with fixed parameters and weights, and a decoder with learnable weights. Super simple, just 24 lines of code (plus comments)! Then, inspired by results suggesting that genetic algorithms can outperform Deep Reinforcement Learning, I used a simple genetic algorithm with mutation (no crossover) and elitism to see whether the network could learn a decoder that solves CartPole. And it worked!
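For anyone curious what a mutation-plus-elitism GA looks like, here is a minimal sketch of that kind of loop. The population size, elite count, mutation scale, and generation count below are illustrative guesses, not the values I actually used:

```python
import numpy as np

rng = np.random.default_rng(42)

POP_SIZE, N_PARAMS = 50, 10   # illustrative: candidate decoders, weights each
N_ELITE, SIGMA = 5, 0.1       # illustrative: elites kept, mutation noise scale

def evolve(fitness_fn, generations=20):
    """Mutation-only genetic algorithm with elitism over weight vectors."""
    pop = rng.normal(size=(POP_SIZE, N_PARAMS))
    for _ in range(generations):
        scores = np.array([fitness_fn(ind) for ind in pop])
        # Elitism: carry the top N_ELITE individuals over unchanged
        elite = pop[np.argsort(scores)[-N_ELITE:]]
        # Mutation only (no crossover): perturb randomly chosen elites
        parents = elite[rng.integers(N_ELITE, size=POP_SIZE - N_ELITE)]
        children = parents + SIGMA * rng.normal(size=parents.shape)
        pop = np.vstack([elite, children])
    scores = np.array([fitness_fn(ind) for ind in pop])
    return pop[np.argmax(scores)]
```

In the CartPole setting, `fitness_fn` would run one or more episodes with a candidate decoder and return the total reward; elitism guarantees the best solution found so far is never lost.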

The encoder takes the four-dimensional observation vector (cart position, cart velocity, pole angle, pole angular velocity) and maps it to a small number of LIF neurons (10 was sufficient for the GA to solve the problem quickly). The decoder (the learnable part) maps the neurons’ activations to a one-dimensional output squashed with the standard *tanh* function (-1 = go left; +1 = go right). By giving each candidate decoder several different initial CartPole configurations to evaluate, and running the episodes in parallel with the Python multiprocessing library, I can get solutions in under a minute on an old eight-core desktop computer.
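The encoder/decoder pair can be sketched along these lines, following the standard NEF formulation (random unit encoders, gain, bias, and a steady-state LIF rate curve). The gain and bias ranges and time constants here are assumptions for illustration, not my actual parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

N, D = 10, 4  # LIF neurons, observation dimensions (CartPole)

# Fixed encoder: random unit-length encoding vectors, gains, and biases
encoders = rng.normal(size=(N, D))
encoders /= np.linalg.norm(encoders, axis=1, keepdims=True)
gains = rng.uniform(1.0, 2.0, size=N)     # illustrative range
biases = rng.uniform(-1.0, 1.0, size=N)   # illustrative range

TAU_RC, TAU_REF = 0.02, 0.002  # membrane and refractory time constants (s)

def lif_rates(x):
    """Steady-state LIF firing rates for observation vector x."""
    J = gains * (encoders @ x) + biases      # input current per neuron
    rates = np.zeros(N)
    above = J > 1.0                          # only supra-threshold neurons fire
    rates[above] = 1.0 / (TAU_REF - TAU_RC * np.log(1.0 - 1.0 / J[above]))
    return rates

def act(x, decoder):
    """Decode activations into a single tanh-squashed action in (-1, 1)."""
    return np.tanh(lif_rates(x) @ decoder)
```

The `decoder` vector (one weight per neuron) is the only thing the GA evolves; everything in the encoder stays fixed.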

I made a little GitHub repository so you can try it out yourself.

Next I’d like to see whether I can get this approach (NEF + GA) working with a continuous-control environment like Pendulum. Ultimately, my goal is to make a simple C++ implementation of the NEF that I can run on a microcontroller, using a Mini eDVS as input to an NEF network to navigate a micro quadcopter.