Connecting NHRL with OpenAI's Gym

I know this is a long post, so apologies in advance. You can find the specific questions and a tl;dr at the bottom.

I was trying to run this NHRL implementation to test the algorithm on some different environments. I have seen this post already, but it doesn’t look like he is using NHRL or anything beyond normal Q estimation.

It was extremely difficult, but I was able to successfully run the given gridworld example. I had to manually add the library for Nengo 1.4, modify a file that was calculating pseudoinverses, reinstall Java, etc. However, I couldn’t visualize anything, since it looks like I needed some .bak files to do so.

Now I’m looking to use this implementation to run on an OpenAI gym environment, specifically the cartpole problem.

I modified the file to include this function:

```python
def run_cartpole(args, seed=None):
    # use the library's default seed only when no seed is passed in
    if seed is None:
        seed = HRLutils.SEED

    net = nef.Network("run_cartpole")

    stateN = 400  # neurons representing the state
    stateD = 4    # CartPole observations are 4-dimensional
    # distinct, non-zero action vectors, ordered to match CartPole's
    # discrete actions (0 = push left, 1 = push right)
    actions = [("left", [1, 0]), ("right", [0, 1])]

    agent = smdpagent.SMDPAgent(stateN, stateD, actions, stateradius=50, **args)
    net.add(agent)

    env = cartpoleenvironment.CartPoleEnvironment(stateD, actions)
    net.add(env)

    # environment -> agent
    net.connect(env.getOrigin("state"), agent.getTermination("state_input"))
    net.connect(env.getOrigin("reward"), agent.getTermination("reward"))
    net.connect(env.getOrigin("reset"), agent.getTermination("reset"))
    net.connect(env.getOrigin("learn"), agent.getTermination("learn"))
    net.connect(env.getOrigin("reset"), agent.getTermination("save_state"))
    net.connect(env.getOrigin("reset"), agent.getTermination("save_action"))

    # agent -> environment
    net.connect(agent.getOrigin("action_output"), env.getTermination("action"))
```
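The agent’s action output is a real-valued vector, while CartPole expects an integer action (0 pushes the cart left, 1 pushes it right). Below is a minimal sketch of one way to translate between the two. It assumes one-hot action vectors such as [1, 0] and [0, 1], and that the decoded output is a noisy approximation of one of them. closest_action_index is a hypothetical helper, not part of the NHRL code.

```python
def closest_action_index(output_vector, actions):
    """Return the index of the action whose vector best matches output_vector.

    Hypothetical helper: assumes the agent's "action_output" is a noisy,
    real-valued approximation of one of the action vectors in `actions`.
    """
    best_index, best_score = 0, None
    for i, (name, vector) in enumerate(actions):
        # dot-product similarity between the decoded output and this action
        score = sum(o * v for o, v in zip(output_vector, vector))
        if best_score is None or score > best_score:
            best_index, best_score = i, score
    return best_index


# one-hot vectors ordered like CartPole's actions (0 = left, 1 = right)
actions = [("left", [1, 0]), ("right", [0, 1])]
print(closest_action_index([0.1, 0.8], actions))  # prints 1
```

A zero vector (such as [0]) scores 0 against any output, so it cannot be told apart by similarity. That is why the sketch assumes every action has its own non-zero direction.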

I then created a new file, cartpoleenvironment.py, in the environment folder:

```python
import gym


class CartPoleEnvironment(et.EnvironmentTemplate):
    """A template for defining environments to interact with RL agents.

    :input action: vector representing action selected by agent
    :output state: vector representing current state
    :output reward: reward value
    """

    def __init__(self, stateD, actions, name="CartPole"):
        """Initialize environment variables.

        :param name: name for environment
        :param stateD: dimension of state representation
        :param actions: actions available to the system
        :type actions: list of tuples (action_name,action_vector)
        """
        self.environment = gym.make('CartPole-v0').env

        self.actions = actions
        self.state = self.environment.reset()
        self.action = None
        self.reward = 0.0
        self.done = False

        self.total_reward = 0.0

        # nef.SimpleNode.__init__(self, name)
        # self.getTermination("action").setDimensions(len(actions[0][1]))

    def tick(self):
        if self.action is None:
            return  # the agent has not selected an action yet

        # convert the selected action vector into CartPole's integer action
        # (assumes self.action holds that action's vector)
        action = list(self.action).index(max(self.action))

        self.state, self.reward, self.done, _ = self.environment.step(action)
        self.total_reward += self.reward

        if self.done:
            self.state = self.environment.reset()
            print("total reward: %f" % self.total_reward)
            self.total_reward = 0.0
```
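tick follows gym’s classic reset/step loop, where step returns (observation, reward, done, info) in older gym releases. The reward bookkeeping can be checked separately from Nengo with a stand-alone sketch of the same loop. StubCartPole and run_episodes are hypothetical: the stub only mimics the older 4-value step() API, so the loop runs without gym installed.

```python
class StubCartPole(object):
    """Hypothetical stand-in with the classic gym reset()/step() interface.

    Each episode lasts `episode_length` steps and pays a reward of 1.0 per
    step, so the loop below can be exercised without gym installed.
    """

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0, 0.0]  # 4-D observation, like CartPole

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return [0.0, 0.0, 0.0, 0.0], 1.0, done, {}


def run_episodes(env, choose_action, n_episodes):
    """Same logic as tick(): step, accumulate reward, reset when done."""
    totals = []
    state = env.reset()
    total_reward = 0.0
    while len(totals) < n_episodes:
        state, reward, done, _ = env.step(choose_action(state))
        total_reward += reward
        if done:
            totals.append(total_reward)
            total_reward = 0.0
            state = env.reset()
    return totals


print(run_episodes(StubCartPole(episode_length=5), lambda s: 0, 3))  # [5.0, 5.0, 5.0]
```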

To import gym, I use the following:

```python
import gym
```

This becomes a problem because I then run into code in the library that isn’t compatible with Python 2 (for example, lines that use the with keyword). I suspect this is because nengo-cl runs everything on Python 2 when I call nengo-cl ../hrlproject/misc/gridworld from cmd. Essentially, I have found it impossible to interface with other libraries from a custom environment.
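A quick way to confirm which interpreter nengo-cl uses is to run a tiny script through it. Nengo 1.4 scripts are run by Jython, a Python 2 implementation on the Java VM, and Jython reports a sys.platform starting with "java". For reference, the with statement is only enabled by default from Python 2.6; Python 2.5 needs from __future__ import with_statement. A syntax error on with therefore points to an older Python 2 interpreter. The sketch below is a minimal check of this kind, and describe_interpreter is a hypothetical name.

```python
import sys


def describe_interpreter():
    """Return a short description of the Python running this script.

    Jython reports a sys.platform starting with "java"; CPython reports the
    OS name instead (e.g. "win32", "darwin").
    """
    major, minor = sys.version_info[0], sys.version_info[1]
    if sys.platform.startswith("java"):
        implementation = "Jython"
    else:
        implementation = "CPython (or another non-JVM Python)"
    return "%s %d.%d" % (implementation, major, minor)


print(describe_interpreter())
```

Running it once through nengo-cl and once with a regular Python 3 interpreter should show whether the two environments differ.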

My questions are as follows:

  1. Is there a way I can integrate the OpenAI gym library with this implementation of NHRL? This would be helpful, as there are other custom libraries written in Python 3 that I would like to run.

  2. If 1 isn’t possible, are there other available implementations of NHRL I could use? Otherwise, what process would you recommend if I wanted to implement this algorithm from scratch using Nengo 3.1 and make it compatible with Python 3? I’m worried that reimplementing and debugging it would be an extremely time-consuming process.

  3. Ultimately, I want to interface NHRL with a library for robotic simulation and control in a continuous action/state space. I know NHRL works with discrete action spaces, so I wanted to experiment with simply discretizing the action space and comparing performance against algorithms like TD3 or DDPG. Is there another approach besides NHRL I can take? I’ve already experimented with approaches presented here.
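One way to run the discretization experiment is to lay a uniform grid over a box-shaped continuous action space. Each grid point then gets its own action vector in the (name, vector) format that the actions list uses. Below is a minimal sketch; discretize_actions and the "a0", "a1", … names are hypothetical.

```python
import itertools


def discretize_actions(low, high, bins_per_dim):
    """Build an NHRL-style action list from a box-shaped continuous action space.

    low, high: per-dimension bounds of the continuous action (lists of floats).
    bins_per_dim: number of evenly spaced values per dimension (>= 2).
    Returns (actions, commands): `actions` is a list of (name, one-hot vector)
    tuples, and `commands[i]` is the continuous command for action i.
    """
    grids = []
    for lo, hi in zip(low, high):
        step = (hi - lo) / float(bins_per_dim - 1)
        grids.append([lo + k * step for k in range(bins_per_dim)])

    # every combination of per-dimension values becomes one discrete action
    commands = [list(c) for c in itertools.product(*grids)]
    n = len(commands)
    actions = []
    for i in range(n):
        one_hot = [0] * n
        one_hot[i] = 1
        actions.append(("a%d" % i, one_hot))
    return actions, commands


actions, commands = discretize_actions([-1.0], [1.0], 3)
print(commands)  # [[-1.0], [0.0], [1.0]]
```

The number of actions grows as bins_per_dim ** (number of action dimensions), so a multi-joint robot quickly forces a coarse grid. Keep this in mind when comparing against continuous-action methods like TD3 or DDPG.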

Tl;dr: How can I use OpenAI gym environments with this implementation of NHRL?

Hi Kshivvy,

Thank you for the detailed questions. Here are my answers:

  1. The NHRL library is implemented with Nengo 1.4, which would be very difficult to integrate with OpenAI gym. Even if the integration were possible, the underlying Nengo would still be version 1.4, which has its own limitations.

  2. I don’t think we have other implementations of NHRL. As you mentioned, the path forward would be to reimplement it in the latest Nengo, which is an open research problem and, as you suggested, would require a lot of work.

I will let you know the answer for the third question shortly.

  3. I don’t think there is anything officially published or nicely packaged to try at the moment.