I know this is a long post, so apologies in advance. You can find the specific questions and a tl;dr at the bottom.
I was trying to run this NHRL implementation to test the algorithm on some different environments. I have seen this post already, but it doesn’t look like he is using NHRL or anything beyond normal Q estimation.
It was extremely difficult, but I was able to successfully run the given gridworld example. I had to manually add the library for Nengo 1.4, modify a file that was calculating pseudoinverses, reinstall Java, etc. However, I couldn't visualize anything, since it looks like I need some .bak files to do so.
Now I’m looking to use this implementation to run on an OpenAI gym environment, specifically the cartpole problem.
I modified the run.py file to include this function:
```python
def run_cartpole(args, seed=None):
    if seed is not None:
        HRLutils.set_seed(seed)
    seed = HRLutils.SEED

    net = nef.Network("run_cartpole")

    # 400 neurons representing the 4-dimensional cartpole state
    stateN = 400
    stateD = 4
    actions = [("left", [0]), ("right", [1])]

    agent = smdpagent.SMDPAgent(stateN, stateD, actions, stateradius=50, **args)
    net.add(agent)

    env = cartpoleenvironment.CartPoleEnvironment(stateD, actions)
    net.add(env)

    # wire the environment's outputs into the agent, and the agent's
    # selected action back into the environment
    net.connect(env.getOrigin("state"), agent.getTermination("state_input"))
    net.connect(env.getOrigin("reward"), agent.getTermination("reward"))
    net.connect(env.getOrigin("reset"), agent.getTermination("reset"))
    net.connect(env.getOrigin("learn"), agent.getTermination("learn"))
    net.connect(env.getOrigin("reset"), agent.getTermination("save_state"))
    net.connect(env.getOrigin("reset"), agent.getTermination("save_action"))
    net.connect(agent.getOrigin("action_output"), env.getTermination("action"))

    net.network.simulator.run(0, 100, 0.001)
```
I then modified the environment folder and created a cartpoleenvironment.py file.
```python
class CartPoleEnvironment(et.EnvironmentTemplate):
    """A template for defining environments to interact with RL agents.

    :input action: vector representing action selected by agent
    :output state: vector representing current state
    :output reward: reward value
    """

    def __init__(self, stateD, actions, name="CartPole"):
        """Initialize environment variables.

        :param name: name for environment
        :param stateD: dimension of state representation
        :param actions: actions available to the system
        :type actions: list of tuples (action_name, action_vector)
        """
        self.environment = gym.make('CartPole-v0').env
        self.actions = actions
        self.state = self.environment.reset()
        self.action = None
        self.reward = 0.0
        self.total_reward = 0.0
        # nef.SimpleNode.__init__(self, name)
        # self.getTermination("action").setDimensions(len(actions[0][1]))

    def tick(self):
        if self.action is None:
            # no action has arrived from the agent yet
            return
        # gym expects a discrete action index; assuming self.action holds the
        # selected action vector ([0] or [1]), its first element is that index
        self.state, self.reward, self.done, _ = self.environment.step(int(self.action[0]))
        self.total_reward += self.reward
        if self.done:
            self.state = self.environment.reset()
            print("total reward: ", self.total_reward)
            self.total_reward = 0.0
```
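One detail I'm unsure about: the agent's `action_output` vector may not come back as exactly `[0]` or `[1]`, so indexing into it directly could misfire. A more robust decode (my own sketch, not from the NHRL codebase) would pick the action whose vector is nearest to the agent's output, then pass that index to gym:

```python
def decode_action(action_vector, actions):
    """Return the index of the discrete action whose vector is closest
    (squared L2 distance) to the agent's output vector.

    `actions` is the NHRL-style list of (name, vector) tuples; for
    CartPole's [0]/[1] vectors, the index doubles as the gym action.
    """
    def dist(vec):
        return sum((a - b) ** 2 for a, b in zip(action_vector, vec))
    return min(range(len(actions)), key=lambda i: dist(actions[i][1]))
```

In `tick()` this would replace the direct `int(self.action[0])` conversion with `self.environment.step(decode_action(self.action, self.actions))`.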
To import gym, I use the following:
```python
import sys  # needed before modifying the module search path
sys.path.append('C:/Users/keshav/AppData/Local/Programs/Python/Python37/Lib/site-packages/')
import gym
```
This becomes a problem because I then hit code in the gym library that isn't compatible with Python 2 (for example, lines that use the `with` keyword). I suspect this is because when I run `nengo-cl ../hrlproject/misc/run.py gridworld` in cmd, nengo-cl executes everything under Python 2 (Jython, I believe)? Essentially, I found it impossible to interface other Python 3 libraries from a custom environment.
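One workaround I've been considering (my own sketch, not anything from the NHRL codebase): keep gym in a separate CPython 3 process and have the Jython-side environment talk to it over a socket with a small JSON protocol, so the two Python versions never share an interpreter. The function names and the message format below (`{"cmd": "reset"}` / `{"cmd": "step", "action": 0}`) are invented for illustration; `env` can be any object with gym-style `reset()`/`step()` methods:

```python
import json
import socket

def handle_message(env, line):
    """Decode one JSON request, apply it to the environment, and encode the reply.

    The protocol is hypothetical: {"cmd": "reset"} or {"cmd": "step", "action": 0}.
    """
    msg = json.loads(line)
    if msg["cmd"] == "reset":
        state = env.reset()
        return json.dumps({"state": list(state)})
    elif msg["cmd"] == "step":
        state, reward, done, _ = env.step(msg["action"])
        return json.dumps({"state": list(state), "reward": reward, "done": done})
    raise ValueError("unknown command: %s" % msg["cmd"])

def serve(env, port=9999):
    """Accept one connection and answer newline-delimited JSON requests.

    On the Jython side, the environment's tick() would open a plain socket,
    send one request per simulation step, and read back the reply line.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", port))
    server.listen(1)
    conn, _ = server.accept()
    with conn, conn.makefile("rw") as f:
        for line in f:
            f.write(handle_message(env, line) + "\n")
            f.flush()
```

The CPython process would run `serve(gym.make('CartPole-v0').env)`, and the Jython-side `CartPoleEnvironment` would replace its direct gym calls with socket requests. I don't know if this is the intended way to extend this codebase, but it sidesteps the version mismatch entirely.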
My questions are as follows:

1. Is there a way I can integrate the OpenAI gym library with this implementation of NHRL? This would be helpful because there are other custom libraries I would like to use that are written for Python 3.
2. If 1 isn't possible, are there other publicly available implementations of NHRL I could use? Otherwise, what process would you recommend if I wanted to implement the algorithm from scratch in Nengo 3.1, compatible with Python 3? I'm worried that reimplementing and debugging it would be an extremely time-consuming process.
3. Ultimately, I want to interface NHRL with a library for robotic simulation and control in a continuous action/state space. Since NHRL is discrete with respect to the action space, I wanted to experiment with discretizing the action space and comparing performance against algorithms like TD3 or DDPG. Is there another approach besides NHRL I could take? I've already experimented with the approaches presented here.
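To make the discretization idea in question 3 concrete (my own sketch; the helper name and bin count are made up), the plan would be to map each discrete NHRL action to a point on an even grid over the continuous range, so the agent picks an index while the simulator receives the continuous value:

```python
def make_discrete_actions(low, high, n_bins):
    """Build an NHRL-style (name, vector) action list for a 1-D continuous
    action range by sampling n_bins evenly spaced values in [low, high]."""
    step = (high - low) / float(n_bins - 1)
    values = [low + i * step for i in range(n_bins)]
    return [("a%d" % i, [v]) for i, v in enumerate(values)]

# e.g. a 5-way discretization of a torque in [-2, 2]:
# [("a0", [-2.0]), ("a1", [-1.0]), ("a2", [0.0]), ("a3", [1.0]), ("a4", [2.0])]
actions = make_discrete_actions(-2.0, 2.0, 5)
```

The obvious trade-off is resolution versus action-set size, which is exactly what I'd want to compare against TD3/DDPG on the same task.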
Tl;dr: How can I use OpenAI gym environments with this implementation of NHRL?