Playing around in OpenAI Gym in Jupyter

21 Dec 2016

First, Figure out Jupyter Notebook Stuff
This tutorial helped a lot.
```python
# The typical imports
import gym
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Imports specifically so we can render outputs in Jupyter.
from JSAnimation.IPython_display import display_animation
from matplotlib import animation
from IPython.display import display

def display_frames_as_gif(frames):
    """
    Displays a list of frames as a gif, with controls.
    """
    #plt.figure(figsize=(frames[0].shape[1] / 72.0, frames[0].shape[0] / 72.0), dpi=72)
    patch = plt.imshow(frames[0])
    plt.axis('off')

    def animate(i):
        patch.set_data(frames[i])

    anim = animation.FuncAnimation(plt.gcf(), animate, frames=len(frames), interval=50)
    display(display_animation(anim, default_mode='loop'))
```
Simple Cartpole Example in Jupyter
```python
env = gym.make('CartPole-v0')

# Run a demo of the environment
observation = env.reset()
cum_reward = 0
frames = []
for t in range(5000):
    # Render into buffer.
    frames.append(env.render(mode='rgb_array'))
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    cum_reward += reward
    if done:
        break
env.render(close=True)
display_frames_as_gif(frames)
```
OpenAI Gym - Documentation
Working through the documentation's entire getting-started page. First, their cartpole snippet again, but with the Jupyter rendering support added in by me.
```python
env = gym.make('CartPole-v0')
cum_reward = 0
frames = []
num_episodes = 40
for i_episode in range(num_episodes):
    observation = env.reset()
    for t in range(500):
        # Render into buffer.
        frames.append(env.render(mode='rgb_array'))
        action = env.action_space.sample()  # random action
        observation, reward, done, info = env.step(action)
        cum_reward += reward
        if done:
            print("\rEpisode {}/{} finished after {} timesteps".format(
                i_episode, num_episodes, t + 1), end="")
            break
env.render(close=True)
display_frames_as_gif(frames)
```
Environments
Environments all descend from the Env base class. You can view a list of all environments via:
```python
from gym import envs
print(envs.registry.all())
```
Important environment functions/properties:
- step: Returns info regarding what our actions are doing to the environment at each step. The return values:
  - observation (object)
  - reward (float)
  - done (boolean)
  - info (dict)
- reset: returns an initial observation.
- Space objects: two objects (below) that describe the valid actions and observations.
  - action_space [returns Discrete(2) for cartpole]. Example usage of Discrete:

    ```python
    from gym import spaces
    space = spaces.Discrete(8)  # Set with 8 elements {0, 1, 2, ..., 7}
    x = space.sample()
    assert space.contains(x)
    assert space.n == 8
    ```

  - observation_space [returns Box(4) for cartpole]
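These pieces fit together in a fixed loop: reset gives the first observation, then step is called with an action until done comes back True. A minimal sketch with a hand-rolled stand-in environment (CoinFlipEnv is hypothetical, not part of gym; it only mimics the reset/step return contract):

```python
import random

class CoinFlipEnv:
    """Hypothetical stand-in for a gym Env: guess the coin to earn reward.
    Not part of gym -- it only mimics the reset/step return contract."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0
        self.coin = 0

    def reset(self):
        # Like Env.reset(): returns an initial observation.
        self.t = 0
        self.coin = random.randint(0, 1)
        return self.coin

    def step(self, action):
        # Like Env.step(): returns (observation, reward, done, info).
        reward = 1.0 if action == self.coin else 0.0
        self.t += 1
        self.coin = random.randint(0, 1)
        done = self.t >= self.max_steps
        return self.coin, reward, done, {}

env = CoinFlipEnv(max_steps=5)
observation = env.reset()
total_reward = 0.0
done = False
while not done:
    action = random.randint(0, 1)  # stand-in for env.action_space.sample()
    observation, reward, done, info = env.step(action)
    total_reward += reward
```

The same while-not-done pattern is what the cartpole loops above do, just with gym's real environment in place of the toy one.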