Playing around in OpenAI Gym in Jupyter

First, Figure out Jupyter Notebook Stuff

This tutorial helped a lot.

# The typical imports
import gym
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Imports specifically so we can render outputs in Jupyter.
from JSAnimation.IPython_display import display_animation
from matplotlib import animation
from IPython.display import display


def display_frames_as_gif(frames):
    """
    Displays a list of frames as a gif, with controls
    """
    #plt.figure(figsize=(frames[0].shape[1] / 72.0, frames[0].shape[0] / 72.0), dpi = 72)
    patch = plt.imshow(frames[0])
    plt.axis('off')

    def animate(i):
        patch.set_data(frames[i])

    anim = animation.FuncAnimation(plt.gcf(), animate, frames=len(frames), interval=50)
    display(display_animation(anim, default_mode='loop'))
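As an aside, JSAnimation's functionality was later merged into matplotlib itself (as `to_jshtml`), so a similar helper can be written without the external package. This is a sketch under that assumption: it returns the HTML string, which you would wrap in `IPython.display.HTML` to show it in a notebook.

```python
import matplotlib.pyplot as plt
from matplotlib import animation


def frames_to_jshtml(frames, interval=50):
    """Return a list of RGB frames as an HTML/JS animation string.

    In a notebook, show the result with:
        from IPython.display import HTML, display
        display(HTML(frames_to_jshtml(frames)))
    """
    fig = plt.figure()
    patch = plt.imshow(frames[0])
    plt.axis('off')

    def animate(i):
        patch.set_data(frames[i])

    anim = animation.FuncAnimation(fig, animate, frames=len(frames),
                                   interval=interval)
    html = anim.to_jshtml()  # matplotlib's built-in JS player with controls
    plt.close(fig)           # avoid a stray duplicate figure in the notebook
    return html
```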

Simple Cartpole Example in Jupyter

env = gym.make('CartPole-v0')

# Run a demo of the environment
observation = env.reset()
cum_reward = 0
frames = []
for t in range(5000):
    # Render into buffer. 
    frames.append(env.render(mode = 'rgb_array'))
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    cum_reward += reward
    if done:
        break
env.render(close=True)
display_frames_as_gif(frames)
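The demo loop above can be wrapped into a small reusable helper. This is a sketch written against the classic Gym API (`reset()` returning an observation, `step()` returning an `(observation, reward, done, info)` 4-tuple); it works with any object exposing that interface, and `run_random_episode` is my own name, not part of Gym.

```python
def run_random_episode(env, max_steps=5000, render=False):
    """Run one episode with random actions; return (frames, cum_reward)."""
    frames = []
    cum_reward = 0.0
    env.reset()
    for t in range(max_steps):
        if render:
            # Render into buffer (classic Gym rgb_array mode).
            frames.append(env.render(mode='rgb_array'))
        action = env.action_space.sample()  # random action
        observation, reward, done, info = env.step(action)
        cum_reward += reward
        if done:
            break
    return frames, cum_reward
```

With it, the demo becomes `frames, total = run_random_episode(env, render=True)` followed by `display_frames_as_gif(frames)`.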

OpenAI Gym - Documentation

Working through the Gym documentation's getting-started page. First, here is their cartpole snippet again, but with the Jupyter rendering support added in by me.

env = gym.make('CartPole-v0')
cum_reward = 0
frames = []
num_episodes = 40
for i_episode in range(num_episodes):
    observation = env.reset()
    for t in range(500):
        # Render into buffer. 
        frames.append(env.render(mode = 'rgb_array'))
        action = env.action_space.sample() # random action
        observation, reward, done, info = env.step(action)
        cum_reward += reward
        if done:
            print("\rEpisode {}/{} finished after {} timesteps".format(i_episode + 1, num_episodes, t + 1), end="")
            break
env.render(close=True)
display_frames_as_gif(frames)

Environments

Environments all descend from the Env base class. You can view a list of all environments via:

from gym import envs
print(envs.registry.all())

Important environment functions/properties:

  • step: advances the environment by one timestep, applying the given action, and returns info about what that action did to the environment. The return values:
    • observation (object)
    • reward (float)
    • done (boolean)
    • info (dict)
  • reset: resets the environment and returns an initial observation.
  • Space objects: two objects (below) that describe the valid actions and observations.
    • action_space [returns Discrete(2) for cartpole]. Example usage of Discrete:

      from gym import spaces
      space = spaces.Discrete(8)  # Set with 8 elements {0, 1, 2, ..., 7}
      x = space.sample()
      assert space.contains(x)
      assert space.n == 8
    • observation_space [returns Box(4) for cartpole]
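Box(4) describes a 4-dimensional continuous space with per-dimension low/high bounds. To illustrate the interface without depending on gym, here is a minimal stand-in mirroring what Box provides (`low`, `high`, `sample`, `contains`); `BoxSketch` and the bound values are my own illustration, not gym's actual code (cartpole's real velocity bounds are infinite).

```python
import numpy as np


class BoxSketch:
    """Illustration-only stand-in for gym's Box space (not gym's code)."""

    def __init__(self, low, high):
        self.low = np.asarray(low, dtype=np.float32)
        self.high = np.asarray(high, dtype=np.float32)
        self.shape = self.low.shape

    def sample(self):
        # Uniform draw between the per-dimension bounds.
        return np.random.uniform(self.low, self.high).astype(np.float32)

    def contains(self, x):
        x = np.asarray(x)
        return (x.shape == self.shape
                and bool(np.all(x >= self.low))
                and bool(np.all(x <= self.high)))


# Roughly cartpole-like bounds: 4 continuous dimensions
# (position, velocity, angle, angular velocity).
space = BoxSketch(low=[-4.8, -10.0, -0.42, -10.0],
                  high=[4.8, 10.0, 0.42, 10.0])
x = space.sample()
assert space.contains(x)
```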