Playing games with Machine Learning and Python

Machine learning is a fast-moving field that is poised to change many aspects of everyday life as we know it. Gaining a deeper understanding of it helps you see how it works and lets you innovate with it.

In this tutorial we will build a simple AI agent that learns how to play a simple game, using Python (and a couple of libraries).


We will be using a neural network for this task.


A neural network is a computational construct that imitates the behaviour of biological neurons. The simplest type of neural network is called a perceptron.


A perceptron consists of a number of inputs (denoted X1, X2, …, Xn), each with an associated weight (W1, W2, …, Wn). The perceptron computes the weighted sum of its inputs and passes it through an activation function to produce its output.
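A minimal perceptron forward pass can be sketched in a few lines of plain Python (the step activation, weights, and bias here are illustrative):

```python
def perceptron(inputs, weights, bias=0.0):
    """Forward pass of a single perceptron with a step activation."""
    # weighted sum of the inputs
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # step activation: fire (1) if the weighted sum crosses 0
    return 1 if total > 0 else 0

# example: with these weights and bias the perceptron behaves
# like a logical AND gate
def and_gate(a, b):
    return perceptron([a, b], [1.0, 1.0], bias=-1.5)
```

Training a perceptron means finding weights and a bias that produce the right outputs for the training inputs.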


Training a neural network generally involves feeding it sample inputs and sample outputs. Using these, the network adjusts the weight associated with each input in a way that minimizes error.

As a mental model I like to think of training a neural network as a very advanced version of curve fitting.
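To make the curve-fitting analogy concrete, here is ordinary least-squares fitting with NumPy; training a neural network does something conceptually similar, just with far more parameters (the data here is made up):

```python
import numpy as np

# noise-free samples from the line y = 2x + 1
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2 * xs + 1

# fit a degree-1 polynomial: recovers the slope and intercept
slope, intercept = np.polyfit(xs, ys, 1)
```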

This tutorial doesn’t assume or require any more knowledge about neural networks; however, in case you want more information, check out this.

Open AI Gym and Scikit-learn

We shall be using OpenAI Gym to get the gaming environment. It provides a simplified way to control games using observation vectors and action vectors.

We shall be using its CartPole environment.


The agent needs to balance a pole on top of a cart. The game ends if the cart moves more than 2.4 units from the centre or the pole tilts more than 15 degrees from vertical.

Getting Started

You will need to install the following packages:

pip install -U numpy scipy pandas scikit-learn

To install OpenAI Gym you need to clone and install the package manually.

First, install all the dependencies:

sudo apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig

Now clone and install OpenAI Gym:

git clone https://github.com/openai/gym.git
cd gym 
pip install -e '.[all]'

The .[all] extra installs all the environments available in the package.

We are going to follow the steps below, in order:

  1. Create the initial training data: since we do not have any pre-existing data, we will start by generating data points from random play.
  2. Train the neural network: train the network on the data collected from the random games.
  3. Play the game using the trained neural network: play the game again, this time using the network's predictions as the actions.
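The snippets below assume the following imports, module-level constants, and a small helper. The names come from the code in this tutorial, but the specific values are illustrative guesses:

```python
import random

import gym
import numpy as np
from sklearn.neural_network import MLPClassifier

max_steps = 10000              # maximum steps per game
initial_game_num = 50000       # number of random games to sample
initial_score_threshold = 60   # keep games scoring above this (a guess)
trained_games = 10             # games to play with the trained network
make_verbose = False           # verbosity flag for MLPClassifier

def get_avg(scores):
    """Average of a list of scores."""
    return sum(scores) / len(scores)
```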

The function below ties all these steps together:

def run_simulation():
    """Run a complete simulation of the cart pole problem."""
    # set up the environment
    env = gym.make('CartPole-v0')
    env._max_episode_steps = max_steps * 100

    # initial dataset to train the neural net on
    initial_data = get_random_moves(env)

    input_list, output_list = get_inputs(initial_data)

    # get the trained instance of the neural network back
    curr_nn = train(input_list, output_list)

    # play the game using the trained curr_nn
    play_game(env, curr_nn)

Creating training data

Since we need to train the neural network to play the game but don't have any training data, we will start by making random moves and keeping only the games where the agent reaches a threshold score. This forms our initial training data. We create it by playing 50K games randomly. The code is given below:

def get_random_moves(game_env):
    """Collect observation/action pairs from random games that beat a score threshold."""
    initial_data = []
    score_list = []

    for _ in range(initial_game_num):

        # reset the game env before every game
        game_env.reset()

        # initialize the current score to 0
        score = 0

        # complete play history of a game
        prev_observation = []
        observation_list = []
        action_list = []

        # while we iterate up to max_steps times, in reality the
        # game will finish much quicker because the agent will lose.
        for i in range(max_steps):

            # supply a random action (0 or 1) to the game
            action = random.randrange(0, 2)

            # the current state of the game
            observation, reward, done, info = game_env.step(action)

            # record the previous state together with the
            # action we just took in it
            if len(prev_observation) > 0:
                observation_list.append(prev_observation)
                action_list.append(action)

            # action made, current state becomes previous state now
            prev_observation = observation

            # maintain the record of the score
            score = score + reward

            # in case we lose
            if done:
                break

        score_list.append(score)
        if score > initial_score_threshold:
            initial_data.append([observation_list, action_list])

    # the average score by our random agent
    print("Average Score during random play: {}".format(get_avg(score_list)))

    # return the initial training data
    return initial_data
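run_simulation also calls a helper, get_inputs, which isn't shown in the original. A plausible sketch, which flattens the recorded games into parallel input/output lists, is:

```python
def get_inputs(initial_data):
    """Flatten [observation_list, action_list] pairs into parallel
    training inputs (observations) and outputs (actions)."""
    input_list = []
    output_list = []
    for observation_list, action_list in initial_data:
        input_list.extend(observation_list)
        output_list.extend(action_list)
    return input_list, output_list
```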


Training the Neural Network

Training the neural network simply means fitting it to the data returned by the function above. With scikit-learn this is quite straightforward:

def train(input_list, output_list):
    """Initialize and train a neural network classifier."""
    clf = MLPClassifier(solver='adam', activation='relu', verbose=make_verbose,
                        learning_rate='adaptive', hidden_layer_sizes=(500,))
    clf.fit(input_list, output_list)
    return clf
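If you haven't used MLPClassifier before, here is a quick sanity check on a made-up, linearly separable toy problem (the data and hyperparameters are illustrative, not the ones used for CartPole):

```python
from sklearn.neural_network import MLPClassifier

# toy data: the label is 1 when the single feature exceeds 2
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
clf.fit(X, y)

# predict() returns one class label per input row
preds = clf.predict([[0.5], [4.5]])
```

This is exactly how the trained classifier is used during play: feed it the current observation, get back the next action.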

Playing the game using the Neural Network

Using the trained neural network we can predict what the next move should be from the current state of the game environment. Code for playing the game:

def play_game(game_env, curr_nn):
    """Play the cartpole game using the trained neural network."""
    final_score = []
    for K in range(trained_games):
        game_env.reset()
        score = 0
        prev_observation = []

        for i in range(max_steps):

            # render the game window (comment this out for speed)
            game_env.render()

            # if this is the first step of the game, make the
            # first move randomly, since we don't have a
            # previous state.
            if len(prev_observation) == 0:
                action = random.randrange(0, 2)
            # for all subsequent moves, since there are
            # observations for the environment, predict
            # using the neural network.
            else:
                currs = np.array(prev_observation)
                action = curr_nn.predict([currs])[0]

            observation, reward, done, info = game_env.step(action)
            prev_observation = observation
            score = score + reward

            # end the game in case the agent loses.
            if done:
                break

        final_score.append(score)
        print("Game Number {} Score: {}".format(K, score))

    print("Average Score for Trained Agent: {}".format(get_avg(final_score)))

Note: all the previous games were played without rendering (which makes them very fast), while these games are rendered with a UI and hence end up being slow. If you want to test your trained NN on a large number of games without waiting a very long time, consider commenting out the line game_env.render() in the code above.

After this a window should open up with the agent playing the game. The final result should look something like this:


While this is a clear improvement over the random agent, there is still plenty of room to improve the player. You can start by experimenting with the activation function and the number and size of the hidden layers in the neural network.

Also you can find the code to do all this here.
