Machine learning is a fast-moving field that is poised to disrupt human life as we know it. Gaining a deeper understanding of it can help you see how it works and innovate with it.
In this tutorial we will build a simple AI-based agent that learns how to play a simple game using Python (and a couple of libraries).
Theory
We will be using a neural network for this task.
Structure
A neural network is a computer science construct that imitates the behaviour of biological neurons. The simplest type of neural network is called a perceptron.
A perceptron consists of a number of inputs (denoted by X1, X2, ..., Xn), each with an associated weight (denoted by W1, W2, ..., Wn). The perceptron takes the weighted sum of its inputs and passes it through an activation function, which produces the output.
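To make this concrete, here is a minimal sketch of a single perceptron in Python. The step activation and the example values are illustrative and not part of the tutorial's game code:

import numpy as np

def perceptron(x, w, b):
    # weighted sum of the inputs: W1*X1 + W2*X2 + ... + Wn*Xn + bias
    z = np.dot(w, x) + b
    # step activation function: fire (1) if the sum crosses zero
    return 1 if z > 0 else 0

# example with two inputs and hand-picked weights
x = np.array([0.5, -1.0])
w = np.array([0.8, 0.2])
print(perceptron(x, w, b=0.1))  # prints 1, since 0.8*0.5 + 0.2*(-1.0) + 0.1 = 0.3 > 0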
Training
Training a neural network generally involves feeding it sample inputs and their expected outputs. Using these, the neural network changes the weight associated with each input in a way that minimizes the error between its predictions and the expected outputs.
As a mental model, I like to think of training a neural network as a very advanced version of curve fitting.
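That analogy can be made literal with scikit-learn. The short, self-contained sketch below (separate from the game code) fits a small network to a noisy sine curve, the same way we will later fit one to game states and actions:

import numpy as np
from sklearn.neural_network import MLPRegressor

# a noisy sine curve as the "data" to fit
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(scale=0.1, size=200)

# the network adjusts its weights to minimize the error on (X, y)
reg = MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000)
reg.fit(X, y)
print(reg.predict([[np.pi / 2]]))  # should be close to sin(pi/2) = 1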
This tutorial doesn't assume or require any more knowledge about neural networks, but in case you want more information, check out this.
OpenAI Gym and scikit-learn
We shall be using OpenAI Gym to get the gaming environment. It provides a simplified way to control games using observation vectors and action vectors.
We shall be using its CartPole environment.
The agent needs to balance a pole on top of a cart. The game ends in case the cart moves more than 2.4 units from the center or the pole tilts more than 15 degrees from vertical.
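To get a feel for the API, here is a minimal sketch of the basic Gym interaction loop, assuming the classic Gym interface where step() returns (observation, reward, done, info):

import gym

env = gym.make('CartPole-v0')
observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()                  # a random action: 0 (push left) or 1 (push right)
    observation, reward, done, info = env.step(action)  # advance the game by one step
env.close()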
Getting Started
You will need to install the following packages:
pip install -U numpy scipy pandas scikit-learn
To install OpenAI Gym you need to clone the repository and install the package manually.
First, install all the dependencies:
sudo apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig
Now clone and install OpenAI Gym:
git clone https://github.com/openai/gym
cd gym
pip install -e '.[all]'
Use .[all] to install all the environments available in the package.
We are going to follow the steps mentioned below in the given order:
- Creating the initial training data: Since we do not have any pre-existing data, we will start by generating data points from random moves.
- Training the neural network: This involves training the neural network on the data generated from the random games.
- Playing the game using the trained neural network: We will play the game again, this time using the predictions from the neural network as the actions.
The function given below ties all of these steps together.
import random

import gym
import numpy as np
from sklearn.neural_network import MLPClassifier

def run_simulation():
    """
    Function for running a complete simulation of the
    cart pole problem
    """
    # setup the environment
    env = gym.make('CartPole-v0')
    env._max_episode_steps = max_steps * 100
    env.reset()
    # initial dataset to train the neural net on
    initial_data = get_random_moves(env)
    input_list, output_list = get_inputs(initial_data)
    # get the trained instance of the neural network back
    curr_nn = train(input_list, output_list)
    # play the game using the trained curr_nn
    play_game(env, curr_nn)
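The functions in this tutorial also rely on a handful of module-level constants, a get_avg helper, and an entry point, all defined once at the top of the full script. The sketch below shows plausible definitions: initial_game_num matches the 50K random games mentioned in the next section and max_steps matches the 10000-step loop in the code, while the other values are assumptions for you to tune:

# module-level constants used throughout; the threshold, verbosity and
# game-count values here are illustrative assumptions
max_steps = 10000              # maximum steps per game
initial_game_num = 50000       # number of random games used to build training data
initial_score_threshold = 60   # minimum score for a random game to be kept (assumed value)
make_verbose = False           # whether MLPClassifier prints training progress
trained_games = 10             # number of games the trained agent plays (assumed value)

def get_avg(scores):
    # average score across a list of games
    return sum(scores) / len(scores)

if __name__ == '__main__':
    run_simulation()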
Creating training data
Since we need to train the neural network to play the game but don't really have any training data, we will start by making random moves and keeping the games where the agent reaches a threshold score. These games form our initial training data. We create the sample data by playing 50K games randomly. The code is given below:
def get_random_moves(game_env):
    """
    Function for getting random moves that reach a
    threshold score
    """
    initial_data = []
    score_list = []
    for _ in range(initial_game_num):
        # reset the game env after every game is done
        game_env.reset()
        # initialize the current score to 0
        score = 0
        # complete play history of a game
        prev_observation = []
        observation_list = []
        action_list = []
        # while we iterate up to 10000 times, in reality the
        # game will finish much quicker because the agent will lose
        for i in range(max_steps):
            # supply a random action (0 or 1) to the game
            action = random.randrange(0, 2)
            # the current state of the game
            observation, reward, done, info = game_env.step(action)
            # record the previous state of the game
            # together with the action taken from it
            if len(prev_observation) > 0:
                observation_list.append(prev_observation)
                action_list.append(action)
            # action made, current state becomes the previous state now
            prev_observation = observation
            # maintain the record of the score
            score = score + reward
            # in case we lose
            if done:
                break
        if score > initial_score_threshold:
            initial_data.append([observation_list, action_list])
            score_list.append(score)
    # the average score of our random agent
    print("Average Score during random play: {}".format(get_avg(score_list)))
    # return the initial training data
    return initial_data
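The run_simulation function also calls a get_inputs helper that isn't shown in this section: it needs to flatten the recorded games into one list of observations and a parallel list of actions that MLPClassifier can consume. A minimal sketch of such a helper, written against the data layout produced by get_random_moves, might look like this:

def get_inputs(initial_data):
    """
    Flatten the recorded games into parallel lists of
    observations (inputs) and actions (outputs).
    """
    input_list = []
    output_list = []
    for observation_list, action_list in initial_data:
        for observation, action in zip(observation_list, action_list):
            input_list.append(observation)  # game state before the move
            output_list.append(action)      # the move that was made (0 or 1)
    return input_list, output_list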
Training the Neural Network
Training the neural network simply means fitting it to the data generated by the function above. We use scikit-learn's MLPClassifier with the adam solver, relu activations, and a single hidden layer of 500 neurons; the code is quite straightforward:
def train(input_list, output_list):
    """
    Function for initializing and training a neural network
    """
    clf = MLPClassifier(solver='adam',
                        activation='relu',
                        verbose=make_verbose,
                        learning_rate='adaptive',
                        hidden_layer_sizes=(500,))
    clf.fit(input_list, output_list)
    return clf
Playing the game using the Neural Network
Using the trained neural network, we can now predict what the next move should be from the current state of the game environment. Code for playing the game:
def play_game(game_env, curr_nn):
    """
    Function to play the cartpole game using the trained nn
    """
    final_score = []
    for k in range(trained_games):
        game_env.reset()
        score = 0
        prev_observation = []
        for i in range(max_steps):
            game_env.render()
            # if this is the first step of the game,
            # make the first move randomly, since we
            # don't have a previous state yet
            if len(prev_observation) == 0:
                action = random.randrange(0, 2)
            else:
                # for all subsequent moves, since there
                # are observations for the environment, predict
                # using the neural network. predict() expects a
                # 2D array, so the observation is wrapped in a list.
                currs = np.array(prev_observation)
                action = curr_nn.predict([currs])[0]
            observation, reward, done, info = game_env.step(action)
            prev_observation = observation
            score = score + reward
            # end the game in case the agent loses
            if done:
                break
        print("Game Number {} Score: {}".format(k, score))
        final_score.append(score)
    print("Average Score for Trained Agent: {}".format(get_avg(final_score)))
Note: All the previous games were played without rendering (which makes them very fast), whereas these games are rendered with a UI and hence end up being slow. If you want to test your trained NN on a large number of games (without waiting for a very long time), consider commenting out the game_env.render() line in the code above.
After this, a window should open up showing the agent playing the game.
Further
While this is a clear improvement over the random agent, it is by no means the best possible player. You can start by experimenting with the activation functions and the number of hidden layers in the neural network.
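For example (the values here are illustrative, not tested as part of this tutorial), swapping in a tanh activation and adding a second hidden layer only requires changing the classifier's constructor:

# an illustrative variation to experiment with:
# tanh activation and two hidden layers instead of one
clf = MLPClassifier(solver='adam',
                    activation='tanh',
                    learning_rate='adaptive',
                    hidden_layer_sizes=(500, 100))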
You can also find the complete code for all of this here.