One year ago I was drawn into the world of competitive Pokemon. It’s an incredible deep game that requires quite an effort to master. First you have to memorize a huge amount of information: Which type of attack is most effective against which type of Pokemon? What are the base stats of the different Pokemon and how do they get affected by training and breeding? What attacks are mostly used on Charizard? Fortunately there are excellent ressources available for helping out with this part of the game.

The more interesting aspect however is figuring out how to play optimally, knowing all of these intricate details already. There is also advice for this out there, but often it’s mainly things like: “Just practice a lot!” or “Try to learn from your mistakes!”

In this series of blog posts I will link competitive Pokemon play to the well studied fields of game theory and reinforcement learning. That’s quite a theoretically way of looking at it, so I will also try to include some practical battling advice.

## How to play optimally?

We will see that it is quite easy to describe how to play optimal, but very difficult to actually do it, if you have no access to nearly infinite computing power. Since the problem of solving Pokemon battling has optimal substructure, you actually just need this little formula, called Bellman equation:

In regard to Pokemon battles the variables and functions in this equation can be described in the following way:

**x** is the current **state** of the battle. What Pokemon are currently on the field? What Pokemon have already fainted? Is Stealth Rock set?

**V is the value function. V(x)** describes how valuable the current state is. So in Pokemon V(x) would be how likely you will win the current battle given that it’s in the state x and both players will play optimally the rest of the game

**a** means **action**. In Pokemon actions are choosing the lead Pokemon, choosing an attack or switching out the current Pokemon. There is also quite a lot of randomness involved in Pokemon battles. Pokemon can for example miss attacks, be paralyzed and land critical hits. This involuntary actions also have to be included in the action space.

**Γ(x)** is the set of all the possible actions you can take right now. So this set may be smaller if you are trapped by Shadow Tag or locked into Outrage.

**F is called the reward function. F(x,a)** is the reward you get for taking action a in the state x. A good way for defining the reward function F in Pokemon is to give a reward of 1, if you win the game at the end of the turn and a reward of 0 otherwise.

**T(x,a)** is the state you get into, if you choose action a in state x.

**β** is a discount factor and represents the patience of the player. A low β means that the player prefers fast wins to safer wins that will take longer.

So with this formula you can now calculate how good a certain situation in a battle is. It is then very easy to play optimally: Just choose the action where the resulting state is as valuable as possible.

## How does that help me choose the right plays?

Using Bellmann equations to evaluate game state is a recursive operation which involves simulating all possible ways the game could go. Since Pokemon battles can take many turns and the state space expands exponentially, this is computationally completely infeasible.

However, imagining different ways the game could go will help you in many situations, especially at the end of the battle, when there are not many options left.

A more practical way to go about being the very best Pokemon trainer is called **Value Function Approximation** and will be discussed in the next blog post.

Pingback: The math behind competitive Pokemon, Part 2: Value Function Approximation | Niklas Riewald