One year ago, I was drawn into the world of competitive Pokemon. It’s an incredibly deep game that requires quite an effort to master. First, you have to memorise a vast amount of information: Which type of attack is most effective against which type of Pokemon? What are the base stats of the different Pokemon, and how do they get affected by training and breeding? What attacks are mostly used on Charizard? Fortunately, there are excellent resources available for helping out with this part of the game.

The more exciting aspect, however, is figuring out how to play optimally, knowing all of these intricate details already. There is also advice for this out there, but often it’s mainly things like: “Just practice a lot!” or “Try to learn from your mistakes!”

In this series of blog posts, I will link competitive Pokemon play to the well-studied fields of game theory and reinforcement learning. That’s quite a theoretically way of looking at it, so I will also try to include some practical battling advice.

## How to play optimally?

Pokemon is a game of hidden information, involves lots of randomness and does not have a Nash Equilibrium in pure strategies. I will talk about all of this in later blog posts. For now, let us just assume that you have perfect information about your opponents’ team, there is no randomness involved, and both players use a pure strategy.

Under these assumptions, it is quite easy to describe how to play optimal, but very difficult to actually do it, if you have no access to nearly infinite computing power. Since the problem of solving this simplified version of Pokemon battling has optimal substructure, you actually just need this little formula, called Bellman equation:

Concerning Pokemon battles, the variables and functions in this equation can be described in the following way:

**x** is the current **state** of the battle. What Pokemon are currently on the field? What Pokemon have already fainted? Is Stealth Rock set?

**V is the value function. V(x)** describes how valuable the current state is. So in Pokemon V(x) would be how likely you will win the current battle given that it’s in the state x and both players will play optimally the rest of the game

**a** means **action**. In Pokemon actions are choosing the lead Pokemon, choosing an attack or switching out the current Pokemon.

**Γ(x)** is the set of all the possible actions you can take right now. So this set may be smaller if you are trapped by Shadow Tag or locked into Outrage.

**F is called the reward function. F(x,a)** is the reward you get for taking action a in the state x. A good way for defining the reward function F in Pokemon is to give a reward of 1 if you win the game at the end of the turn and a reward of 0 otherwise.

**T(x, a)** is the state you get into if you choose action a in state x.

**β** is a discount factor and represents the patience of the player. A low β means that the player prefers quick wins to safer wins that will take longer.

So with this formula, you can now calculate how good a particular situation in a battle is. It is then straightforward to play optimally: Just choose the action where the resulting state is as valuable as possible.

## How does that help me choose the right plays?

Using Bellmann equations to evaluate game state is a recursive operation which involves simulating all possible ways the game could go. Since Pokemon battles can take many turns and the state space expands exponentially, this is computationally completely infeasible.

However, imagining different ways the game could go will help you in many situations, especially at the end of the battle, when there are not many options left.

A more practical way to go about being the very best Pokemon trainer is called **Value Function Approximation** and will be discussed in the next blog post.

Pingback: The math behind competitive Pokemon, Part 2: Value Function Approximation | Niklas Riewald