The math behind competitive Pokemon, Part 2: Value Function Approximation

The last post was about playing Pokemon optimally by calculating the value function V(x) using the Bellman equations. Unfortunately, this turned out to be impractical, so we need another way to estimate the likelihood of winning a game.

One solution to this problem is called Value Function Approximation. The idea is to not calculate the value function exactly, but to approximate it using supervised machine learning algorithms (e.g. neural networks). In some ways, this resembles how most people play Pokemon: they don't think through all the possible ways the game could go, but imagine a few possibilities and consider how good the resulting situation would be for each of them.
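To make this concrete, here is a minimal sketch of the idea, assuming the simplest possible setup: instead of solving the Bellman equations, we fit a parametric model V̂(x; w) = sigmoid(w·x) to (state, outcome) pairs from finished games, treating "did we win?" as the supervised label. All function names and the toy data below are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_value_function(states, outcomes, lr=0.5, epochs=2000):
    """Fit V_hat(x; w) = sigmoid(w . x) by logistic regression.

    states: (n, d) array of state features; outcomes: (n,) array in {0, 1},
    where 1 means the game was eventually won from that state.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=states.shape[1])
    for _ in range(epochs):
        pred = sigmoid(states @ w)
        # gradient of the cross-entropy loss w.r.t. w
        grad = states.T @ (pred - outcomes) / len(outcomes)
        w -= lr * grad
    return w

# Toy training data: each state is [own HP fraction, opponent HP fraction].
# Games where we had more health remaining tended to be wins.
states = np.array([[0.9, 0.2], [0.8, 0.4], [0.3, 0.9],
                   [0.1, 0.7], [0.7, 0.3], [0.2, 0.8]])
outcomes = np.array([1, 1, 0, 0, 1, 0])

w = fit_value_function(states, outcomes)
v_good = sigmoid(np.array([0.9, 0.1]) @ w)  # we are healthy, opponent is not
v_bad = sigmoid(np.array([0.1, 0.9]) @ w)   # the reverse situation
print(v_good > v_bad)
```

A real system would of course use a richer state encoding and a neural network instead of a linear model, but the supervised-learning loop (states in, win probability out) is the same.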

Good proxies for estimating the chance to win might be, for example, the total health of your Pokemon compared to your opponent's. Or whether you have a boosted Pokemon on the field that can sweep. Or, more generally, that you were in a similar situation before and won the game that time.
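The proxies above can be turned into a feature vector that a value-function approximator consumes. Here is a hedged sketch; the battle-state layout and all field names (`my_team`, `hp`, `boosts`, `active`) are invented for illustration, not any real simulator's API.

```python
def extract_features(state):
    """Map a battle state to the hand-picked proxies mentioned above."""
    my_hp = sum(p["hp"] for p in state["my_team"])
    opp_hp = sum(p["hp"] for p in state["opp_team"])
    # total-health comparison, normalised to [0, 1]
    hp_advantage = my_hp / (my_hp + opp_hp)
    # 1.0 if a boosted Pokemon of ours is currently on the field
    has_sweeper = float(any(p["boosts"] > 0 and p["active"]
                            for p in state["my_team"]))
    return [hp_advantage, has_sweeper]

state = {
    "my_team": [{"hp": 180, "boosts": 2, "active": True},
                {"hp": 90, "boosts": 0, "active": False}],
    "opp_team": [{"hp": 60, "boosts": 0, "active": True}],
}
print(extract_features(state))
```

A learned model then weights these features against past outcomes, which is exactly the "I was in a similar situation before and won" heuristic made explicit.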

To get a good understanding of how Value Function Approximation works and how one would go about building a reinforcement learning algorithm that can play Pokemon, I recommend these two resources:

So far, nobody has built an algorithm that can play Pokemon at a high level.


No need for Reinforcement Learning when you are Alakazam and have an IQ over 5000

Hands-on battle advice

So is there a way to leverage knowledge about Reinforcement Learning to improve Pokemon battle skill? Maybe! A few things come to mind:

  • While Reinforcement Learning algorithms can learn from self-play, they learn much faster if you provide them with training data from expert human play. One reason is that it's hard to figure out what good plays look like if you only make bad plays in the beginning. So you can try to boost your own skill by watching videos from good Pokemon players (like Chemcoop for the Battle Spot Singles format).
  • One of the perennial challenges when training Machine Learning algorithms is overfitting, a phenomenon humans are also very susceptible to. It means learning something not because it's true, but because it happened to be part of your anecdotal experience (= your training data). So if you see a Pokemon with an unusual move three times in a row, don't assume that the move is now very common; trust statistical resources like this website instead.
  • The more I learned about Reinforcement Learning, the more I realised that my brain must be doing all sorts of things I'm not even aware of. So sometimes an explicit strategy might not even be that important, and we should just let our brains do their thing and follow our hunches. Maybe regular sleep, a healthy diet, and a calm mindset will help you improve more than reading articles about the math behind competitive Pokemon.


The next post will be about Nash equilibria, the very heart of competitive multiplayer games, so don’t miss it!

