The last post was about playing Pokemon optimally by calculating the value function V(x) using Bellman equations. Unfortunately, this turned out to be impractical, so we need another way to estimate the likelihood of winning games.
One solution to this problem is called Value Function Approximation. The idea behind it is to not calculate the value function exactly, but to approximate it using supervised machine learning algorithms (e.g. Neural Networks). In some ways, this resembles how most people play Pokemon. They don’t think through all the possible ways the game could go, but just imagine a few possibilities and think about how good the situation would be if they chose one of them.
Good proxies for evaluating the chance to win might, for example, be the total health of your Pokemon compared to that of your opponent. Or whether you have a boosted Pokemon on the field that can sweep. Or, more generally, whether you were in a similar situation before and won the game at that time.
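To make this more concrete, here is a minimal sketch of what such an approximation could look like: a few hand-crafted features like the ones above, fed into a simple linear model that is nudged toward observed game outcomes. The state fields and feature names are invented for illustration and don't refer to any real battle simulator.

```python
import numpy as np

def features(state):
    """Turn a (made-up) battle state dict into a feature vector."""
    return np.array([
        state["own_hp_fraction"] - state["opp_hp_fraction"],    # relative total health
        1.0 if state["own_setup_sweeper_active"] else 0.0,       # boosted sweeper on the field
        state["own_pokemon_left"] - state["opp_pokemon_left"],   # remaining team members
        1.0,                                                      # bias term
    ])

class LinearValueFunction:
    """Approximates V(x) as a dot product w . phi(x), fitted to game outcomes (1 = win, 0 = loss)."""

    def __init__(self, n_features, lr=0.05):
        self.w = np.zeros(n_features)
        self.lr = lr

    def value(self, state):
        return float(self.w @ features(state))

    def update(self, state, outcome):
        # Supervised (Monte Carlo style) update: move the prediction toward the final result.
        phi = features(state)
        error = outcome - self.w @ phi
        self.w += self.lr * error * phi

# Usage: after a finished game, push every visited state toward the observed result.
V = LinearValueFunction(n_features=4)
won_game_states = [
    {"own_hp_fraction": 0.9, "opp_hp_fraction": 0.8,
     "own_setup_sweeper_active": False, "own_pokemon_left": 3, "opp_pokemon_left": 3},
    {"own_hp_fraction": 0.6, "opp_hp_fraction": 0.2,
     "own_setup_sweeper_active": True, "own_pokemon_left": 2, "opp_pokemon_left": 1},
]
for s in won_game_states:
    V.update(s, outcome=1.0)
print(V.value(won_game_states[1]))
```

In practice you would swap the linear model for a neural network and use temporal-difference targets instead of only final outcomes, but the principle of generalising from features instead of solving the Bellman equations exactly stays the same.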
To get a good understanding of how Value Function Approximation works and how one would go about building a reinforcement learning algorithm that can play Pokemon, I recommend these two resources:
- Reinforcement Learning: An Introduction (2nd Edition) by Richard Sutton and Andrew Barto. The second edition is currently a work in progress and freely available from one of the authors' websites.
- Reinforcement Learning course by David Silver. David Silver is a researcher at DeepMind and one of the creators of AlphaGo. In 2015, he taught this excellent course at UCL.
Up until now, nobody has built an algorithm that can play Pokemon at a high level.

No need for Reinforcement Learning when you are Alakazam and have an IQ over 5000
Hands-on battle advice
So is there a way to leverage knowledge about Reinforcement Learning to improve your Pokemon battle skills? Maybe! A few things come to mind:
- While Reinforcement Learning algorithms can learn from self-play, they learn much faster if you provide them with training data from expert human play (a toy sketch of this kind of pre-training follows this list). One of the reasons is that it's harder to determine what good plays are if you only make bad plays in the beginning. So you can try to boost your own skill by watching videos from good Pokemon players (like Chemcoop for the Battle Spot Singles format).
- One of the challenges when training Machine Learning algorithms is always that of overfitting, a phenomenon humans are also very susceptible to. It means learning something not because it's true, but because it was part of your anecdotal experience (= training data). So if you see a Pokemon with an unusual move three times in a row, don't just assume that it's now very common; trust statistical resources like this website instead.
- The more I learned about Reinforcement Learning, the more I realised that my brain must be doing all sorts of things that I'm not even aware of. So sometimes a good strategy might not even be that important, and we should just let our brains do their thing and follow our hunches. Maybe regular sleep, a healthy diet, and a calm mindset will help you improve more than reading articles about the math behind competitive Pokemon.
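To illustrate the first point about expert data, here is a toy sketch of supervised pre-training (often called behaviour cloning): a simple softmax policy is fitted to pairs of battle features and the move an expert actually chose, before any self-play happens. Feature dimensions, move indices, and the data are invented for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class ImitationPolicy:
    """Softmax policy over moves, fitted to expert choices by gradient ascent on their log-likelihood."""

    def __init__(self, n_features, n_moves, lr=0.1):
        self.W = np.zeros((n_moves, n_features))
        self.lr = lr

    def probs(self, phi):
        return softmax(self.W @ phi)

    def update(self, phi, expert_move):
        # Gradient of log p(expert_move | phi) is (onehot(expert_move) - p) outer phi.
        p = self.probs(phi)
        grad = -np.outer(p, phi)
        grad[expert_move] += phi
        self.W += self.lr * grad

# Usage with made-up expert data: 4 features per state, 3 possible moves.
policy = ImitationPolicy(n_features=4, n_moves=3)
expert_data = [(np.array([0.2, 1.0, 0.0, 1.0]), 2),
               (np.array([-0.5, 0.0, 1.0, 1.0]), 0)]
for phi, move in expert_data:
    policy.update(phi, move)
print(policy.probs(expert_data[0][0]))
```

A policy pre-trained like this already makes reasonable moves and can then be refined with self-play, which is much easier than starting from purely random play.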
The next post will be about Nash equilibria, the very heart of competitive multiplayer games, so don’t miss it!