Reinforcement learning has a reputation for being extremely data-hungry, so much so that it can realistically be trained only on simulation-generated data, e.g. from a computer game. We discuss how Bayesian statistics can cure this, using a small, easily accessible example. In the second part of this blog series, we see how this can be done in practice using TensorFlow Probability, a hot new tool from Google.

## What’s this Bayes stuff all about?

In a recent podcast interview, Andrew Gelman, a leading researcher and practitioner in Bayesian statistics, characterized the Bayesian way of working like this: “There are two approaches to statistics. One is to make very minimal assumptions, and the other is to make maximal assumptions.” Bayesian statistics takes the latter approach, he then explained. This doesn’t sound attractive, right? Many in the data science community are used to thinking of assumptions as the dirty secret that you need for your models to work, but which also makes you vulnerable to errors.

## A dirty secret, turned into a modelling tool

The Bayesian approach to assumptions is very different: instead of trying to avoid them, Bayesians embrace them as modelling tools. Bayesian modelling is very flexible in accommodating domain knowledge, which becomes an integral part of the model calculations. This achieves two things at once: it is much easier to make use of knowledge about the problem domain, and that knowledge enters the model in a well-documented and transparent manner. It is also much easier to check and modify your assumptions when they’re part of the model itself. Checking the model frequently is crucial to the Bayesian approach and minimizes the risk of errors.

## Reinforcement learning: the strange new kid on the block

If Bayesian statistics is the black sheep of the statistics family (and some people think it is), reinforcement learning is the strange new kid on the data science and machine learning block. It employs many of the familiar techniques from machine learning, but the setting is fundamentally different. You don’t follow the usual ritual of taking a big bunch of data, splitting it into partitions, and then training, evaluating and improving your model. In reinforcement learning, the data your model works with is not an entity that is separate from the model itself. Instead, your model must choose from a set of actions, and it gets a reward depending on this choice. Then it chooses the next action, gets the next reward, and so on, always trying to maximize the reward. Hence, the data is not given. It is produced while the model interacts with its environment.
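
The interaction loop described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions that are not part of this post: the three action names, the hidden reward probabilities and the simple "explore 10% of the time" rule are all made up. The point is that the `history` list, i.e. the data, does not exist before the agent starts acting.

```python
import random

# Hypothetical action names, chosen only for this illustration.
ACTIONS = ["a", "b", "c"]


def environment(action, rng):
    """Toy environment (an assumption): reward 1 with a hidden per-action probability."""
    hidden_reward_prob = {"a": 0.2, "b": 0.5, "c": 0.8}
    return 1 if rng.random() < hidden_reward_prob[action] else 0


def run_interaction(steps=2000, explore=0.1, seed=0):
    """Let a simple agent act, observe rewards, and collect its own data."""
    rng = random.Random(seed)
    total_reward = {a: 0 for a in ACTIONS}
    count = {a: 0 for a in ACTIONS}
    history = []  # the "data set": produced by the interaction itself

    for _ in range(steps):
        untried = any(c == 0 for c in count.values())
        if untried or rng.random() < explore:
            # Explore: try a random action.
            action = rng.choice(ACTIONS)
        else:
            # Exploit: pick the action with the best average reward so far.
            action = max(ACTIONS, key=lambda a: total_reward[a] / count[a])

        reward = environment(action, rng)
        total_reward[action] += reward
        count[action] += 1
        history.append((action, reward))

    return history, count


if __name__ == "__main__":
    history, count = run_interaction()
    print("first five interactions:", history[:5])
    print("how often each action was chosen:", count)
```

Note that there is no train/test split anywhere: which examples end up in `history` depends on the choices the agent made earlier.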

## Reinforcement learning: why Bayes?

The best-known applications of reinforcement learning are connected to games. The defeat of an e-sports champion in the computer game Dota by OpenAI’s deep reinforcement learning agents has attracted a lot of attention. The same is true for DeepMind’s board game program AlphaZero, which is also based on reinforcement learning. The computational resources invested in this kind of approach are huge: OpenAI’s agents have played a total of 45 000 years of Dota in fast-forward mode. And the importance of games and simulations in reinforcement learning is not restricted to high-profile cases that make the headlines. When you look at OpenAI Gym, a popular environment for training reinforcement learning agents, you see lots of computer game classics like Pong and several other Atari games, along with physics simulations where an agent can learn to balance a pole on a cart.

There is an interesting connection here to the Bayesian approach: in reinforcement learning, we often assume we know the rules of the environment so well that we can set up a simulation as a training environment for the agents. In other words, reinforcement learning routinely works with strong assumptions. They are so strong that it is often applied to a purely simulated game setting that is isolated from the “real world”. What if we could use that other thing that works with strong assumptions, Bayesian statistics, to break through this isolation and use reinforcement learning in the real world?

## Reinforcement learning and Bayesian statistics: a child’s game

Let’s put these abstract ideas to work and build something concrete. We will stay in the reinforcement learning tradition by using a game, but we’ll break with tradition in another way: the learning environment will not be simulated. Instead, the agent will interact with a real human, such as you. As this is intended to be as simple as possible, the game we use will be the childhood classic rock, paper, scissors. Game theory says this game has a single equilibrium in which both players choose their actions uniformly at random. In plain English: against such an opponent, you can’t do better than choosing randomly. But game theory also makes strong assumptions, and they are rarely fulfilled when humans are involved. Humans are not good at being truly random, and so it is interesting to design a reinforcement learning agent that exploits the biases of its human counterpart.
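
To make the idea of exploiting a biased human a little more tangible, here is a minimal sketch in plain Python. It is not the model built in the next part of this series. It assumes one of the simplest possible Bayesian set-ups: a Dirichlet prior over the opponent’s move frequencies, updated by counting observed moves. The class name `DirichletAgent` and all parameter names are made up for this illustration.

```python
# Each key beats its value.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
MOVES = list(BEATS)


def payoff(mine, theirs):
    """+1 for a win, 0 for a draw, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


class DirichletAgent:
    """Hypothetical agent: Dirichlet prior over the opponent's move frequencies."""

    def __init__(self, prior=1.0):
        # Prior assumption, stated explicitly: before seeing any data,
        # every move is considered equally likely (one pseudo-count each).
        self.alpha = {move: prior for move in MOVES}

    def observe(self, opponent_move):
        # Conjugate update: the posterior is again a Dirichlet distribution,
        # with the observed move's count increased by one.
        self.alpha[opponent_move] += 1

    def posterior_mean(self):
        total = sum(self.alpha.values())
        return {move: a / total for move, a in self.alpha.items()}

    def expected_payoffs(self):
        p = self.posterior_mean()
        return {
            mine: sum(p[theirs] * payoff(mine, theirs) for theirs in MOVES)
            for mine in MOVES
        }

    def best_move(self):
        expected = self.expected_payoffs()
        return max(MOVES, key=expected.get)


if __name__ == "__main__":
    agent = DirichletAgent()
    # A human who favours rock: 6 x rock, 2 x paper, 2 x scissors.
    for move in ["rock"] * 6 + ["paper"] * 2 + ["scissors"] * 2:
        agent.observe(move)
    print(agent.posterior_mean())
    print(agent.best_move())  # paper
```

With no observations at all, every move has an expected payoff of zero, which mirrors the game-theoretic equilibrium. Only the human’s bias gives the agent something to exploit. An agent that always plays its single best move is of course predictable itself, so a more careful design would keep some randomness in its choices.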

## TensorFlow Probability, practical Bayesian statistics, rock, paper, and scissors

Stay tuned for the next part, where we…

- …unwrap our new toy, TensorFlow Probability.
- …build a Bayesian model.
- …venture into the dark art of mathemagic.
- …get used to losing at rock, paper, scissors against our computer.