In the previous part of this series, we explored how to build the probability distributions for sums of dice rolls. That is sufficient for a great fraction of dice rolling in D&D, but a critical extra case to consider is rolls made with advantage or disadvantage. In this article, we investigate how to extend our definitions to include die rolls with (dis)advantage.

`\[ \newcommand{\given}{\:\lvert\:\mathopen} \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\dd}{\mathrm{d}} \newcommand{\adv}{\mathrm{adv}} \newcommand{\dis}{\mathrm{dis}} \]`

## Introduction¶

The probability distribution of throwing a single die is well known and easy to write down — for a fair die, every face is equally likely. As we saw in the previous article, we can then write down the probability distribution for multiple die rolls summed together using convolutions, where the base case was always the simple uniform distribution of a single die.

Dungeons and Dragons doesn’t always use fair dice, though. There are situations or abilities where you are given better-than-normal chances by taking the better of two rolls, and similarly times when you are penalized and have to take the worse of two rolls. Only a single value is taken in these cases, so the uniform probability base case is replaced by a new probability distribution that takes the better/worse of two rolls.

In this article, we will derive the probability distributions for these biased rolls — typically called rolling with advantage and disadvantage.

## Rolling with advantage¶

Specifically, we define *rolling with advantage* — which will be denoted as `\(M\dd N\adv\)` for `\(M\)` advantage rolls of an `\(N\)`-sided die — to be the result of taking the maximum of two dice rolls:
`\begin{align} \Prob{r \in \dd N\adv} \equiv \Prob{ \left\{ r = \operatorname{max}(r_1, r_2) : r_1 \in \dd N, r_2 \in \dd N \right\} } \label{eqn:advdie_statement_defn} \end{align}`

and likewise, `\(M \dd N\dis\)` will be *rolling with disadvantage*, wherein the minimum of two rolls is used instead.

Breaking down this definition to an explicit distribution requires precisely defining and interpreting the statement. Let us define the problem in terms of two identifiable dice, corresponding to the two possible values `\(r_1\)` and `\(r_2\)`, respectively. In the case of advantage rolls, the maximum over two possibilities can be expanded into two distinct cases:
`\begin{align} r = \operatorname{max}(r_1, r_2) = \begin{cases} r_1 & \text{if } r_2 \le r_1 \\ r_2 & \text{otherwise} \end{cases} \label{eqn:max_verbose} \end{align}`

Thinking in terms of these cases, we can start building up the probability
statement.

First, let us work with the first case where `\(r_1 \ge r_2\)`. We want to write the probability in terms of the result `\(r\)`, so the first requirement is that `\(r = r_1\)`, which occurs with probability
`\begin{align*} \Prob{r = r_1 \in \dd N} &= \frac{1}{N} \quad\text{assuming } 1 \le r \le N \end{align*}`

The first case then requires that `\(r_2 \le r_1\)`, which will only occur if `\(r_2\)` is equal to any of the values `\(\{1, 2, \dots, r_1\}\)`. The probability of `\(r_2\)` being equal to any given value is again `\(1/N\)`, but now there are `\(r_1 = r\)` options; therefore
`\begin{align*} \Prob{r_2 \le r \in \dd N} &= \sum_{i = 1}^r \Prob{r_2 = i \in \dd N} = \sum_{i = 1}^r \frac{1}{N} \\ {} &= \frac{r}{N} \quad\text{assuming } 1 \le r \le N \end{align*}`
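This cumulative probability can be verified by a trivial enumeration (a quick Julia check, not part of the article’s module):

```
# For each target value r, count how many of the N equally likely outcomes
# of the second die satisfy r2 <= r; that fraction should equal r / N.
N = 20
for r in 1:N
    @assert count(r2 -> r2 <= r, 1:N) / N == r / N
end
```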

Together, these two statements allow us to write that the probability of rolling two dice and finding that the first die is greater than or equal to the second (`\(r_2 \le r_1\)`) and that it is equal to our target value (`\(r_1 = r\)`) is
`\begin{align} \Prob{r = r_1 \in \dd N} \, \Prob{r_2 \le r_1 \in \dd N} &= \frac{r}{N^2} \label{eqn:advdie_max1} \end{align}`

By symmetry, we can easily write down the probability for the second case in the maximum where `\(r_1 \le r_2\)` by just swapping the identification `\(r_1 \leftrightarrow r_2\)` in Eqn. `\(\ref{eqn:advdie_max1}\)`:
`\begin{align} \Prob{r = r_2 \in \dd N} \, \Prob{r_1 \le r_2 \in \dd N} &= \frac{r}{N^2} \label{eqn:advdie_max2} \end{align}`

The sum of Eqns. `\(\ref{eqn:advdie_max1}\)` and `\(\ref{eqn:advdie_max2}\)` is *almost* the answer. The subtle issue is that in Eqn. `\(\ref{eqn:max_verbose}\)` we wrote the conditions as “`\(r_2 \le r_1\)`” and “*otherwise*”, which implies `\(r_1 < r_2\)`. Notice that by the symmetric exchange of labels, though, we have “`\(\le\)`” conditions in both of the probability statements rather than a single “`\(\le\)`” and then a “`\(<\)`” condition (without equality). This subtle distinction means that we’ve actually double-counted the probability for the case where `\(r_1 = r_2\)`, and therefore the probability will be overestimated. The solution is simple, though — we just subtract a correction term that accounts for the double-counting, which we know is
`\begin{align*} \Prob{r_1 = r_2 = r \in \dd N} = \Prob{r_1 = r \in \dd N} \, \Prob{r_2 = r \in \dd N} = \frac{1}{N^2} \end{align*}`
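To see the double-counting numerically, note that summing the two symmetric cases alone gives a total probability greater than one, while subtracting `\(1/N^2\)` from each term restores normalization (a quick check, here assuming `\(N = 20\)`):

```
N = 20
# Summing 2r / N^2 over-counts the N tied outcomes, exceeding 1 by exactly 1/N;
# subtracting 1/N^2 from each term restores a properly normalized distribution.
uncorrected = sum(2r / N^2 for r in 1:N)
corrected = sum((2r - 1) / N^2 for r in 1:N)
@assert uncorrected ≈ (N + 1) / N
@assert corrected ≈ 1
```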

Putting all of the pieces together: we can simplify the two separate conditions into a single statement with double the probability (because our choice of `\(r_1\)` vs `\(r_2\)` is arbitrary), and then subtract off the correction term for the double-counting. The probability of rolling a die with advantage is therefore given by:
`\begin{align} \Prob{r \in \dd N\adv} &\equiv 2 \, \Prob{r_1 = r \in\dd N} \, \Prob{r_2 \le r_1 \in\dd N} - \Prob{r_1 = r_2 = r \in\dd N} \label{eqn:dieadv_defn} \end{align}`

or expanded explicitly into functional form,
`\begin{align} \Prob{r \in \dd N\adv} &= \begin{cases} \frac{2r - 1}{N^2} & 1 \le r \le N \\ 0 & \text{otherwise} \end{cases} \label{eqn:dieadv_function} \end{align}`

with corresponding definition as a Julia function:

```
using OffsetArrays

function prob_dNadv(N)
    return OffsetVector([(2i - 1) / N^2 for i in 1:N], 1:N)
end
```
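As a sanity check on the closed form (a verification sketch, not part of the attached module), we can enumerate all `\(N^2\)` equally likely ordered pairs directly:

```
# Brute-force the advantage distribution by tallying the maximum of every
# ordered pair of rolls, then compare against the (2r - 1) / N^2 closed form.
function prob_dNadv_bruteforce(N)
    counts = zeros(N)
    for r1 in 1:N, r2 in 1:N
        counts[max(r1, r2)] += 1
    end
    return counts ./ N^2
end

N = 20
brute = prob_dNadv_bruteforce(N)
@assert sum(brute) ≈ 1
@assert all(brute[r] ≈ (2r - 1) / N^2 for r in 1:N)
```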

Thankfully, we can also check this result through numerical experiments. We can easily draw hundreds of thousands of random die throws, take the larger of each pair, and then histogram the resulting numbers to build a frequency distribution. The observed frequency distribution is expected to converge to a scaled version of the probability distribution as the number of draws approaches infinity (with some small residual random fluctuation). Doing just that:

```
using Plots, Random
Nrolls = 100_000
rollpairs = rand(1:20, Nrolls, 2)
advrolls = maximum(rollpairs, dims = 2)
stephist(advrolls, bins = 0.5:20.5, normalize = :probability,
         label = "simulation")
sticks!(prob_dNadv(20), label = "d20adv theory")
xlabel!("die roll")
ylabel!("probability")
title!("Advantage d20 die rolling, simulation vs theory")
```

We see that the calculated probability distribution (the orange sticks) agrees well with the simulated distribution (blue stairs). (We will quantify what “agrees well” means in a future article.)

## Rolling with disadvantage¶

Following a very similar set of derivations, we can also work out the probability distribution for disadvantage rolls; with a little bit of intuition, it should be no surprise that the distribution “slopes” in the opposite direction, with the greatest probability at `\(1\)` and the lowest at `\(N\)`. The analytic distribution is given by
`\begin{align} \Prob{r \in \dd N\dis} &= \begin{cases} \frac{2(N - r) + 1}{N^2} & 1 \le r \le N \\ 0 & \text{otherwise} \end{cases} \label{eqn:diedis_function} \end{align}`
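The disadvantage distribution translates to Julia in the same way (the function name `prob_dNdis` is chosen here by analogy with `prob_dNadv` above):

```
using OffsetArrays

# Probability distribution for a single N-sided die rolled with disadvantage,
# i.e. taking the minimum of two rolls.
function prob_dNdis(N)
    return OffsetVector([(2(N - i) + 1) / N^2 for i in 1:N], 1:N)
end
```

Note that this is just the advantage distribution mirrored about the die’s midpoint, i.e. `prob_dNdis(N)[r] == prob_dNadv(N)[N - r + 1]`.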

## Conclusions¶

As mentioned in the introduction, the advantage and disadvantage probability distributions are new base cases which can be summed with other dice rolls as discussed in the previous article. Therefore, I have updated the attached `DiceRolling.jl` module to include new advantage (e.g. `1d20adv`) and disadvantage (`1d20dis`) fundamental dice which can be used in dice roll expressions.

Because we’ve already seen the `1d20adv` case (which is often used in saving throws and attack rolls), let’s use a less common example to demonstrate how advantage and disadvantage bias the entire distribution over many rolls. Instead of accumulating hit points using straight rolls, we’ll consider a character whose HP is taken from all-advantage rolls — `\(9\dd10 + 10\)` vs `\(9\dd10\adv + 10\)` for a 10th level paladin:
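The attached module handles such expressions directly, but as an illustrative sketch (using a hand-rolled convolution rather than the actual DiceRolling.jl API, which is not reproduced here), the `\(9\dd10\adv + 10\)` distribution could be built up as:

```
using OffsetArrays

# Single-die advantage distribution for a d10, supported on outcomes 1:10.
padv = OffsetVector([(2i - 1) / 10^2 for i in 1:10], 1:10)

# Convolve two probability vectors indexed by outcome value, producing a
# vector indexed by the sum of the two outcomes.
function convsum(p, q)
    lo = firstindex(p) + firstindex(q)
    hi = lastindex(p) + lastindex(q)
    out = OffsetVector(zeros(hi - lo + 1), lo:hi)
    for i in eachindex(p), j in eachindex(q)
        out[i + j] += p[i] * q[j]
    end
    return out
end

# 9d10adv via repeated convolution, then shift the support by the flat +10.
hp9 = reduce(convsum, fill(padv, 9))
hp = OffsetVector(parent(hp9), (firstindex(hp9) + 10):(lastindex(hp9) + 10))
```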

Unlike the base `\(1\dd10\adv\)` distribution, which monotonically increases from `\(1\)` through `\(10\)`, the sum over many such distributions is approximately Gaussian^{1} again, in accordance with the Central Limit Theorem.

^{1} If you compare the distributions carefully, though, you should find that the advantage distribution is perceptibly less Gaussian than the straight-roll distribution — each term being asymmetric lowers the rate of convergence toward the Gaussian distribution.