Generalization of the law of large numbers
Introduction
In December 2020, Kaggle launched its annual Santa's Christmas competition. This year participants were trying to solve multi-armed bandit problem - having a set of 100 one-armed bandits (slot machines), each with a random initial probability of a reward which is not known to participants. Each machine can output a reward according to its probability of a reward when pulled. The goal is to get the highest possible reward, while the number of pulls (trials) is limited. There were two twists to the classic problem, your agent plays against another agent and the likelihood of reward decreases by 3 % after each machine pull (i.e. probability of a reward after N-th pull is P_{N} = P_0 \cdot 0.97^N).
In order to do well in the competition, one needs to estimate the initial likelihood of reward P_0 of a machine as precisely as possible. This is not possible to do with the original law of large numbers as it is stated only for the case where random variables (results of trials) are independent and identically distributed. In our case, they are not identically distributed as the distribution changes after each trial. It is not a simple Bernoulli trial, because the probability of success changes. We need to restate the law of large numbers in such a way that the condition of the probability of success not changing is lifted.
Generalization using history
The original law of large numbers works only with counts. Be it for one-armed bandits, where we need to know only the number of times when we did or didn't get a reward, or for a simple dice rolling, where we count occurrences of all values. It does not work with history because it does not matter whether one-armed bandit produces history [0, 0, 1, 1] or [1, 0, 0, 1]. The count of values is still the same and we can assume that probability to get a reward is 50 % as the average converges to the expected value (in this case, the probability of a reward). The information about history is redundant. This is the consequence of the fact that the probability of a reward does not change in time, i.e. P_N = P_0.
If the likelihood of reward changes in time, we need to take into account history. The probability of a reward depends only on the initial probability and the number of trials, after N-th trial it is P_N = f(P_0, N). The history is composed out of trial outcomes E_i and has length N. While f(P_0, N) is set, for each outcome E_i in history and every possible initial probability P_0 we can compute the probability that P_0 produces the outcome, p_i = P(E_i | f, P_0). These individual probabilities p_i multiplied together give the probability P_H of the initial probability to produce the history of outcomes when function f is applied after every trial, P_H (f, P_0) = \prod_i p_i. The true initial probability is then most likely the one with the highest P_H.
The bandit's initial probability of a reward is most likely the one which yields the history of outcomes with the highest probability.
Restated this way, the law works the same way as before in the case that P_0 does not change, because f can be just f(P_0, N) = P_0. However, it works now even in the case that the probability of a reward changes between trials. The longer is the history of outcomes, the more precise is the estimation.
Example
The example is from the Kaggle competition, where machines have 3% probability decay. A machine is pulled three times. For the first and the third time we get a reward, on the second pull we get nothing. The machine produced history [1, 0, 1]. What initial probability is the most likely to produce this history?
If the initial probability was 1, then after the first pull, the probability of success would be 0.97 and after the second pull it would be 0.9409. The probability that the machine produces history [1, 0, 1] is:
With the initial probability 0, the machine can not produce this history at all.
Now, consider the middle, initial probability 0.5, the probability of producing the history is:
We see that it is more likely that the initial probability of the machine was 0.5 than 1 and the initial probability 0 is completely out of question.
The Kaggle problem is a bit easier than a general case as the initial probability was always integer between 0 and 100 (in percents) and not a float. The function to find out the most likely probability can be easily vectorized, computing the probability of producing the history for all possible initial probabilities at once. Specifically for history [1, 0, 1] we can find out that the most likely initial probability with 3% decay is 0.71 and the probability of producing the history is P_H = 0.1481.
Limitation
When f(P_0, N) is set in such a way that P_N grows or decays, similarly as in the Kaggle case, there is a possibility that the assumed P_0 will not converge to the true P_0. Similarly as in the original law of large numbers, the estimate of P_0 is the more precise, the more observations we have. However, unlike with the original law of large numbers, we can eventually reach a state where no further observations will improve the estimate. An extreme example of this maybe be when f(P_0, N) = \text{max}(0, P_0-1 \cdot N), i.e. after the first pull onwards, the probability of a reward is zero. Whatever the initial probability is, the assumed original probability is dependent solely on the first trial. If a reward is gained, assumed probability will be P_0 = 1, in the other case, it will be P_0 = 0. No further trials can change the assumed P_0.
Cite as:
@article{wagner2021generalization,title = "Generalization of the law of large numbers",author = "Wagner, Jakub",journal = "somnambwl.netlify.app",year = "2021",url = "https://somnambwl.netlify.app/Science/LawOfLargeNumbers/"}
It is possible to try different reward-changing functions and initial probabilities for a single machine using Python code saved in a Kaggle notebook.