Multi-Armed Bandit

This list generator models subject allocation as a Bayesian Bernoulli multi-armed bandit problem. The method balances learning how each study arm performs (exploration) against allocating subjects to the arm that currently appears to perform best (exploitation).

This method assumes that the probability of success for a subject allocated to study arm \(k\) at time \(t\), \(p_{k,t}\), has a beta prior with parameters \(\alpha = s_{k,0}\) and \(\beta = f_{k,0}\). By default, \(s_{k,0} = f_{k,0} = 0.5\), which corresponds to the Jeffreys prior for a Bernoulli random variable. The posterior distribution of \(p_{k,t}\) is:

\[ p_{k,t} \mid \mathbf{x}_{k,t} \sim \text{Beta}(s_{k,0} + s_{k,t},\; f_{k,0} + f_{k,t}) \]

where \(s_{k,t}\) and \(f_{k,t}\) are the numbers of successes and failures, respectively, observed on arm \(k\) up to time \(t\), and \(\mathbf{x}_{k,t}\) denotes the observations on arm \(k\) up to time \(t\).
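The conjugate update above can be illustrated with a minimal Python sketch. It simply adds the prior parameters to the observed counts and reports the posterior mean and variance of the beta distribution. The names `BetaPosterior` and `update_posterior` are hypothetical and not part of the list generator; the defaults assume the Jeffreys prior \(s_{k,0} = f_{k,0} = 0.5\).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Hypothetical container for a Beta(alpha, beta) posterior on one arm's success probability."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Posterior mean of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        # Posterior variance: alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)).
        total = self.alpha + self.beta
        return self.alpha * self.beta / (total ** 2 * (total + 1))


def update_posterior(successes: int, failures: int,
                     s0: float = 0.5, f0: float = 0.5) -> BetaPosterior:
    """Hypothetical helper: Beta(s0, f0) prior plus observed Bernoulli counts.

    The defaults s0 = f0 = 0.5 correspond to the Jeffreys prior.
    """
    if successes < 0 or failures < 0:
        raise ValueError("counts must be non-negative")
    return BetaPosterior(s0 + successes, f0 + failures)


post = update_posterior(successes=3, failures=1)
print(post)                                      # BetaPosterior(alpha=3.5, beta=1.5)
print(round(post.mean, 4), round(post.variance, 4))  # 0.7 0.035
```

For example, an arm with 3 successes and 1 failure has a Beta(3.5, 1.5) posterior under the default prior, with mean 0.7.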

Allocation to a specific arm can be made using either the current belief rule or the upper confidence bound (UCB) index. The current belief rule allocates each new subject to the arm with the highest posterior mean probability of success. The UCB rule allocates each new subject to the arm with the highest value of an index that combines the posterior mean with its variability, which favours arms whose success probability is still uncertain. For a discussion of bandit-based allocation rules in clinical trials, see Villar, Bowden & Wason (2015).
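The two allocation rules can be sketched in Python as follows. This is a minimal illustration, not the generator's implementation: `posterior_mean_sd`, `current_belief`, and `ucb_index` are hypothetical names, and the UCB index is assumed here to be the posterior mean plus `c` posterior standard deviations, where `c` is a hypothetical tuning constant. The exact form of the index used by the list generator is not stated above. Ties are broken in favour of the lower-numbered arm.

```python
import math
from typing import Sequence, Tuple

# Each arm is summarised by its observed (successes, failures) counts.
Counts = Tuple[int, int]


def posterior_mean_sd(s: int, f: int, s0: float = 0.5, f0: float = 0.5) -> Tuple[float, float]:
    """Mean and standard deviation of the Beta(s0 + s, f0 + f) posterior (Jeffreys prior by default)."""
    a, b = s0 + s, f0 + f
    total = a + b
    mean = a / total
    sd = math.sqrt(a * b / (total ** 2 * (total + 1)))
    return mean, sd


def current_belief(arms: Sequence[Counts]) -> int:
    """Current belief: pick the arm with the highest posterior mean success probability."""
    means = [posterior_mean_sd(s, f)[0] for s, f in arms]
    # max() returns the first maximiser, so ties go to the lower-numbered arm.
    return max(range(len(arms)), key=lambda k: means[k])


def ucb_index(arms: Sequence[Counts], c: float = 1.0) -> int:
    """ASSUMED UCB form: pick the arm maximising posterior mean + c * posterior sd."""
    scores = []
    for s, f in arms:
        mean, sd = posterior_mean_sd(s, f)
        scores.append(mean + c * sd)
    return max(range(len(arms)), key=lambda k: scores[k])


arms = [(6, 4), (1, 1)]  # arm 0: 6 successes, 4 failures; arm 1: 1 success, 1 failure
print(current_belief(arms))  # 0
print(ucb_index(arms))       # 1
```

With these counts, current belief picks arm 0 (posterior mean about 0.59 versus 0.50), while the assumed UCB index picks arm 1 (about 0.73 versus 0.75) because the less-observed arm has a larger posterior standard deviation. Setting `c = 0` reduces this index to the current belief rule.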




  • Villar, S., Bowden, J. & Wason, J. (2015), "Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges," Statistical Science, 30, 199-215.