Policies.BayesUCB module

The Bayes-UCB policy.

  • By default, it uses a Beta posterior (Policies.Posterior.Beta), one per arm.
  • Reference: [Kaufmann, Cappé & Garivier - AISTATS, 2012]
class Policies.BayesUCB.BayesUCB(nbArms, posterior=<class 'Policies.Posterior.Beta.Beta'>, lower=0.0, amplitude=1.0, *args, **kwargs)

Bases: Policies.BayesianIndexPolicy.BayesianIndexPolicy

The Bayes-UCB policy.

  • Reference: [Kaufmann, Cappé & Garivier - AISTATS, 2012]

computeIndex(arm)

Compute the current index of arm k at time t, after \(N_k(t)\) pulls of that arm which yielded \(S_k(t)\) rewards of 1, as the quantile of order \(1 - \frac{1}{t}\) of the Beta posterior:

\[I_k(t) = \mathrm{Quantile}\left(\mathrm{Beta}(1 + S_k(t), 1 + N_k(t) - S_k(t)), 1 - \frac{1}{t}\right).\]
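
The index formula above can be illustrated with a minimal standard-library sketch. It is not the module's own implementation: the function names `beta_cdf_int`, `beta_quantile_int` and `bayes_ucb_index` are hypothetical, the `lower` and `amplitude` parameters are ignored, and rewards are assumed to be binary so that both Beta parameters are positive integers. Under that assumption the Beta CDF reduces to a binomial tail sum, and the quantile can be found by bisection.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, for positive integers a and b (assumption).

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(a, n + 1))


def beta_quantile_int(q, a, b, tol=1e-12):
    """Quantile of order q of Beta(a, b), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < q:
            lo = mid  # the quantile lies to the right of mid
        else:
            hi = mid  # the quantile lies at or to the left of mid
    return 0.5 * (lo + hi)


def bayes_ucb_index(successes, pulls, t):
    """I_k(t) for an arm with `successes` rewards of 1 out of `pulls` pulls, t >= 1."""
    return beta_quantile_int(1.0 - 1.0 / t, 1 + successes, 1 + pulls - successes)
```

For an arm that has never been pulled, the posterior is Beta(1, 1), that is, uniform on [0, 1], so `bayes_ucb_index(0, 0, t)` is approximately `1 - 1/t`. For arbitrary (non-integer) posterior parameters, a library quantile function such as `scipy.stats.beta.ppf` would be the more usual choice.
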
__module__ = 'Policies.BayesUCB'