Policies.BoltzmannGumbel module

The Boltzmann-Gumbel Exploration (BGE) index policy, a different formulation of the Exp3 policy with an optimally tuned decreasing sequence of temperature parameters \(\gamma_t\).

  • Reference: Section 4 of [Boltzmann Exploration Done Right, N.Cesa-Bianchi & C.Gentile & G.Lugosi & G.Neu, arXiv 2017](https://arxiv.org/pdf/1705.10257.pdf).
  • It is an index policy with indexes computed from the empirical mean estimators and a random sample from a Gumbel distribution.
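
The link between Gumbel noise and Boltzmann exploration can be checked numerically. When every arm shares the same scale \(\beta\), picking the arm that maximizes "mean + \(\beta\) times a Gumbel(0, 1) sample" selects arm k with probability proportional to \(\exp(\text{mean}_k / \beta)\) (the Gumbel-max trick). The sketch below is only an illustration with NumPy: the means, the common scale `beta` and the number of draws are hypothetical values, and BGE itself uses a per-arm scale rather than a common one.

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.2, 0.5, 0.6])   # hypothetical empirical means
beta = 0.25                         # hypothetical common scale (temperature)

# Boltzmann (softmax) probabilities with temperature beta.
weights = np.exp(means / beta)
softmax = weights / weights.sum()

# Empirical frequencies of argmax(mean + beta * Gumbel noise).
n_draws = 200_000
noise = rng.gumbel(0.0, 1.0, size=(n_draws, means.size))
winners = np.argmax(means + beta * noise, axis=1)
frequencies = np.bincount(winners, minlength=means.size) / n_draws

# The two vectors should agree up to sampling error.
print(np.round(softmax, 3), np.round(frequencies, 3))
```
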
Policies.BoltzmannGumbel.SIGMA = 1

Default value of the constant \(\sigma\), assuming the arm distributions are \(\sigma^2\)-subgaussian. Use 1 for Bernoulli arms.

class Policies.BoltzmannGumbel.BoltzmannGumbel(nbArms, C=1, lower=0.0, amplitude=1.0)[source]

Bases: Policies.IndexPolicy.IndexPolicy

The Boltzmann-Gumbel Exploration (BGE) index policy, a different formulation of the Exp3 policy with an optimally tuned decreasing sequence of temperature parameters \(\gamma_t\).

  • Reference: Section 4 of [Boltzmann Exploration Done Right, N.Cesa-Bianchi & C.Gentile & G.Lugosi & G.Neu, arXiv 2017](https://arxiv.org/pdf/1705.10257.pdf).
  • It is an index policy with indexes computed from the empirical mean estimators and a random sample from a Gumbel distribution.
__init__(nbArms, C=1, lower=0.0, amplitude=1.0)[source]

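New Boltzmann-Gumbel index policy.

  • nbArms: the number of arms,
  • C: the constant in the exploration scale \(\beta_k(t) = \sqrt{C^2 / N_k(t)}\) used by computeIndex,
  • lower, amplitude: lower value and known amplitude of the rewards.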
__str__()[source]

-> str

computeIndex(arm)[source]

Compute a randomized index for arm k, at time t and after \(N_k(t)\) pulls of that arm, with \(X_k(t)\) its cumulative reward:

\[\begin{split}I_k(t) &= \frac{X_k(t)}{N_k(t)} + \beta_k(t) Z_k(t), \\ \text{where}\;\; \beta_k(t) &:= \sqrt{C^2 / N_k(t)}, \\ \text{and}\;\; Z_k(t) &\sim \mathrm{Gumbel}(0, 1).\end{split}\]

Here \(\mathrm{Gumbel}(0, 1)\) is the standard Gumbel distribution. See the [Numpy documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.gumbel.html#numpy.random.gumbel) or the [Wikipedia page](https://en.wikipedia.org/wiki/Gumbel_distribution) for more details.
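
The index formula can be sketched for a single arm with NumPy. The function name `bge_index` and its arguments are hypothetical, chosen for illustration; this is not the module's actual implementation. Returning an infinite index for an arm that was never pulled is also an assumption, made so that such an arm is tried first.

```python
import numpy as np

def bge_index(cum_reward, pulls, C=1.0, rng=None):
    """Hypothetical helper: I_k(t) = X_k(t) / N_k(t) + beta_k(t) * Z_k(t)."""
    rng = np.random.default_rng() if rng is None else rng
    if pulls < 1:
        # Assumption: an arm never pulled gets an infinite index.
        return float("inf")
    beta = np.sqrt(C ** 2 / pulls)   # exploration scale beta_k(t)
    z = rng.gumbel(0.0, 1.0)         # Z_k(t) ~ Gumbel(0, 1)
    return cum_reward / pulls + beta * z

rng = np.random.default_rng(0)
index = bge_index(cum_reward=3.0, pulls=10, C=1.0, rng=rng)
```

Because \(\beta_k(t)\) shrinks as \(1 / \sqrt{N_k(t)}\), the random perturbation of a frequently pulled arm gets smaller and its index concentrates around the empirical mean.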

computeAllIndex()[source]

Compute the current indexes for all arms, in a vectorized manner.
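
A vectorized computation could look like the sketch below, which draws one Gumbel sample per arm and evaluates all indexes at once. The function `bge_all_indexes`, its arguments and the handling of unpulled arms (index set to infinity) are assumptions for illustration, not the module's actual code.

```python
import numpy as np

def bge_all_indexes(cum_rewards, pulls, C=1.0, rng=None):
    """Hypothetical helper: BGE indexes for all arms at once."""
    rng = np.random.default_rng() if rng is None else rng
    cum_rewards = np.asarray(cum_rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    # Assumption: arms never pulled get an infinite index.
    indexes = np.full(pulls.shape, np.inf)
    seen = pulls > 0
    beta = np.sqrt(C ** 2 / pulls[seen])               # one scale per pulled arm
    z = rng.gumbel(0.0, 1.0, size=int(seen.sum()))     # one Gumbel sample per pulled arm
    indexes[seen] = cum_rewards[seen] / pulls[seen] + beta * z
    return indexes

rng = np.random.default_rng(42)
indexes = bge_all_indexes([3.0, 0.0, 7.0], [10, 0, 12], C=1.0, rng=rng)
chosen_arm = int(np.argmax(indexes))  # arm 1 was never pulled, so it is chosen
```

An index policy then plays the arm with the largest index, which is what the final `argmax` illustrates.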

__module__ = 'Policies.BoltzmannGumbel'