Policies.MOSSAnytime module

The MOSS-Anytime policy for bandits with bounded rewards. It does not need to know the horizon in advance and does not rely on a doubling trick. Reference: [Degenne & Perchet, 2016](http://proceedings.mlr.press/v48/degenne16.pdf).

Policies.MOSSAnytime.ALPHA = 1.0: Default value of the parameter \(\alpha\) of the MOSS-Anytime algorithm.

class Policies.MOSSAnytime.MOSSAnytime(nbArms, alpha=1.0, lower=0.0, amplitude=1.0)


__init__(nbArms, alpha=1.0, lower=0.0, amplitude=1.0)

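Create a new MOSS-Anytime index policy for nbArms arms, with parameter alpha and rewards assumed to lie in [lower, lower + amplitude].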

alpha: Parameter \(\alpha \geq 0\) used in the computation of the index (default \(1.0\), see ALPHA). The optimal value seems to be about \(1.35\).
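To make the constructor parameters concrete, here is a minimal sketch of the state such an index policy might keep between rounds. The class name MOSSAnytimeState is hypothetical, and the rescaling of raw rewards from [lower, lower + amplitude] to [0, 1] is an assumption about how lower and amplitude are used; this is not the SMPyBandits implementation.

```python
class MOSSAnytimeState:
    """Hypothetical sketch of the per-arm state of a MOSS-Anytime policy.

    Not the SMPyBandits class; attribute names only mirror the documented signature.
    """

    def __init__(self, nbArms, alpha=1.0, lower=0.0, amplitude=1.0):
        if alpha < 0:
            raise ValueError("alpha must be >= 0")
        self.nbArms = nbArms
        self.alpha = alpha
        self.lower = lower
        self.amplitude = amplitude
        self.t = 0                       # total number of pulls so far
        self.pulls = [0] * nbArms        # N_k(t) for each arm k
        self.rewards = [0.0] * nbArms    # X_k(t): sum of rescaled rewards of arm k

    def getReward(self, arm, reward):
        # Assumption: raw rewards lie in [lower, lower + amplitude] and are
        # rescaled to [0, 1] before being accumulated into X_k(t).
        self.t += 1
        self.pulls[arm] += 1
        self.rewards[arm] += (reward - self.lower) / self.amplitude
```

Because only running sums and counts are stored, nothing in this state depends on the horizon, which is what lets an anytime policy run without a doubling trick.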

computeIndex(arm): Compute the current index of arm k at time t, after \(N_k(t)\) pulls of arm k, where \(X_k(t)\) is the sum of the rewards obtained from arm k so far and K is the number of arms:

\[I_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\left(\frac{1+\alpha}{2}\right) \max\left(0, \frac{\log\left(\frac{t}{K N_k(t)}\right)}{N_k(t)}\right)}.\]
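The formula can be checked with a small standalone function. The helper name moss_anytime_index is hypothetical, and returning \(+\infty\) for an arm that has never been pulled is an assumption of this sketch (the formula above is undefined when \(N_k(t) = 0\)).

```python
import math

def moss_anytime_index(sum_rewards, pulls, t, nb_arms, alpha=1.0):
    """Hypothetical helper: MOSS-Anytime index I_k(t) of a single arm.

    sum_rewards: X_k(t), sum of the (rescaled) rewards of arm k
    pulls:       N_k(t), number of pulls of arm k
    t:           current time step
    nb_arms:     K, number of arms
    """
    if pulls < 1:
        # Assumption: an arm never pulled gets an infinite index, so it is tried first.
        return math.inf
    empirical_mean = sum_rewards / pulls
    # The max(0, .) makes the bonus vanish as soon as N_k(t) >= t / K.
    log_term = max(0.0, math.log(t / (nb_arms * pulls)) / pulls)
    return empirical_mean + math.sqrt((1.0 + alpha) / 2.0 * log_term)
```

For example, with \(K = 10\), \(t = 1000\), \(N_k(t) = 50\) and \(X_k(t) = 30\), the index is \(0.6 + \sqrt{\log(2)/50} \approx 0.718\) for \(\alpha = 1\). With \(N_k(t) = 200\) and \(X_k(t) = 100\), we have \(t / (K N_k(t)) = 0.5 < 1\), so the bonus is clipped to zero and the index equals the empirical mean \(0.5\).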

computeAllIndex(): Compute the current indices of all arms at once, in a vectorized manner.
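One way to vectorize the same formula over all K arms is with NumPy, as sketched below. The function name moss_anytime_all_indexes is hypothetical, and giving unpulled arms an index of \(+\infty\) is again an assumption of the sketch; the arm to play is then the argmax of the returned array.

```python
import numpy as np

def moss_anytime_all_indexes(sum_rewards, pulls, t, alpha=1.0):
    """Hypothetical helper: MOSS-Anytime indices of all K arms at once.

    sum_rewards, pulls: arrays of shape (K,) holding X_k(t) and N_k(t)
    t:                  current time step
    """
    sum_rewards = np.asarray(sum_rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    nb_arms = pulls.size
    # Assumption: arms never pulled keep an infinite index.
    indexes = np.full(nb_arms, np.inf)
    pulled = pulls >= 1
    n = pulls[pulled]
    # Element-wise max(0, log(t / (K N_k)) / N_k) for the pulled arms only.
    log_term = np.maximum(0.0, np.log(t / (nb_arms * n)) / n)
    indexes[pulled] = sum_rewards[pulled] / n + np.sqrt((1.0 + alpha) / 2.0 * log_term)
    return indexes
```

For instance, `int(np.argmax(moss_anytime_all_indexes([30.0, 100.0, 0.0], [50, 200, 0], t=1000)))` selects arm 2, the only arm not yet pulled. Since `np.argmax` returns the first maximal position, ties are broken in favor of the lowest arm number in this sketch.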