Policies.MOSSExperimental module

The MOSS-Experimental policy for bounded bandits, without knowing the horizon (and no doubling trick). Reference: [Degenne & Perchet, 2016](http://proceedings.mlr.press/v48/degenne16.pdf).

Warning

Nothing has been proved for this heuristic!

class Policies.MOSSExperimental.MOSSExperimental(nbArms, lower=0.0, amplitude=1.0)[source]

Bases: Policies.MOSS.MOSS

The MOSS-Experimental policy for bounded bandits, without knowing the horizon (and no doubling trick). Reference: [Degenne & Perchet, 2016](http://proceedings.mlr.press/v48/degenne16.pdf).

__str__()[source]

Return a string representation of this policy (-> str).

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k, if there are K arms:

\[\begin{split}I_k(t) &= \frac{X_k(t)}{N_k(t)} + \sqrt{ \max\left(0, \frac{\log\left(\frac{t}{\hat{H}(t)}\right)}{N_k(t)}\right)},\\ \text{where}\;\; \hat{H}(t) &:= \begin{cases} \sum\limits_{j=1, N_j(t) < \sqrt{t}}^{K} N_j(t) & \;\text{if it is}\; > 0,\\ K N_k(t) & \;\text{otherwise}\; \end{cases}\end{split}\]

Note

In the article, the authors do not explain this subtlety, and I don’t see an argument justifying that \(\hat{H}(t) > 0\) at any time, i.e., that there is always some arm \(j\) such that \(0 < N_j(t) < \sqrt{t}\).
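The index above can be sketched as a standalone function (a hypothetical, simplified version for illustration; the real implementation is a method of the `MOSSExperimental` class and uses its internal `rewards` and `pulls` arrays):

```python
import numpy as np

def moss_experimental_index(rewards, pulls, t):
    """Sketch of the MOSS-Experimental index for each arm k.

    rewards: cumulative rewards X_k(t), one entry per arm
    pulls:   pull counts N_k(t), one entry per arm
    t:       current time step (t >= 1)
    """
    rewards = np.asarray(rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    K = len(pulls)
    indexes = np.empty(K)
    for k in range(K):
        if pulls[k] == 0:
            indexes[k] = float('inf')  # unexplored arms get infinite index
            continue
        # \hat{H}(t): total pulls of "under-explored" arms, i.e. those with N_j(t) < sqrt(t)
        under = pulls[pulls < np.sqrt(t)].sum()
        H_hat = under if under > 0 else K * pulls[k]  # fallback when no such arm exists
        indexes[k] = rewards[k] / pulls[k] + np.sqrt(
            max(0.0, np.log(t / H_hat) / pulls[k])
        )
    return indexes
```

Note how the fallback case \(K N_k(t)\) depends on the arm k being evaluated, while the sum \(\sum_j N_j(t)\) over under-explored arms does not.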

computeAllIndex()[source]

Compute the current indexes for all arms at once, in a vectorized manner.
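A vectorized version of the same computation could look like this (again a hypothetical sketch, not the class's actual method; NumPy operations replace the per-arm loop):

```python
import numpy as np

def moss_experimental_all_indexes(rewards, pulls, t):
    """Vectorized sketch: compute all MOSS-Experimental indexes at once."""
    rewards = np.asarray(rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    K = len(pulls)
    # Scalar \hat{H}(t) if some arm is under-explored, else per-arm fallback K * N_k(t)
    under = pulls[pulls < np.sqrt(t)].sum()
    H_hat = under if under > 0 else K * pulls
    with np.errstate(divide='ignore', invalid='ignore'):
        indexes = rewards / pulls + np.sqrt(
            np.maximum(0.0, np.log(t / H_hat) / pulls)
        )
    indexes[pulls == 0] = np.inf  # unexplored arms get infinite index
    return indexes
```

The `np.errstate` context suppresses the divide-by-zero warnings for arms with zero pulls, whose indexes are overwritten with infinity afterwards.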

__module__ = 'Policies.MOSSExperimental'