Policies.MOSSH module

The MOSS-H policy for bounded bandits, which requires knowing the horizon T in advance. Reference: [Audibert & Bubeck, 2010](http://www.jmlr.org/papers/volume11/audibert10a/audibert10a.pdf).

class Policies.MOSSH.MOSSH(nbArms, horizon=None, lower=0.0, amplitude=1.0)


__init__(nbArms, horizon=None, lower=0.0, amplitude=1.0)

New generic index policy.

computeIndex(arm): Compute the current index of arm k at time t, after \(N_k(t)\) pulls of arm k, when there are K arms, T is the horizon, and \(X_k(t)\) is the sum of rewards collected from arm k so far:

\[I_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\max\left(0, \frac{\log\left(\frac{T}{K N_k(t)}\right)}{N_k(t)}\right)}.\]
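To make the formula concrete, here is a minimal sketch of the index for a single arm, written as a standalone function rather than the library's `computeIndex` method. The name `moss_h_index` and its parameters are hypothetical, and giving an unpulled arm an infinite index is an assumption (a common convention for index policies), not something stated above. It also shows that the exploration bonus vanishes once \(N_k(t) \geq T/K\), because of the \(\max(0, \cdot)\).

```python
import math


def moss_h_index(sum_rewards, pulls, horizon, nb_arms):
    """Hypothetical MOSS-H index I_k(t) for one arm (illustrative sketch).

    sum_rewards -- X_k(t), total reward collected from arm k so far
    pulls       -- N_k(t), number of times arm k was pulled
    horizon     -- T, the known horizon
    nb_arms     -- K, the number of arms
    """
    if pulls == 0:
        # Assumption: an unpulled arm gets an infinite index so it is tried first.
        return math.inf
    mean = sum_rewards / pulls
    # max(0, .) removes the bonus once N_k(t) >= T / K (the log becomes <= 0).
    bonus = max(0.0, math.log(horizon / (nb_arms * pulls)) / pulls)
    return mean + math.sqrt(bonus)


# T = 1000, K = 10, arm pulled 5 times with total reward 3.0:
# index = 0.6 + sqrt(log(20) / 5), roughly 1.374.
print(round(moss_h_index(3.0, 5, horizon=1000, nb_arms=10), 3))
```

Here \(T/K = 100\), so an arm pulled 100 times or more is scored by its empirical mean alone.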

computeAllIndex(): Compute the current indexes for all arms, in a vectorized manner.
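A vectorized computation can be sketched with NumPy by applying the same formula to arrays of \(X_k(t)\) and \(N_k(t)\) at once. This is an illustration under assumptions, not the library's implementation: the function name `moss_h_all_indexes` is hypothetical, and unpulled arms are assumed to keep an infinite index.

```python
import numpy as np


def moss_h_all_indexes(sum_rewards, pulls, horizon):
    """Hypothetical vectorized MOSS-H indexes for all K arms (sketch)."""
    sum_rewards = np.asarray(sum_rewards, dtype=float)  # X_k(t) for every arm
    pulls = np.asarray(pulls, dtype=float)              # N_k(t) for every arm
    nb_arms = len(pulls)                                # K
    # Assumption: arms never pulled keep an infinite index.
    indexes = np.full(nb_arms, np.inf)
    pulled = pulls > 0
    n = pulls[pulled]
    # Same bonus as the scalar formula, computed elementwise.
    bonus = np.maximum(0.0, np.log(horizon / (nb_arms * n)) / n)
    indexes[pulled] = sum_rewards[pulled] / n + np.sqrt(bonus)
    return indexes


idx = moss_h_all_indexes([3.0, 50.0, 0.0], [5, 100, 0], horizon=1000)
print(idx, int(np.argmax(idx)))  # the unpulled arm 2 is chosen next
```

Masking the pulled arms avoids dividing by zero, and the policy then plays the arm with the largest index.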