Policies.UCBH module

The UCB-H policy for bounded bandits, which requires knowing the horizon \(T\). Reference: [Audibert et al. 09].

class Policies.UCBH.UCBH(nbArms, horizon=None, alpha=4, lower=0.0, amplitude=1.0)[source]

Bases: Policies.UCBalpha.UCBalpha

The UCB-H policy for bounded bandits, which requires knowing the horizon \(T\). Reference: [Audibert et al. 09].

__init__(nbArms, horizon=None, alpha=4, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,
  • horizon: the known horizon \(T\) of the experiment,
  • alpha: the exploration parameter (default 4),
  • lower, amplitude: lower value and known amplitude of the rewards.
horizon = None

Parameter \(T\) = known horizon of the experiment.

alpha = None

Parameter \(\alpha\) for the exploration term of the UCB indexes (default 4).

__str__() -> str[source]

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[I_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\frac{\alpha \log(T)}{2 N_k(t)}}.\]
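This index can be sketched in a few lines of Python. The function below is an illustrative stand-alone helper, not the actual `computeIndex` method: `rewards_sum` plays the role of \(X_k(t)\) and `pulls` that of \(N_k(t)\), both assumed to be tracked by the policy.

```python
from math import log, sqrt

def ucbh_index(rewards_sum, pulls, horizon, alpha=4):
    """UCB-H index for one arm: empirical mean plus an exploration
    bonus based on the known horizon T (a sketch, names illustrative)."""
    if pulls < 1:
        # An arm never pulled gets an infinite index, so it is tried first.
        return float('inf')
    return rewards_sum / pulls + sqrt(alpha * log(horizon) / (2 * pulls))
```

Note that the bonus depends on \(\log(T)\) instead of \(\log(t)\) as in plain UCB, so the index of an arm only changes when that arm is pulled.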
computeAllIndex()[source]

Compute the current indexes for all arms, in a vectorized manner.

__module__ = 'Policies.UCBH'