Policies.UCBplus module

The UCB+ policy for bounded bandits: a UCB variant whose exploration term uses \(\log(t / N_k(t))\) instead of \(\log(t)\).

References: [Auer et al. 2002] and [[Garivier et al. 2016](https://arxiv.org/pdf/1605.08988.pdf)], where the policy is denoted \(\mathrm{UCB}^*\).

class Policies.UCBplus.UCBplus(nbArms, lower=0.0, amplitude=1.0)

Bases: Policies.UCB.UCB

The UCB+ policy for bounded bandits: a UCB variant whose exploration term uses \(\log(t / N_k(t))\) instead of \(\log(t)\).

References: [Auer et al. 2002] and [[Garivier et al. 2016](https://arxiv.org/pdf/1605.08988.pdf)], where the policy is denoted \(\mathrm{UCB}^*\).

__str__(): Return a string representation of the policy (-> str).

computeIndex(arm): Compute the current index of arm k, at time t and after \(N_k(t)\) pulls of that arm, where \(X_k(t)\) is the cumulated reward of arm k:

\[I_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\max\left(0, \frac{\log(t / N_k(t))}{2 N_k(t)}\right)}.\]
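The formula above can be sketched in plain Python as follows. This is only an illustration, not the library's source: the function name `ucb_plus_index` and its arguments are hypothetical, and returning an infinite index for an arm that has never been pulled is an assumption made here so that every arm gets tried at least once.

```python
from math import log, sqrt


def ucb_plus_index(cumulated_reward: float, pulls: int, t: int) -> float:
    """UCB+ index of one arm: empirical mean plus a clipped exploration bonus.

    cumulated_reward plays the role of X_k(t), pulls of N_k(t), t of the time step.
    """
    if pulls < 1:
        # Assumption (not stated in the formula): an unpulled arm gets +inf.
        return float("inf")
    mean = cumulated_reward / pulls
    # log(t / pulls) would be negative if t < pulls; max(0, .) keeps the
    # argument of the square root non-negative.
    bonus = sqrt(max(0.0, log(t / pulls) / (2 * pulls)))
    return mean + bonus


if __name__ == "__main__":
    # Arm pulled 2 times out of 8 steps, with cumulated reward 1.0.
    print(ucb_plus_index(1.0, 2, 8))
```

When an arm has been pulled at every step, so that \(N_k(t) = t\), the logarithm is zero and the index reduces to the empirical mean: for instance, `ucb_plus_index(3.0, 4, 4)` returns `0.75`.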

computeAllIndex(): Compute the current indexes for all arms, in a vectorized manner.
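A vectorized computation of the index \(I_k(t)\) for all arms might look like the NumPy sketch below. It is not the library's implementation: the function name `ucb_plus_all_indexes` and its array arguments are hypothetical, and assigning an infinite index to arms with no pulls is an assumption made for this example.

```python
import numpy as np


def ucb_plus_all_indexes(rewards, pulls, t):
    """UCB+ indexes of all arms at once.

    rewards[k] is the cumulated reward X_k(t), pulls[k] the count N_k(t).
    """
    rewards = np.asarray(rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    # Assumption: arms never pulled keep an index of +inf.
    indexes = np.full(rewards.shape, np.inf)
    seen = pulls >= 1
    means = rewards[seen] / pulls[seen]
    # Clip the bonus argument at 0 before taking the square root.
    bonus = np.sqrt(np.maximum(0.0, np.log(t / pulls[seen]) / (2 * pulls[seen])))
    indexes[seen] = means + bonus
    return indexes


if __name__ == "__main__":
    idx = ucb_plus_all_indexes(np.array([3.0, 1.0, 0.0]), np.array([4, 2, 0]), 8)
    print(idx, "-> arm to play:", int(np.argmax(idx)))
```

An index policy would then typically play the arm with the largest index, which in this sketch is the unpulled third arm.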

__module__ = 'Policies.UCBplus'