Policies.UCBplus module

The UCB+ policy for bounded bandits, with a small trick on the index: the exploration bonus uses \(\log(t / N_k(t))\) rather than \(\log(t)\), and is clipped at zero.

class Policies.UCBplus.UCBplus(nbArms, lower=0.0, amplitude=1.0)

Bases: Policies.UCB.UCB

The UCB+ policy for bounded bandits, with a small trick on the index.
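A minimal sketch of how a UCB+ style policy might be driven end to end. `MiniUCBPlus` and `simulate` are hypothetical names, not part of Policies.UCBplus, and the method names only mirror the usual choice/reward loop. The sketch assumes that rewards observed in [lower, lower + amplitude] are rescaled to [0, 1] before being summed, and that an arm never pulled gets an infinite index so that each arm is tried once.

```python
import math
import random


class MiniUCBPlus:
    """Hypothetical, simplified stand-in for UCBplus (not the library class)."""

    def __init__(self, nbArms, lower=0.0, amplitude=1.0):
        self.nbArms = nbArms
        self.lower = lower
        self.amplitude = amplitude
        self.t = 0                      # number of rewards received so far
        self.pulls = [0] * nbArms       # N_k(t)
        self.rewards = [0.0] * nbArms   # X_k(t), sum of rescaled rewards

    def computeIndex(self, arm):
        n = self.pulls[arm]
        if n < 1:
            return float("inf")         # assumption: unexplored arms come first
        mean = self.rewards[arm] / n
        bonus = math.sqrt(max(0.0, math.log(self.t / n) / (2 * n)))
        return mean + bonus

    def choice(self):
        indexes = [self.computeIndex(k) for k in range(self.nbArms)]
        # max() returns the first arm among ties
        return max(range(self.nbArms), key=lambda k: indexes[k])

    def getReward(self, arm, reward):
        self.t += 1
        self.pulls[arm] += 1
        # Assumption: rescale a reward from [lower, lower + amplitude] to [0, 1]
        self.rewards[arm] += (reward - self.lower) / self.amplitude


def simulate(means, horizon, seed=0):
    """Play Bernoulli arms whose rewards are mapped to [-1, 1]."""
    rng = random.Random(seed)
    policy = MiniUCBPlus(len(means), lower=-1.0, amplitude=2.0)
    for _ in range(horizon):
        arm = policy.choice()
        raw = 1.0 if rng.random() < means[arm] else 0.0
        policy.getReward(arm, -1.0 + 2.0 * raw)
    return policy.pulls
```

For example, `simulate([0.1, 0.5, 0.9], horizon=1000)` returns the pull counts of each arm; the best arm should collect most of the pulls.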

__str__()

Return the string representation of the policy (-> str).

computeIndex(arm)

Compute the current index of arm k at time t, after \(N_k(t)\) pulls of arm k, where \(X_k(t)\) is the sum of the rewards obtained from arm k so far:

\[I_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\max\left(0, \frac{\log(t / N_k(t))}{2 N_k(t)}\right)}.\]
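
The index formula above can be transcribed directly, as a sketch. The function name `ucbplus_index` and its arguments are hypothetical; `sum_rewards` plays the role of \(X_k(t)\) and `pulls` of \(N_k(t)\). Returning +inf for an arm that was never pulled is an assumption, since the formula is undefined when \(N_k(t) = 0\).

```python
import math


def ucbplus_index(sum_rewards, pulls, t):
    """Hypothetical helper: UCB+ index of one arm.

    sum_rewards -- X_k(t), cumulative reward of arm k
    pulls       -- N_k(t), number of pulls of arm k
    t           -- current time step
    """
    if pulls < 1:
        return float("inf")  # assumption: unexplored arms are tried first
    mean = sum_rewards / pulls
    # The max(0, .) clip turns a negative logarithm (t < pulls) into a zero bonus
    bonus = math.sqrt(max(0.0, math.log(t / pulls) / (2 * pulls)))
    return mean + bonus
```

For instance, `ucbplus_index(6.0, 10, 40)` is the mean 0.6 plus a bonus of \(\sqrt{\log 4 / 20} \approx 0.263\), about 0.863 in total. When \(N_k(t) = t\) the logarithm is 0 and the bonus vanishes.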

computeAllIndex()

Compute the current indexes for all arms, in a vectorized manner.
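A vectorized counterpart can be sketched with NumPy under the same assumptions (hypothetical name `ucbplus_all_indexes`, +inf for arms never pulled). Here `rewards` and `pulls` are arrays holding \(X_k(t)\) and \(N_k(t)\) for every arm.

```python
import numpy as np


def ucbplus_all_indexes(rewards, pulls, t):
    """Hypothetical helper: UCB+ indexes of all arms at once."""
    rewards = np.asarray(rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    # Assumption: arms never pulled keep an infinite index
    indexes = np.full(pulls.shape, np.inf)
    pulled = pulls >= 1
    n = pulls[pulled]
    bonus = np.sqrt(np.maximum(0.0, np.log(t / n) / (2.0 * n)))
    indexes[pulled] = rewards[pulled] / n + bonus
    return indexes
```

Masking the unpulled arms before dividing avoids NumPy division-by-zero warnings, and the arm to play is then simply `np.argmax(indexes)`.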
