Policies.UCBmin module

The UCB-min policy for bounded bandits, with a \(\min\left(1, \sqrt{\frac{\log(t)}{2 N_k(t)}}\right)\) term. Reference: [Anandkumar et al., 2010].

class Policies.UCBmin.UCBmin(nbArms, lower=0.0, amplitude=1.0)[source]

Bases: Policies.UCB.UCB

The UCB-min policy for bounded bandits, with a \(\min\left(1, \sqrt{\frac{\log(t)}{2 N_k(t)}}\right)\) term. Reference: [Anandkumar et al., 2010].

computeIndex(arm)[source]

Compute the current index of arm \(k\) at time \(t\), after \(N_k(t)\) pulls of that arm, where \(X_k(t)\) denotes the cumulative reward collected from arm \(k\):

\[I_k(t) = \frac{X_k(t)}{N_k(t)} + \min\left(1, \sqrt{\frac{\log(t)}{2 N_k(t)}}\right).\]
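
As a minimal sketch of the formula above (not the library's actual source), the index of a single arm could be computed as follows. The function name `ucb_min_index` and its arguments are hypothetical, and returning an infinite index for an arm that has never been pulled is an assumption made here so that every arm gets tried at least once.

```python
import math


def ucb_min_index(cumulative_reward, pulls, t):
    """Hypothetical helper: UCB-min index of one arm at time t.

    cumulative_reward plays the role of X_k(t), pulls the role of N_k(t).
    """
    if pulls < 1:
        # Assumption: an arm that was never pulled gets an infinite index,
        # which forces it to be explored first.
        return float("inf")
    empirical_mean = cumulative_reward / pulls
    # Exploration bonus, capped at 1 by the min(1, ...) term.
    bonus = min(1.0, math.sqrt(math.log(t) / (2.0 * pulls)))
    return empirical_mean + bonus
```

With few pulls the square root exceeds 1 and the cap is active, so the bonus never grows beyond the width of the reward range assumed for a bounded bandit with rewards in [0, 1].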

computeAllIndex()[source]

Compute the current indexes for all arms, in a vectorized manner.
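
A vectorized computation of the same index, \(\frac{X_k(t)}{N_k(t)} + \min\left(1, \sqrt{\frac{\log(t)}{2 N_k(t)}}\right)\) for every arm at once, might look like the NumPy sketch below. It is an illustration rather than the library's implementation: the function name `ucb_min_all_indexes` and its arguments are hypothetical, and giving never-pulled arms an infinite index is an assumption.

```python
import numpy as np


def ucb_min_all_indexes(rewards, pulls, t):
    """Hypothetical helper: UCB-min indexes of all arms at time t.

    rewards[k] plays the role of X_k(t), pulls[k] the role of N_k(t).
    """
    rewards = np.asarray(rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    # Assumption: arms never pulled keep an infinite index.
    indexes = np.full(rewards.shape, np.inf)
    pulled = pulls >= 1
    empirical_means = rewards[pulled] / pulls[pulled]
    # Element-wise exploration bonus, capped at 1.
    bonus = np.minimum(1.0, np.sqrt(np.log(t) / (2.0 * pulls[pulled])))
    indexes[pulled] = empirical_means + bonus
    return indexes
```

An index policy would then typically play the arm with the largest index, for example `int(np.argmax(ucb_min_all_indexes(rewards, pulls, t)))`; how ties are broken is left unspecified in this sketch.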

__module__ = 'Policies.UCBmin'