Policies.Experimentals.UCBwrong module

The UCBwrong policy for bounded bandits: like UCB, but with a typo in the estimator of the means, where \(\frac{X_k(t)}{t}\) is used instead of \(\frac{X_k(t)}{N_k(t)}\).
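
To see what the typo does, read \(X_k(t)\) as the cumulated reward of arm k and assume it is non-negative (as with the default lower=0.0). Since \(N_k(t) \leq t\), the mistyped estimator is the usual empirical mean shrunk by the fraction of time spent on arm k:

\[\frac{X_k(t)}{t} = \frac{N_k(t)}{t} \cdot \frac{X_k(t)}{N_k(t)} \leq \frac{X_k(t)}{N_k(t)}.\]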

A 2009 paper by W. Jouini, C. Moy and J. Palicot contained this typo; I reimplemented it just to check that:

its performance is worse than that of simple UCB,
but not that bad…

class Policies.Experimentals.UCBwrong.UCBwrong(nbArms, lower=0.0, amplitude=1.0)

Bases: IndexPolicy.IndexPolicy

The UCBwrong policy for bounded bandits: like UCB, but with a typo in the estimator of the means.

A 2009 paper by W. Jouini, C. Moy and J. Palicot contained this typo; I reimplemented it just to check that:

its performance is worse than that of simple UCB,
but not that bad…

computeIndex(arm): Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[I_k(t) = \frac{X_k(t)}{t} + \sqrt{\frac{2 \log(t)}{N_k(t)}}.\]
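
The formula above can be sketched as a standalone function, next to the standard UCB index for comparison. This is only an illustration, not the module's source: the function names and arguments (`cum_reward` for \(X_k(t)\), `pulls` for \(N_k(t)\)) are hypothetical, and returning `+inf` for an arm that was never pulled is an assumption made here to avoid a division by zero.

```python
import math


def ucb_wrong_index(cum_reward, pulls, t):
    """Index with the typo: the mean is estimated by cum_reward / t."""
    if pulls < 1:
        # Assumption: an arm never pulled gets an infinite index.
        return float("+inf")
    return cum_reward / t + math.sqrt(2 * math.log(t) / pulls)


def ucb_index(cum_reward, pulls, t):
    """Standard UCB index: the mean is estimated by cum_reward / pulls."""
    if pulls < 1:
        return float("+inf")
    return cum_reward / pulls + math.sqrt(2 * math.log(t) / pulls)
```

Both indexes share the same exploration bonus, so they differ only by the gap between the two mean estimators.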

computeAllIndex(): Compute the current indexes for all arms, in a vectorized manner.
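
A vectorized computation of these indexes could look like the following NumPy sketch. The function name and its arguments are hypothetical (the real method works on the policy's internal state), and giving an infinite index to arms with no pulls is an assumption of this sketch.

```python
import numpy as np


def ucb_wrong_all_indexes(cum_rewards, pulls, t):
    """Return the UCBwrong index of every arm at time t, as a float array."""
    cum_rewards = np.asarray(cum_rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    # Assumption: arms never pulled keep an infinite index.
    indexes = np.full(cum_rewards.shape, np.inf)
    seen = pulls >= 1
    # The typo: divide the cumulated rewards by t, not by the number of pulls.
    indexes[seen] = cum_rewards[seen] / t + np.sqrt(2 * np.log(t) / pulls[seen])
    return indexes
```

An index policy would then typically play an arm maximizing this array, for instance with `np.argmax`.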

__module__ = 'Policies.Experimentals.UCBwrong'