Policies.DMED module

The DMED policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0,1]-valued rewards, but beware that in the non-binary case this is not the algorithm of [Honda & Takemura, COLT 2010] (see the note below on the variant).

class Policies.DMED.DMED(nbArms, genuine=False, tolerance=0.0001, kl=<function klBern>, lower=0.0, amplitude=1.0)[source]

Bases: Policies.BasePolicy.BasePolicy

The DMED policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0,1]-valued rewards, but beware that in the non-binary case this is not the algorithm of [Honda & Takemura, COLT 2010] (see the note below on the variant).

__init__(nbArms, genuine=False, tolerance=0.0001, kl=<function klBern>, lower=0.0, amplitude=1.0)[source]

New policy.
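
For context, here is a minimal simulation loop, assuming the usual startGame() / choice() / getReward() cycle inherited from Policies.BasePolicy.BasePolicy; the Bernoulli arm means and the horizon below are purely illustrative:

    import numpy as np
    from Policies.DMED import DMED

    means = [0.1, 0.5, 0.9]                     # illustrative Bernoulli arm means
    policy = DMED(nbArms=len(means))            # naive DMED; pass genuine=True for DMED+
    policy.startGame()

    rng = np.random.default_rng(42)
    for t in range(1000):                       # illustrative horizon
        arm = policy.choice()                   # arm to play at this step
        reward = float(rng.random() < means[arm])   # Bernoulli reward in {0, 1}
        policy.getReward(arm, reward)           # update the policy's statistics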

kl = None

KL divergence function used in the selection condition (klBern by default)

tolerance = None

Numerical tolerance

genuine = None

Flag to know which variant is implemented: DMED (genuine = False) or DMED+ (genuine = True)

nextActions = None

List of next actions to play; every next step plays nextActions.pop(0)

__str__() -> str[source]

startGame()[source]

Initialize the policy for a new game.

choice()[source]

If there is still a next action to play, pop it and play it; otherwise compute the new list of actions and play its first element.

The list of actions is obtained as all the indexes \(k\) satisfying the following inequality (one condition per variant).

  • For the naive version (genuine = False), DMED:
\[\mathrm{kl}(\hat{\mu}_k(t), \hat{\mu}^*(t)) < \frac{\log(t)}{N_k(t)}.\]
  • For the original version (genuine = True), DMED+:
\[\mathrm{kl}(\hat{\mu}_k(t), \hat{\mu}^*(t)) < \frac{\log(\frac{t}{N_k(t)})}{N_k(t)}.\]

Where \(X_k(t)\) is the sum of rewards from arm k, \(\hat{\mu}_k(t)\) is the empirical mean, and \(\hat{\mu}^*(t)\) is the best empirical mean.

\[\begin{split}X_k(t) &= \sum_{\sigma=1}^{t} 1(A(\sigma) = k) r_k(\sigma) \\ \hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ \hat{\mu}^*(t) &= \max_{k=1}^{K} \hat{\mu}_k(t)\end{split}\]
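
As a standalone illustration of this selection rule, here is a sketch (not the library's implementation): the function dmed_action_list is hypothetical, klBern below is a simplified local re-implementation of the Bernoulli KL divergence rather than the library's own helper, and every arm is assumed to have been pulled at least once so that \(N_k(t) > 0\).

    import numpy as np

    def klBern(x, y, eps=1e-15):
        # Bernoulli KL divergence kl(x, y), with inputs clipped away from 0 and 1.
        x = min(max(x, eps), 1 - eps)
        y = min(max(y, eps), 1 - eps)
        return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

    def dmed_action_list(pulls, rewards, t, genuine=False):
        # pulls[k] = N_k(t), rewards[k] = X_k(t), t = current time step.
        pulls = np.asarray(pulls, dtype=float)
        rewards = np.asarray(rewards, dtype=float)
        means = rewards / pulls                  # empirical means hat{mu}_k(t)
        best_mean = means.max()                  # best empirical mean hat{mu}^*(t)
        if genuine:                              # DMED+: log(t / N_k(t)) / N_k(t)
            thresholds = np.log(t / pulls) / pulls
        else:                                    # DMED:  log(t) / N_k(t)
            thresholds = np.log(t) / pulls
        return [k for k in range(len(pulls))
                if klBern(means[k], best_mean) < thresholds[k]]

Note that the best empirical arm has a zero KL term, so as soon as its threshold is positive it satisfies the condition and the returned list is non-empty.
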
choiceMultiple(nb=1)[source]

If there are still enough actions to play, pop them and play them; otherwise compute the new list of actions and play its first nb elements.
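
Continuing the illustrative loop from the class description above, multiple plays per step might look like the sketch below; the exact reward-update protocol for multiple plays depends on the surrounding environment, so this is only an assumption:

    arms = policy.choiceMultiple(nb=2)          # the next two actions to play
    for arm in arms:
        reward = float(rng.random() < means[arm])
        policy.getReward(arm, reward)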

__module__ = 'Policies.DMED'
class Policies.DMED.DMEDPlus(nbArms, tolerance=0.0001, kl=<function klBern>, lower=0.0, amplitude=1.0)[source]

Bases: Policies.DMED.DMED

The DMED+ policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0,1]-valued rewards, but beware that in the non-binary case this is not the algorithm of [Honda & Takemura, COLT 2010].

__init__(nbArms, tolerance=0.0001, kl=<function klBern>, lower=0.0, amplitude=1.0)[source]

New policy.
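
Given the genuine flag documented above, DMEDPlus is presumably just DMED constructed with genuine=True; a hedged sketch of that assumed equivalence:

    from Policies.DMED import DMED, DMEDPlus

    # Assumption: these two policies apply the same (DMED+) selection condition.
    policy_plus = DMEDPlus(nbArms=3)
    policy_equiv = DMED(nbArms=3, genuine=True)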

__module__ = 'Policies.DMED'