Policies.IMED module

The IMED (Indexed Minimum Empirical Divergence) policy of [Honda & Takemura, JMLR 2015].

Policies.IMED.Dinf(x=None, mu=None, kl=<function klBern>, lowerbound=0, upperbound=1, precision=1e-06, max_iterations=50)[source]

The generic Dinf index computation.

  • x: value of the cumulated reward (in IMED, the empirical mean \(\hat{\mu}_k(t)\)),
  • mu: the largest empirical mean, which lower-bounds the optimization variable y,
  • kl: the KL divergence to be used (klBern(), klGauss(), etc),
  • lowerbound, upperbound=1: the known bounds on the values y and x,
  • precision=1e-6: the precision threshold at which to stop the search,
  • max_iterations: maximum number of iterations of the loop (it is safer to bound it, to reduce time complexity).
\[D_{\inf}(x, \mu) \simeq \inf_{\max(\mu, \mathrm{lowerbound}) \leq y \leq \mathrm{upperbound}} \mathrm{kl}(x, y).\]

Note

It uses one call to scipy.optimize.minimize_scalar(). If this fails, it falls back to a bisection search, with one call to kl for each step of the bisection.
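
Below is a minimal sketch of this computation, assuming a scalar kl function such as klBern; the name dinf_sketch and the inlined klBern are illustrative, not the library's code, and the library's numeric bisection fallback is replaced here by a closed-form convexity argument:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def klBern(x, y, eps=1e-15):
        """Bernoulli KL divergence kl(x, y), clipped to avoid log(0)."""
        x = min(max(x, eps), 1 - eps)
        y = min(max(y, eps), 1 - eps)
        return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

    def dinf_sketch(x, mu, kl=klBern, lowerbound=0.0, upperbound=1.0,
                    precision=1e-6, max_iterations=50):
        """Infimum of kl(x, y) over max(mu, lowerbound) <= y <= upperbound."""
        lo = max(mu, lowerbound)
        if x >= lo:
            return 0.0  # y = x is feasible and kl(x, x) = 0
        result = minimize_scalar(lambda y: kl(x, y),
                                 bounds=(lo, upperbound), method='bounded',
                                 options={'xatol': precision,
                                          'maxiter': max_iterations})
        if result.success:
            return float(result.fun)
        # Fallback: kl(x, .) is convex with its minimum at y = x, so for
        # x < lo the infimum on [lo, upperbound] sits at the left edge.
        return kl(x, lo)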

class Policies.IMED.IMED(nbArms, tolerance=0.0001, kl=<function klBern>, lower=0.0, amplitude=1.0)[source]

Bases: Policies.DMED.DMED

The IMED policy of [Honda & Takemura, JMLR 2015].

__init__(nbArms, tolerance=0.0001, kl=<function klBern>, lower=0.0, amplitude=1.0)[source]

New policy.
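
A hypothetical interaction loop, assuming the common SMPyBandits policy interface (startGame(), choice(), getReward()) and the pulls attribute inherited from the base policy; the import path, arm means and horizon are illustrative:

    import numpy as np
    from Policies.IMED import IMED

    rng = np.random.default_rng(42)
    true_means = [0.1, 0.5, 0.8]            # three hypothetical Bernoulli arms
    policy = IMED(nbArms=len(true_means))   # defaults: kl=klBern, lower=0, amplitude=1
    policy.startGame()

    for t in range(1000):
        arm = policy.choice()
        reward = float(rng.random() < true_means[arm])  # Bernoulli draw
        policy.getReward(arm, reward)

    print(policy.pulls)  # the best arm (index 2) should accumulate most pulls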

__str__() -> str[source]

one_Dinf(x, mu)[source]

Compute the \(D_{\inf}\) solution for one value of x and one value of mu.

Dinf(xs, mu)[source]

Compute the \(D_{\inf}\) solution for a vector of values xs and one value of mu.
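
A sketch of how the vectorized version reduces to the scalar one, reusing dinf_sketch from the sketch above (the name Dinf_sketch is illustrative):

    import numpy as np

    def Dinf_sketch(xs, mu, kl=klBern):
        """Apply the scalar D_inf computation to each empirical mean in xs."""
        return np.array([dinf_sketch(x, mu, kl=kl) for x in xs])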

choice()[source]

Choose an arm with minimal index (uniformly at random):

\[A(t) \sim U(\arg\min_{1 \leq k \leq K} I_k(t)).\]

where the indexes are:

\[I_k(t) = N_k(t) \, D_{\inf}(\hat{\mu}_k(t), \max_{k'} \hat{\mu}_{k'}(t)) + \log(N_k(t)).\]
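
A sketch of this rule, again reusing dinf_sketch from above; the name imed_choice and the pulls/means arrays are illustrative, and every arm is assumed to have been pulled at least once so that \(\log N_k(t)\) is finite:

    import numpy as np

    def imed_choice(pulls, means, rng=None):
        """Return an arm minimizing I_k = N_k * D_inf(mean_k, best_mean) + log(N_k),
        breaking ties uniformly at random (assumes all N_k >= 1)."""
        if rng is None:
            rng = np.random.default_rng()
        best_mean = max(means)
        indexes = np.array([N * dinf_sketch(m, best_mean) + np.log(N)
                            for N, m in zip(pulls, means)])
        # Uniform choice among all arms attaining the minimal index.
        candidates = np.nonzero(indexes == indexes.min())[0]
        return int(rng.choice(candidates))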