Policies.Experimentals.KLempUCB module

The Empirical KL-UCB algorithm, a non-parametric policy. References: [Maillard, Munos & Stoltz, COLT 2011], [Cappé, Garivier, Maillard, Munos & Stoltz, 2012].

class Policies.Experimentals.KLempUCB.KLempUCB(nbArms, maxReward=1.0, lower=0.0, amplitude=1.0)[source]

Bases: IndexPolicy.IndexPolicy

The Empirical KL-UCB algorithm, a non-parametric policy. References: [Maillard, Munos & Stoltz, COLT 2011], [Cappé, Garivier, Maillard, Munos & Stoltz, 2012].

__init__(nbArms, maxReward=1.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,
  • maxReward: known upper bound on the rewards,
  • lower, amplitude: lower value and known amplitude of the rewards.
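
A minimal usage sketch (assuming the standard SMPyBandits policy interface, with choice() inherited from IndexPolicy; the import path, the 3-arm Bernoulli setup and the horizon are illustrative):

    import numpy as np
    from Policies.Experimentals.KLempUCB import KLempUCB

    means = [0.1, 0.5, 0.8]                    # three hypothetical Bernoulli arms
    policy = KLempUCB(nbArms=3, maxReward=1.0)
    policy.startGame()

    rng = np.random.default_rng(42)
    for _ in range(1000):
        arm = policy.choice()                  # arm with the largest current index
        reward = float(rng.random() < means[arm])  # Bernoulli reward in {0, 1}
        policy.getReward(arm, reward)          # update t, pulls and observations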
c = None

Exploration parameter \(c\), the constant scaling the radius of the KL confidence ball used by computeIndex()

maxReward = None

Known upper bound on the rewards

pulls = None

Keep track of the number of pulls of each arm

obs = None

UNBOUNDED dictionary for each arm: keeps track of how many times each reward value has been observed. Warning: KLempUCB works better for discrete reward distributions!
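
For instance, after an arm has been pulled six times with rewards 0.0, 0.0, 0.5, 1.0, 0.0, 1.0, its (hypothetical) entry would be:

    obs[arm] == {0.0: 3, 0.5: 1, 1.0: 2}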

startGame()[source]

Initialize the policy for a new game.

computeIndex(arm)[source]

Compute the current index, at time \(t\) and after \(N_k(t)\) pulls of arm \(k\).
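
Following the references above, the empirical KL-UCB index of arm \(k\) is the largest mean achievable within a Kullback-Leibler ball around the empirical reward distribution \(\hat{p}_k(t)\) (a sketch; writing the radius as \(c \log(t) / N_k(t)\), using the parameter \(c\) above, is an assumption about this implementation):

\[ U_k(t) = \max\Big\{ \mathbb{E}_{X \sim q}[X] \;:\; \mathrm{KL}\big(\hat{p}_k(t), q\big) \le \frac{c \log t}{N_k(t)} \Big\}. \]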

getReward(arm, reward)[source]

Give a reward: increase the time step \(t\) and the pull count of that arm, and update the count of observations of this reward value for that arm.
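
A minimal, self-contained sketch of this bookkeeping (the names t, pulls and obs mirror the attributes documented above; the 3-arm setup is illustrative):

    from collections import defaultdict

    t = 0
    pulls = [0, 0, 0]                           # one pull counter per arm
    obs = [defaultdict(int) for _ in range(3)]  # per arm: reward value -> count

    def get_reward(arm, reward):
        """Increase t and pulls[arm], and count this reward value for the arm."""
        global t
        t += 1
        pulls[arm] += 1
        obs[arm][reward] += 1                   # unbounded dictionary of counts

    get_reward(0, 0.5)
    get_reward(0, 0.5)
    assert t == 2 and pulls[0] == 2 and obs[0][0.5] == 2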

static _KLucb(obs, klMax, debug=False)[source]

Optimization method: compute the largest mean of a distribution at Kullback-Leibler divergence at most klMax from the empirical distribution of observations obs. This is the core computation of the empirical KL-UCB index.
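
A sketch of one standard way to solve this maximization, not necessarily the implementation used here: for \(\nu\) above the largest observed value, the maximizer of the mean on the KL ball has the form \(q_i(\nu) \propto p_i / (\nu - v_i)\), and \(\nu\) can be found by bisection until the KL constraint is tight. This simplified version keeps \(q\) on the observed support only, ignoring the possible extra mass on the unobserved upper bound maxReward that the full algorithm of [Cappé et al., 2012] handles:

    import numpy as np

    def kl_ucb_emp(values, counts, kl_max, tol=1e-8):
        """Max of E_q[X] over q with KL(p_hat, q) <= kl_max (simplified sketch)."""
        values = np.asarray(values, dtype=float)
        p = np.asarray(counts, dtype=float)
        p /= p.sum()                          # empirical distribution p_hat
        v_max = values.max()
        if kl_max <= 0 or len(values) == 1:
            return float(np.dot(p, values))   # degenerate cases: q = p_hat

        def kl_and_q(nu):
            # Candidate maximizer on the KL ball: q_i proportional to p_i / (nu - v_i).
            w = p / (nu - values)
            q = w / w.sum()
            mask = p > 0
            return float(np.sum(p[mask] * np.log(p[mask] / q[mask]))), q

        # KL(p_hat, q(nu)) decreases from +inf (nu -> v_max) to 0 (nu -> inf),
        # and a smaller nu gives a larger mean, so make the constraint tight.
        lo, hi = v_max + tol, v_max + 1.0
        while kl_and_q(hi)[0] > kl_max:       # grow hi until q(hi) is feasible
            hi = v_max + 2.0 * (hi - v_max)
        for _ in range(100):                  # bisection on nu
            mid = 0.5 * (lo + hi)
            if kl_and_q(mid)[0] > kl_max:
                lo = mid
            else:
                hi = mid
        return float(np.dot(kl_and_q(hi)[1], values))

    # Rewards {0.0: 3, 0.5: 1, 1.0: 2} observed, KL radius 0.2:
    print(kl_ucb_emp([0.0, 0.5, 1.0], [3, 1, 2], 0.2))  # index above the mean 5/12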

__module__ = 'Policies.Experimentals.KLempUCB'