Policies.Experimentals.KLempUCB module

The Empirical KL-UCB algorithm, a non-parametric policy. References: [Maillard, Munos & Stoltz, COLT 2011], [Cappé, Garivier, Maillard, Munos & Stoltz, 2012].

class Policies.Experimentals.KLempUCB.KLempUCB(nbArms, maxReward=1.0, lower=0.0, amplitude=1.0)[source]

Bases: IndexPolicy.IndexPolicy

The Empirical KL-UCB algorithm, a non-parametric policy. References: [Maillard, Munos & Stoltz, COLT 2011], [Cappé, Garivier, Maillard, Munos & Stoltz, 2012].

__init__(nbArms, maxReward=1.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,
  • maxReward: known upper bound on the rewards,
  • lower, amplitude: lower value and known amplitude of the rewards.
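
A minimal usage sketch (assuming the standard SMPyBandits policy interface, with choice() inherited from IndexPolicy; the import path, the 3-arm Bernoulli setup and the horizon are illustrative):

    import numpy as np
    from Policies.Experimentals.KLempUCB import KLempUCB

    means = [0.1, 0.5, 0.8]                    # three hypothetical Bernoulli arms
    policy = KLempUCB(nbArms=3, maxReward=1.0)
    policy.startGame()

    rng = np.random.default_rng(42)
    for _ in range(1000):
        arm = policy.choice()                  # arm with the largest current index
        reward = float(rng.random() < means[arm])  # Bernoulli reward in {0, 1}
        policy.getReward(arm, reward)          # update t, pulls and observations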
c = None

Exploration parameter \(c\), the constant scaling the radius of the KL confidence ball used by computeIndex()

maxReward = None

Known upper bound on the rewards

pulls = None

Keep track of the number of pulls of each arm

obs = None

UNBOUNDED dictionary for each arm: keeps track of how many times each reward value has been observed. Warning: KLempUCB works better for discrete reward distributions!
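
For instance, after an arm has been pulled six times with rewards 0.0, 0.0, 0.5, 1.0, 0.0, 1.0, its (hypothetical) entry would be:

    obs[arm] == {0.0: 3, 0.5: 1, 1.0: 2}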

startGame()[source]

Initialize the policy for a new game.

computeIndex(arm)[source]

Compute the current index, at time \(t\) and after \(N_k(t)\) pulls of arm \(k\).
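
Following the references above, the empirical KL-UCB index of arm \(k\) is the largest mean achievable within a Kullback-Leibler ball around the empirical reward distribution \(\hat{p}_k(t)\) (a sketch; writing the radius as \(c \log(t) / N_k(t)\), using the parameter \(c\) above, is an assumption about this implementation):

\[ U_k(t) = \max\Big\{ \mathbb{E}_{X \sim q}[X] \;:\; \mathrm{KL}\big(\hat{p}_k(t), q\big) \le \frac{c \log t}{N_k(t)} \Big\}. \]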

getReward(arm, reward)[source]

Give a reward: increase the time step \(t\) and the pull count of that arm, and update the count of observations of this reward value for that arm.
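
A minimal, self-contained sketch of this bookkeeping (the names t, pulls and obs mirror the attributes documented above; the 3-arm setup is illustrative):

    from collections import defaultdict

    t = 0
    pulls = [0, 0, 0]                           # one pull counter per arm
    obs = [defaultdict(int) for _ in range(3)]  # per arm: reward value -> count

    def get_reward(arm, reward):
        """Increase t and pulls[arm], and count this reward value for the arm."""
        global t
        t += 1
        pulls[arm] += 1
        obs[arm][reward] += 1                   # unbounded dictionary of counts

    get_reward(0, 0.5)
    get_reward(0, 0.5)
    assert t == 2 and pulls[0] == 2 and obs[0][0.5] == 2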

static _KLucb(obs, klMax, debug=False)[source]

Optimization method: compute the largest mean of a distribution at Kullback-Leibler divergence at most klMax from the empirical distribution of observations obs. This is the core computation of the empirical KL-UCB index.
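
A sketch of one standard way to solve this maximization, not necessarily the implementation used here: for \(\nu\) above the largest observed value, the maximizer of the mean on the KL ball has the form \(q_i(\nu) \propto p_i / (\nu - v_i)\), and \(\nu\) can be found by bisection until the KL constraint is tight. This simplified version keeps \(q\) on the observed support only, ignoring the possible extra mass on the unobserved upper bound maxReward that the full algorithm of [Cappé et al., 2012] handles:

    import numpy as np

    def kl_ucb_emp(values, counts, kl_max, tol=1e-8):
        """Max of E_q[X] over q with KL(p_hat, q) <= kl_max (simplified sketch)."""
        values = np.asarray(values, dtype=float)
        p = np.asarray(counts, dtype=float)
        p /= p.sum()                          # empirical distribution p_hat
        v_max = values.max()
        if kl_max <= 0 or len(values) == 1:
            return float(np.dot(p, values))   # degenerate cases: q = p_hat

        def kl_and_q(nu):
            # Candidate maximizer on the KL ball: q_i proportional to p_i / (nu - v_i).
            w = p / (nu - values)
            q = w / w.sum()
            mask = p > 0
            return float(np.sum(p[mask] * np.log(p[mask] / q[mask]))), q

        # KL(p_hat, q(nu)) decreases from +inf (nu -> v_max) to 0 (nu -> inf),
        # and a smaller nu gives a larger mean, so make the constraint tight.
        lo, hi = v_max + tol, v_max + 1.0
        while kl_and_q(hi)[0] > kl_max:       # grow hi until q(hi) is feasible
            hi = v_max + 2.0 * (hi - v_max)
        for _ in range(100):                  # bisection on nu
            mid = 0.5 * (lo + hi)
            if kl_and_q(mid)[0] > kl_max:
                lo = mid
            else:
                hi = mid
        return float(np.dot(kl_and_q(hi)[1], values))

    # Rewards {0.0: 3, 0.5: 1, 1.0: 2} observed, KL radius 0.2:
    print(kl_ucb_emp([0.0, 0.5, 1.0], [3, 1, 2], 0.2))  # index above the mean 5/12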

__module__ = 'Policies.Experimentals.KLempUCB'