Policies.AdBandits module

The AdBandits bandit algorithm, mixing Thompson Sampling and BayesUCB.

Warning

This policy is not very well known, but for stochastic bandits it usually works very well! It is not anytime, though (it needs to know the horizon \(T\)).

Policies.AdBandits.ALPHA = 1

Default value for the parameter \(\alpha\) for the AdBandits class.

class Policies.AdBandits.AdBandits(nbArms, horizon=1000, alpha=1, posterior=<class 'Policies.Posterior.Beta.Beta'>, lower=0.0, amplitude=1.0)[source]

Bases: Policies.BasePolicy.BasePolicy

The AdBandits bandit algorithm, mixing Thompson Sampling and BayesUCB.

Warning

This policy is not very well known, but for stochastic bandits it usually works very well! It is not anytime, though (it needs to know the horizon \(T\)).

__init__(nbArms, horizon=1000, alpha=1, posterior=<class 'Policies.Posterior.Beta.Beta'>, lower=0.0, amplitude=1.0)[source]

New policy.

alpha = None

Parameter \(\alpha\).

horizon = None

Parameter \(T\) = known horizon of the experiment. Default value is 1000.

posterior = None

Posterior distribution for each arm, stored as a list (instead of a dict) for quicker access.

__str__()[source]

Return a string representation of this policy (→ str).

startGame()[source]

Reset each posterior.

getReward(arm, reward)[source]

Store the reward, and update the posterior for that arm.
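A minimal sketch of what this step plausibly does, given the lower and amplitude parameters in the constructor above: rescale the raw reward to \([0, 1]\), then feed the observation to that arm's posterior. The update() method name on the Posterior objects is an assumption about the interface, not a verified API:

    def adbandits_get_reward(policy, arm, reward):
        # Rescale the raw reward to [0, 1] using the lower/amplitude
        # parameters from the constructor.
        scaled = (reward - policy.lower) / policy.amplitude
        # Feed the observation to that arm's posterior; update() is an
        # assumed method name on the Posterior interface.
        policy.posterior[arm].update(scaled)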

epsilon

Time-varying parameter \(\varepsilon(t)\).
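The documentation does not spell out the formula; a natural guess consistent with the \(\alpha\) and horizon \(T\) parameters above (an assumption, not the verified source) is \(\varepsilon(t) = \min(1, t / (\alpha T))\), for instance:

    def epsilon(policy):
        # Assumed form: epsilon(t) = min(1, t / (alpha * T)).
        # A guess consistent with the alpha and horizon parameters,
        # not necessarily the exact formula used by AdBandits.
        return min(1.0, policy.t / (policy.alpha * policy.horizon))

With this form, early rounds (small \(t\)) favor Thompson Sampling, and BayesUCB steps become more frequent as \(t\) approaches \(\alpha T\).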

choice()[source]

With probability \(1 - \varepsilon(t)\), use a Thompson Sampling step; otherwise, use a BayesUCB step to choose one arm.
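A minimal sketch of this decision rule, assuming each posterior object exposes sample() and quantile(p) methods and that the BayesUCB step uses the usual quantile level \(1 - 1/t\) (both are assumptions about the internals):

    import numpy as np

    def adbandits_choice(policy, rng=np.random):
        if rng.random() > policy.epsilon:
            # Thompson Sampling step: draw one sample from each arm's
            # posterior and play the arm with the largest sample.
            samples = [post.sample() for post in policy.posterior]
            return int(np.argmax(samples))
        # BayesUCB step: play the arm whose posterior quantile of order
        # 1 - 1/t is the largest.
        q = 1.0 - 1.0 / max(1, policy.t)
        quantiles = [post.quantile(q) for post in policy.posterior]
        return int(np.argmax(quantiles))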

choiceWithRank(rank=1)[source]

With probability \(1 - \varepsilon(t)\), use a Thompson Sampling step; otherwise, use a BayesUCB step to choose one arm of a certain rank.
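Likewise, a hedged sketch of the ranked variant: compute the same scores (posterior samples or quantiles), sort the arms by decreasing score, and return the arm at the requested rank. Reading rank=1 as "the best arm" is an assumption based on the default value:

    import numpy as np

    def adbandits_choice_with_rank(policy, rank=1, rng=np.random):
        if rng.random() > policy.epsilon:
            # Thompson Sampling scores: one posterior sample per arm.
            scores = [post.sample() for post in policy.posterior]
        else:
            # BayesUCB scores: posterior quantiles of order 1 - 1/t.
            q = 1.0 - 1.0 / max(1, policy.t)
            scores = [post.quantile(q) for post in policy.posterior]
        # Sort arms by decreasing score; order[0] is the best arm.
        order = np.argsort(scores)[::-1]
        return int(order[rank - 1])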

__module__ = 'Policies.AdBandits'
Policies.AdBandits.random() → x in the interval [0, 1).
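To tie the pieces together, a hypothetical end-to-end run on Bernoulli arms (the import path follows the module name above; the arm means and seed are purely illustrative):

    import numpy as np
    from Policies.AdBandits import AdBandits

    horizon = 1000
    means = [0.1, 0.5, 0.9]  # illustrative Bernoulli arm means
    policy = AdBandits(nbArms=len(means), horizon=horizon, alpha=1)
    policy.startGame()

    rng = np.random.default_rng(42)
    for _ in range(horizon):
        arm = policy.choice()                      # TS or BayesUCB step
        reward = float(rng.random() < means[arm])  # Bernoulli reward in {0, 1}
        policy.getReward(arm, reward)              # update that arm's posterior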