Policies.Experimentals.ThompsonRobust module¶

The Thompson (Bayesian) index policy, where each index is the average of averageOn samples drawn from the posterior (10 by default, see AVERAGEON). By default, it uses a Beta posterior. Reference: [Thompson - Biometrika, 1933].

Policies.Experimentals.ThompsonRobust.AVERAGEON = 10¶: Default number of posterior samples that are averaged to compute each index in the ThompsonRobust variant.

class Policies.Experimentals.ThompsonRobust.ThompsonRobust(nbArms, posterior=<class 'Posterior.Beta.Beta'>, averageOn=10, lower=0.0, amplitude=1.0)[source]¶

Bases: Thompson.Thompson

The Thompson (Bayesian) index policy, where each index is the average of averageOn samples drawn from the posterior (10 by default). By default, it uses a Beta posterior. Reference: [Thompson - Biometrika, 1933].

__init__(nbArms, posterior=<class 'Posterior.Beta.Beta'>, averageOn=10, lower=0.0, amplitude=1.0)[source]¶: Create a new Bayesian policy, by creating a default posterior on each arm.

averageOn = None¶: How many indexes are computed before averaging

__str__()[source]¶: -> str

computeIndex(arm)[source]¶

Compute the current index for this arm, by sampling the posterior averageOn times and returning the average of the sampled indexes.

At time t, after \(N_k(t)\) pulls of arm k that yielded \(S_k(t)\) rewards of 1, the index is obtained by sampling from the Beta posterior and averaging:

\[\begin{split}I_k(t) &= \frac{1}{\mathrm{averageOn}} \sum_{i=1}^{\mathrm{averageOn}} I_k^{(i)}(t), \\ I_k^{(i)}(t) &\sim \mathrm{Beta}(1 + S_k(t), 1 + N_k(t) - S_k(t)).\end{split}\]
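The formula above can be illustrated with a minimal standalone sketch. This is not the module's actual implementation: the function name robust_thompson_index and its parameters are hypothetical, it assumes binary rewards with a Beta(1, 1) prior, and it uses only the Python standard library rather than the Posterior.Beta class.

```python
import random


def robust_thompson_index(successes, pulls, average_on=10, rng=None):
    """Average `average_on` samples from Beta(1 + S, 1 + N - S).

    Hypothetical helper for illustration only; `successes` plays the role
    of S_k(t) and `pulls` the role of N_k(t) in the formula above.
    """
    rng = rng if rng is not None else random.Random()
    alpha = 1 + successes          # 1 + S_k(t)
    beta = 1 + pulls - successes   # 1 + N_k(t) - S_k(t)
    # Draw average_on independent posterior samples I_k^{(i)}(t).
    samples = [rng.betavariate(alpha, beta) for _ in range(average_on)]
    # The index I_k(t) is their empirical mean.
    return sum(samples) / average_on


if __name__ == "__main__":
    rng = random.Random(0)
    # Arm pulled 100 times with 30 rewards of 1.
    print(robust_thompson_index(30, 100, average_on=10, rng=rng))
```

With average_on = 1 this reduces to a single Thompson sample. As average_on grows, the index concentrates around the posterior mean \((1 + S_k(t)) / (2 + N_k(t))\), so the parameter trades the randomization of Thompson sampling against a lower-variance index.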

__module__ = 'Policies.Experimentals.ThompsonRobust'¶