Policies.Exp3R module

The Drift-Detection algorithm for non-stationary bandits.

Warning

It works with Exp3 or any other parametrization of the Exp3 policy, e.g., Exp3PlusPlus, passed through the policy argument.

Policies.Exp3R.VERBOSE = False

Whether to be verbose when doing the search for a valid parameter \(\ell\).

Policies.Exp3R.CONSTANT_C = 1.0

The constant \(C\) used in Corollary 1 of the paper [“EXP3 with Drift Detection for the Switching Bandit Problem”, Robin Allesiardo & Raphael Feraud](https://www.researchgate.net/profile/Allesiardo_Robin/publication/281028960_EXP3_with_Drift_Detection_for_the_Switching_Bandit_Problem/links/55d1927808aee19936fdac8e.pdf).

class Policies.Exp3R.DriftDetection_IndexPolicy(nbArms, H=None, delta=None, C=1.0, horizon=None, policy=<class 'Policies.Exp3.Exp3'>, *args, **kwargs)[source]

Bases: Policies.CD_UCB.CD_IndexPolicy

The generic Drift-Detection policy for non-stationary bandits, using a custom Drift-Detection test, for 1-dimensional exponential families.

__init__(nbArms, H=None, delta=None, C=1.0, horizon=None, policy=<class 'Policies.Exp3.Exp3'>, *args, **kwargs)[source]

New policy.

H = None

Parameter \(H\) for the Drift-Detection algorithm. Default value is \(\lceil C \sqrt{T \log(T)} \rceil\), for some constant \(C\) given by the parameter C (equal to CONSTANT_C by default).

delta = None

Parameter \(\delta\) for the Drift-Detection algorithm. Default value is \(\sqrt{\frac{\log(T)}{K T}}\) for \(K\) arms and horizon \(T\).
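
As a quick illustration, these two defaults can be computed as follows (a minimal sketch; horizon, nbArms and C are hypothetical local names, not the class attributes):

```python
from math import ceil, log, sqrt

# Hypothetical values, only to illustrate the two default formulas above.
horizon, nbArms, C = 10000, 10, 1.0   # T, K and the constant C

H = ceil(C * sqrt(horizon * log(horizon)))       # H = ceil(C * sqrt(T log(T)))
delta = sqrt(log(horizon) / (nbArms * horizon))  # delta = sqrt(log(T) / (K T))
```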

proba_random_exploration

Parameter \(\gamma\) for the Exp3 algorithm.

threshold_h

Parameter \(\varepsilon\) for the Drift-Detection algorithm.

\[\varepsilon = \sqrt{\frac{K \log(\frac{1}{\delta})}{2 \gamma H}}.\]
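
For instance, this threshold can be computed as in the sketch below (hypothetical values; gamma stands for the exploration parameter \(\gamma\) above):

```python
from math import log, sqrt

nbArms, delta, gamma, H = 10, 0.01, 0.1, 500   # K, delta, gamma, H (made-up values)

# epsilon = sqrt( K * log(1/delta) / (2 * gamma * H) )
epsilon = sqrt(nbArms * log(1.0 / delta) / (2.0 * gamma * H))
threshold_h = epsilon
```
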
min_number_of_pulls_to_test_change

Compute \(\Gamma_{\min}(I) := \frac{\gamma H}{K}\), the minimum number of samples we should have for all arms before testing for a change.
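
In the same spirit, a one-line sketch of this quantity (same hypothetical names as above):

```python
# Gamma_min(I) = gamma * H / K: minimum number of samples per arm before testing.
gamma, H, nbArms = 0.1, 500, 10
min_number_of_pulls_to_test_change = gamma * H / nbArms   # = 5.0 here
```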

__str__()[source]

-> str

detect_change(arm, verbose=False)[source]

Detect a change in the current arm, using a Drift-Detection test (DD).

\[\begin{split}k_{\max} &:= \arg\max_k \tilde{\rho}_k(t),\\ DD_t(k) &= \hat{\mu}_k(I) - \hat{\mu}_{k_{\max}}(I).\end{split}\]
  • The change is detected if there is an arm \(k\) such that \(DD_t(k) \geq 2 \varepsilon = h\), where threshold_h is the threshold \(h\) of the test and \(I\) is the index of the current interval since the last (global) restart;
  • where \(\tilde{\rho}_k(t)\) is the trust probability of arm \(k\) from the Exp3 algorithm;
  • and where \(\hat{\mu}_k(I)\) is the empirical mean of arm \(k\), computed from the data in the current interval (see the sketch after the warning below).

Warning

FIXME I know this implementation is not (yet) correct… I should count separately the samples obtained from the Gibbs distribution (when Exp3 uses the trust vector) and those obtained from the uniform distribution: this \(\Gamma_{\min}(I)\) is the minimum number of samples obtained from the uniform exploration (of probability \(\gamma\)). It seems painful to code correctly, so I will do it later.
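
As a rough illustration of the test described above (not the actual implementation; trusts, means and epsilon are hypothetical inputs standing for \(\tilde{\rho}(t)\), \(\hat{\mu}(I)\) and \(\varepsilon\)):

```python
import numpy as np

def drift_detected(trusts, means, epsilon):
    """Sketch of the Drift-Detection test from the equations above."""
    k_max = int(np.argmax(trusts))            # k_max := argmax_k trust_k(t)
    gaps = means - means[k_max]               # DD_t(k) = mu_k(I) - mu_{k_max}(I)
    return bool(np.any(gaps >= 2 * epsilon))  # change if some DD_t(k) >= 2 * epsilon

# Tiny example with made-up numbers: arm 1 looks much better than the trusted arm 0.
trusts = np.array([0.5, 0.3, 0.2])
means = np.array([0.2, 0.6, 0.3])
print(drift_detected(trusts, means, epsilon=0.1))   # True, since 0.6 - 0.2 >= 0.2
```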

__module__ = 'Policies.Exp3R'

class Policies.Exp3R.Exp3R(nbArms, policy=<class 'Policies.Exp3.Exp3'>, *args, **kwargs)[source]

Bases: Policies.Exp3R.DriftDetection_IndexPolicy

The Exp3.R policy for non-stationary bandits.

__init__(nbArms, policy=<class 'Policies.Exp3.Exp3'>, *args, **kwargs)[source]

New policy.
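
A minimal usage sketch, assuming the SMPyBandits import layout and the usual startGame / choice / getReward interface of its policies (paths and names are assumptions, to be adapted):

```python
import numpy as np
from SMPyBandits.Policies import Exp3R   # assumed import path

horizon, nbArms = 1000, 3
policy = Exp3R(nbArms, horizon=horizon)  # H and delta then take their default values
policy.startGame()

rng = np.random.default_rng(42)
means = np.array([0.1, 0.5, 0.8])        # made-up Bernoulli arms
for t in range(horizon):
    arm = policy.choice()
    reward = float(rng.random() < means[arm])
    policy.getReward(arm, reward)
```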

__str__()[source]

-> str

__module__ = 'Policies.Exp3R'

class Policies.Exp3R.Exp3RPlusPlus(nbArms, policy=<class 'Policies.Exp3PlusPlus.Exp3PlusPlus'>, *args, **kwargs)[source]

Bases: Policies.Exp3R.DriftDetection_IndexPolicy

The Exp3.R++ policy for non-stationary bandits.

__init__(nbArms, policy=<class 'Policies.Exp3PlusPlus.Exp3PlusPlus'>, *args, **kwargs)[source]

New policy.
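
Construction is analogous to Exp3R above; a minimal sketch (import path assumed):

```python
from SMPyBandits.Policies import Exp3RPlusPlus   # assumed import path

policy = Exp3RPlusPlus(nbArms=3, horizon=1000)   # wraps Exp3PlusPlus instead of Exp3
policy.startGame()
```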

__module__ = 'Policies.Exp3R'

__str__()[source]

-> str