Policies.RandomizedIndexPolicy module

Generic randomized index policy.

  • Reference: [[“On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems”, by Baekjin Kim, Ambuj Tewari, arXiv:1902.00610]](https://arxiv.org/pdf/1902.00610.pdf)
Policies.RandomizedIndexPolicy.VERBOSE = False

True to debug information about the perturbations

Policies.RandomizedIndexPolicy.uniform_perturbation(size=1, low=-1.0, high=1.0)[source]

Uniform random perturbation, not from \([0, 1]\) but from \([-1, 1]\), that is \(\mathcal{U}niform([-1, 1])\).

  • Reference: see Corollary 6 from [[“On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems”, by Baekjin Kim, Ambuj Tewari, arXiv:1902.00610]](https://arxiv.org/pdf/1902.00610.pdf)
Policies.RandomizedIndexPolicy.normal_perturbation(size=1, loc=0.0, scale=0.25)[source]

Normal (Gaussian) random perturbation, with mean loc=0 and scale (sigma2) scale=0.25 (by default), that is \(\mathcal{N}ormal(loc, scale)\).

  • Reference: see Corollary 6 from [[“On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems”, by Baekjin Kim, Ambuj Tewari, arXiv:1902.00610]](https://arxiv.org/pdf/1902.00610.pdf)
Policies.RandomizedIndexPolicy.gaussian_perturbation(size=1, loc=0.0, scale=0.25)

Normal (Gaussian) random perturbation, with mean loc=0 and scale (sigma2) scale=0.25 (by default), that is \(\mathcal{N}ormal(loc, scale)\).

  • Reference: see Corollary 6 from [[“On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems”, by Baekjin Kim, Ambuj Tewari, arXiv:1902.00610]](https://arxiv.org/pdf/1902.00610.pdf)
Policies.RandomizedIndexPolicy.exponential_perturbation(size=1, scale=0.25)[source]

Exponential random perturbation, with parameter (\(\lambda\)) scale=0.25 (by default), that is \(\mathcal{E}xponential(\lambda)\).

  • Reference: see Corollary 7 from [[“On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems”, by Baekjin Kim, Ambuj Tewari, arXiv:1902.00610]](https://arxiv.org/pdf/1902.00610.pdf)
Policies.RandomizedIndexPolicy.gumbel_perturbation(size=1, loc=0.0, scale=0.25)[source]

Gumbel random perturbation, with mean loc=0 and scale scale=0.25 (by default), that is \(\mathcal{G}umbel(loc, scale)\).

  • Reference: see Corollary 7 from [[“On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems”, by Baekjin Kim, Ambuj Tewari, arXiv:1902.00610]](https://arxiv.org/pdf/1902.00610.pdf)
Policies.RandomizedIndexPolicy.map_perturbation_str_to_function = {'exponential': <function exponential_perturbation>, 'gaussian': <function normal_perturbation>, 'gumbel': <function gumbel_perturbation>, 'normal': <function normal_perturbation>, 'uniform': <function uniform_perturbation>}

Map perturbation names (like "uniform") to perturbation functions (like uniform_perturbation()).

class Policies.RandomizedIndexPolicy.RandomizedIndexPolicy(nbArms, perturbation='uniform', lower=0.0, amplitude=1.0, *args, **kwargs)[source]

Bases: Policies.IndexPolicy.IndexPolicy

Class that implements a generic randomized index policy.

__init__(nbArms, perturbation='uniform', lower=0.0, amplitude=1.0, *args, **kwargs)[source]

New generic index policy.

  • nbArms: the number of arms,
  • perturbation: [“uniform”, “normal”, “exponential”, “gaussian”] or a function like numpy.random.uniform(),
  • lower, amplitude: lower value and known amplitude of the rewards.
perturbation_name = None

Name of the function to generate the random perturbation.

perturbation = None

Function to generate the random perturbation.

__str__()[source]

-> str

computeIndex(arm)[source]

In a randomized index policy, with distribution \(\mathrm{Distribution}\) generating perturbations \(Z_k(t)\), with index \(I_k(t)\) and mean \(\hat{\mu}_k(t)\) for each arm \(k\), it chooses an arm with maximal perturbated index (uniformly at random):

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ Z_k(t) &\sim \mathrm{Distribution}, \\ \mathrm{UCB}_k(t) &= I_k(t) - \hat{\mu}_k(t),\\ A(t) &\sim U(\arg\max_{1 \leq k \leq K} \hat{\mu}_k(t) + \mathrm{UCB}_k(t) \cdot Z_k(t)).\end{split}\]
__module__ = 'Policies.RandomizedIndexPolicy'
computeAllIndex()[source]

In a randomized index policy, with distribution \(\mathrm{Distribution}\) generating perturbations \(Z_k(t)\), with index \(I_k(t)\) and mean \(\hat{\mu}_k(t)\) for each arm \(k\), it chooses an arm with maximal perturbated index (uniformly at random):

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ Z_k(t) &\sim \mathrm{Distribution}, \\ \mathrm{UCB}_k(t) &= I_k(t) - \hat{\mu}_k(t),\\ A(t) &\sim U(\arg\max_{1 \leq k \leq K} \hat{\mu}_k(t) + \mathrm{UCB}_k(t) \cdot Z_k(t)).\end{split}\]