Policies.ProbabilityPursuit module

The basic Probability Pursuit algorithm.

  • We use the simple version of the pursuit algorithm, as described in the seminal book by Sutton and Barto (1998), https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html.

  • Initially, a uniform probability is set on each arm: \(p_k(0) = 1/K\).

  • At each time step \(t\), the probabilities are all recomputed, following this equation:

    \[\begin{split}p_k(t+1) = \begin{cases} (1 - \beta) p_k(t) + \beta \times 1 & \text{if}\; \hat{\mu}_k(t) = \max_j \hat{\mu}_j(t) \\ (1 - \beta) p_k(t) + \beta \times 0 & \text{otherwise}. \end{cases}\end{split}\]
  • \(\beta \in (0, 1)\) is a learning rate, default is BETA = 0.5.

  • And then the next arm \(A(t+1)\) is randomly drawn from the distribution \((p_k(t+1))_{1 \leq k \leq K}\).

  • References: [Kuleshov & Precup - JMLR, 2000](http://www.cs.mcgill.ca/~vkules/bandits.pdf#page=6), [Sutton & Barto, 1998]
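The pursuit update above can be sketched in a few lines of NumPy. This is a standalone illustration of the equation, not the library's actual code; ties in the empirical means are broken by numpy.argmax, i.e. in favor of the lowest index:

```python
import numpy as np

def pursuit_update(probabilities, empirical_means, beta=0.5):
    """One pursuit step: move probability mass toward the arm with the
    highest empirical mean (illustrative sketch of the equation above)."""
    target = np.zeros_like(probabilities)
    target[np.argmax(empirical_means)] = 1.0  # target is 1 for the best arm, 0 elsewhere
    return (1.0 - beta) * probabilities + beta * target

# Example: four arms, uniform start, arm 1 currently has the best empirical mean
p = pursuit_update(np.full(4, 0.25), np.array([0.1, 0.9, 0.2, 0.3]))
# p == [0.125, 0.625, 0.125, 0.125], and still sums to 1
```

Note that the new distribution always sums to 1, since the update is a convex combination of two probability vectors.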

Policies.ProbabilityPursuit.BETA = 0.5

Default value for the \(\beta\) parameter.

class Policies.ProbabilityPursuit.ProbabilityPursuit(nbArms, beta=0.5, prior='uniform', lower=0.0, amplitude=1.0)[source]

Bases: Policies.BasePolicy.BasePolicy

The basic Probability Pursuit algorithm.

__init__(nbArms, beta=0.5, prior='uniform', lower=0.0, amplitude=1.0)[source]

New policy.

probabilities = None

Probabilities of each arm

startGame()[source]

Reinitialize probabilities.

beta

Constant parameter \(\beta(t) = \beta(0)\).

__str__()[source]

-> str

getReward(arm, reward)[source]

Give a reward: accumulate rewards on that arm k, then update the probabilities \(p_k(t)\) of each arm.

choice()[source]

One random selection, with probabilities \((p_k(t))_{1 \leq k \leq K}\), thanks to numpy.random.choice().
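In effect this is a single weighted draw over the arm indices. A minimal standalone snippet showing the same draw with NumPy directly (the probabilities here are made up for illustration):

```python
import numpy as np

probabilities = np.array([0.1, 0.6, 0.3])  # current pursuit probabilities, summing to 1
# One arm index drawn with these probabilities, as choice() does internally
arm = np.random.choice(len(probabilities), p=probabilities)
```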

choiceWithRank(rank=1)[source]

Multiple (rank >= 1) random selections, with probabilities \((p_k(t))_{1 \leq k \leq K}\), thanks to numpy.random.choice(), and select the last one (the least probable).
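A minimal sketch of this behavior, assuming the rank draws are made without replacement and the last of them is returned; the function name and details are illustrative, not the library's internals:

```python
import numpy as np

def choice_with_rank(probabilities, rank=1):
    """Draw `rank` distinct arms weighted by the current pursuit
    probabilities, then return the last one drawn."""
    draws = np.random.choice(len(probabilities), size=rank,
                             replace=False, p=probabilities)
    return draws[-1]

arm = choice_with_rank(np.array([0.2, 0.5, 0.3]), rank=2)
```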

choiceFromSubSet(availableArms='all')[source]

One random selection, from availableArms, with probabilities \((p_k(t))_{1 \leq k \leq K}\), thanks to numpy.random.choice().

__module__ = 'Policies.ProbabilityPursuit'

choiceMultiple(nb=1)[source]

Multiple (nb >= 1) random selections, with probabilities \((p_k(t))_{1 \leq k \leq K}\), thanks to numpy.random.choice().
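Putting the pieces together, here is a self-contained simulation of the pursuit algorithm on a 3-armed Bernoulli problem, in plain NumPy. It only illustrates the algorithm described above; the ProbabilityPursuit class wraps the same logic behind startGame(), choice() and getReward():

```python
import numpy as np

np.random.seed(0)
K, horizon, beta = 3, 2000, 0.5
true_means = np.array([0.2, 0.5, 0.8])  # unknown to the algorithm

probabilities = np.full(K, 1.0 / K)  # p_k(0) = 1/K
counts = np.zeros(K)                 # number of pulls of each arm
sums = np.zeros(K)                   # accumulated rewards of each arm

for t in range(horizon):
    # Draw an arm from the current pursuit distribution
    arm = np.random.choice(K, p=probabilities)
    reward = float(np.random.rand() < true_means[arm])  # Bernoulli reward
    counts[arm] += 1
    sums[arm] += reward
    # Recompute empirical means and pursue the empirically best arm
    means = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    target = np.zeros(K)
    target[np.argmax(means)] = 1.0
    probabilities = (1.0 - beta) * probabilities + beta * target

# The pursuit probabilities concentrate on a single arm
# (the one that currently looks empirically best)
```

With \(\beta = 0.5\) the distribution collapses quickly onto the pursued arm, which is why smaller learning rates are often preferred in practice.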