PoliciesMultiPlayers.ALOHA module

ALOHA: generalized implementation of the policy from [Concurrent bandits and cognitive radio networks, O. Avner & S. Mannor, 2014](https://arxiv.org/abs/1404.5421), for a generic single-player policy.

This policy combines a collision avoidance mechanism inspired by the classical ALOHA protocol with any single-player policy.

PoliciesMultiPlayers.ALOHA.tnext_beta(t, beta=0.5)[source]

Simple function, as used in MEGA: \(\mathrm{upper\_tnext}(t) = t^{\beta}\). Defaults to \(t^{0.5}\).

>>> tnext_beta(100, beta=0.1)  # doctest: +ELLIPSIS
1.584...
>>> tnext_beta(100, beta=0.5)
10.0
>>> tnext_beta(100, beta=0.9)  # doctest: +ELLIPSIS
63.095...
>>> tnext_beta(1000)  # doctest: +ELLIPSIS
31.622...
PoliciesMultiPlayers.ALOHA.make_tnext_beta(beta=0.5)[source]

Returns the function \(t \mapsto t^{\beta}\).

>>> tnext = make_tnext_beta(0.5)
>>> tnext(100)
10.0
>>> tnext(1000)  # doctest: +ELLIPSIS
31.622...
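
For reference, both helpers are simple enough to sketch directly from the formula and the doctests above. The following is only an illustrative sketch (names kept for readability; details may differ from the library's actual code):

def tnext_beta(t, beta=0.5):
    """Power-law horizon t ** beta, matching the doctests above."""
    return t ** beta

def make_tnext_beta(beta=0.5):
    """Freeze beta and return the one-argument closure t -> t ** beta."""
    def tnext(t):
        return t ** beta
    return tnext
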
PoliciesMultiPlayers.ALOHA.tnext_log(t, scaling=1.0)[source]

Alternative function, not the one used in MEGA but our proposal: \(\mathrm{upper\_tnext}(t) = \text{scaling} \times \log(1 + t)\).

>>> tnext_log(100, scaling=1)  # doctest: +ELLIPSIS
4.615...
>>> tnext_log(100, scaling=10)  # doctest: +ELLIPSIS
46.151...
>>> tnext_log(100, scaling=100)  # doctest: +ELLIPSIS
461.512...
>>> tnext_log(1000)  # doctest: +ELLIPSIS
6.908...
PoliciesMultiPlayers.ALOHA.make_tnext_log_scaling(scaling=1.0)[source]

Returns the function \(t \mapsto \text{scaling} \times \log(1 + t)\).

>>> tnext = make_tnext_log_scaling(1)
>>> tnext(100)  # doctest: +ELLIPSIS
4.615...
>>> tnext(1000)  # doctest: +ELLIPSIS
6.908...
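
Similarly, a minimal sketch of the two logarithmic helpers, consistent with the doctests above (an illustration only, not necessarily the library's exact code):

from math import log

def tnext_log(t, scaling=1.0):
    """Logarithmic horizon scaling * log(1 + t), matching the doctests above."""
    return scaling * log(1 + t)

def make_tnext_log_scaling(scaling=1.0):
    """Freeze the scaling and return the one-argument closure t -> scaling * log(1 + t)."""
    def tnext(t):
        return scaling * log(1 + t)
    return tnext
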
class PoliciesMultiPlayers.ALOHA.oneALOHA(nbPlayers, mother, playerId, nbArms, p0=0.5, alpha_p0=0.5, ftnext=<function tnext_beta>, beta=None)[source]

Bases: PoliciesMultiPlayers.ChildPointer.ChildPointer

Class that acts as a child policy, but in fact it passes all its method calls to the mother class, which forwards them to its i-th player.

  • Except for the handleCollision method: the ALOHA collision avoidance protocol is implemented here.
__init__(nbPlayers, mother, playerId, nbArms, p0=0.5, alpha_p0=0.5, ftnext=<function tnext_beta>, beta=None)[source]

Initialize self. See help(type(self)) for accurate signature.

nbPlayers = None

Number of players

p0 = None

Initial probability, should not be modified

p = None

Current probability, can be modified

alpha_p0 = None

Parameter alpha for the recurrence equation for probability p(t)

beta = None

Parameter beta

tnext = None

Only store the delta times: for each arm, until when it is marked unavailable

t = None

Internal time

chosenArm = None

Last chosen arm

__str__()[source]

Return str(self).

startGame()[source]

Start game.

ftnext(t)[source]

Time until the arm is removed from the list of unavailable arms.

getReward(arm, reward)[source]

Receive a reward on arm of index ‘arm’, as described by the ALOHA protocol.

  • If there was no collision, a reward is received after pulling the arm.
handleCollision(arm, reward=None)[source]

Handle a collision, on arm of index ‘arm’.

Warning

This method has to be implemented in the collision model; it is NOT implemented in the EvaluatorMultiPlayers.

Note

We do not care on which arm the collision occurred.

choice()[source]

Identify the available arms, and use the underlying single-player policy (UCB, Thompson sampling, etc.) to choose an arm from this subset of arms.
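
Taken together, getReward, handleCollision and choice implement the ALOHA persistence rule: a collision-free reward pushes the persistence probability p towards 1 through the alpha_p0 recurrence, a collision keeps the arm only with probability p (otherwise the arm is marked unavailable until a random future time drawn with ftnext and p is reset to p0), and new arms are chosen only among the currently available ones. The standalone sketch below merely illustrates this mechanism under those assumptions; it is not the library's oneALOHA class, and in particular the class name OneALOHASketch, the single_player_choice callable and the exact sampling of the unavailability delay are illustrative choices, not part of the documented API.

import random

class OneALOHASketch:
    """Minimal illustration of the ALOHA collision-avoidance rule (not the real oneALOHA)."""

    def __init__(self, nbArms, p0=0.5, alpha_p0=0.5, ftnext=lambda t: t ** 0.5):
        self.nbArms = nbArms
        self.p0 = p0                # initial persistence probability p(0)
        self.p = p0                 # current persistence probability p(t)
        self.alpha_p0 = alpha_p0    # scaling in the recurrence for p
        self.ftnext = ftnext        # horizon used to sample the unavailability delay
        self.tnext = [0] * nbArms   # per-arm time until which the arm is unavailable
        self.t = 0                  # internal time
        self.chosenArm = None       # last chosen arm

    def getReward(self, arm, reward):
        # Collision-free pull: reinforce persistence on the current arm.
        self.p = self.alpha_p0 * self.p + (1 - self.alpha_p0)

    def handleCollision(self, arm, reward=None):
        # With probability p, persist; with probability 1 - p, give the arm up.
        if random.random() >= self.p:
            # Mark the arm unavailable until a random future time drawn with ftnext.
            delay = 1 + random.randrange(1 + int(self.ftnext(self.t)))
            self.tnext[arm] = self.t + delay
            self.p = self.p0        # restart from the initial probability
            self.chosenArm = None   # a new arm will be chosen at the next step

    def choice(self, single_player_choice):
        # single_player_choice is a hypothetical callable standing in for the
        # wrapped single-player policy, restricted to a list of available arms.
        self.t += 1
        if self.chosenArm is None:
            available = [k for k in range(self.nbArms) if self.tnext[k] <= self.t]
            self.chosenArm = single_player_choice(available)
        return self.chosenArm
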

__module__ = 'PoliciesMultiPlayers.ALOHA'
class PoliciesMultiPlayers.ALOHA.ALOHA(nbPlayers, nbArms, playerAlgo, p0=0.5, alpha_p0=0.5, ftnext=<function tnext_beta>, beta=None, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.BaseMPPolicy.BaseMPPolicy

ALOHA: implementation of the multi-player policy from [Concurrent bandits and cognitive radio networks, O. Avner & S. Mannor, 2014](https://arxiv.org/abs/1404.5421), for a generic single-player policy.

__init__(nbPlayers, nbArms, playerAlgo, p0=0.5, alpha_p0=0.5, ftnext=<function tnext_beta>, beta=None, *args, **kwargs)[source]
  • nbPlayers: number of players to create (in self._players).
  • playerAlgo: class to use for every player.
  • nbArms: number of arms, given as first argument to playerAlgo.
  • p0: initial probability p(0); p(t) is the probability of persistence on the chosenArm at time t.
  • alpha_p0: scaling in the update \(p(t+1) = \alpha_{p_0} \, p(t) + (1 - \alpha_{p_0})\) (see the short numerical illustration after this list).
  • ftnext: general function, defaulting to \(t \mapsto t^{\beta}\), used to sample a random time \(t_{\text{next}}(k)\) until which the chosenArm stays unavailable. \(t \mapsto \log(1 + t)\) is also possible.
  • (optional) beta: if present, overrides ftnext, which is then set to \(t \mapsto t^{\beta}\).
  • *args, **kwargs: arguments, named arguments, given to playerAlgo.
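
As a quick numerical illustration of the p(t) recurrence (plain arithmetic, using the same p0 = 0.6 and alpha_p0 = 0.5 as in the example below), each update moves p halfway towards 1:

>>> p, alpha_p0 = 0.6, 0.5
>>> for _ in range(4):
...     p = alpha_p0 * p + (1 - alpha_p0)
...     print(round(p, 4))
0.8
0.9
0.95
0.975
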

Example:

>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> p0, alpha_p0 = 0.6, 0.5
>>> s = ALOHA(nbPlayers, nbArms, Thompson, p0=p0, alpha_p0=alpha_p0, ftnext=tnext_log)
>>> [ child.choice() for child in s.children ]
[6, 11, 8, 4, 8, 8]
>>> s = ALOHA(nbPlayers, nbArms, UCBalpha, p0=p0, alpha_p0=alpha_p0, beta=0.5, alpha=1)
>>> [ child.choice() for child in s.children ]
[1, 0, 5, 2, 15, 3]
  • To get a list of usable players, use s.children.
  • Warning: s._players is for internal use ONLY!
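
The ftnext hook also accepts the closures built by make_tnext_log_scaling; for instance, continuing the doctest session above (a usage sketch only, output omitted because it depends on the random state):

>>> s = ALOHA(nbPlayers, nbArms, Thompson, p0=p0, alpha_p0=alpha_p0, ftnext=make_tnext_log_scaling(10.0))
>>> choices = [ child.choice() for child in s.children ]   # one arm per child player
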
__module__ = 'PoliciesMultiPlayers.ALOHA'
nbPlayers = None

Number of players

nbArms = None

Number of arms

children = None

List of children, fake algorithms

__str__()[source]

Return str(self).
