PoliciesMultiPlayers.ALOHA module

ALOHA: generalized implementation of the policy from [Concurrent bandits and cognitive radio networks, O. Avner & S. Mannor, 2014](https://arxiv.org/abs/1404.5421), for a generic single-player policy.

This policy combines a collision avoidance mechanism inspired by the classical ALOHA protocol with any single-player policy.

PoliciesMultiPlayers.ALOHA.tnext_beta(t, beta=0.5)[source]

Simple function, as used in MEGA: \(\mathrm{upper\_tnext}(t) = t^{\beta}\). Defaults to \(t^{0.5}\).

>>> tnext_beta(100, beta=0.1)  # doctest: +ELLIPSIS
1.584...
>>> tnext_beta(100, beta=0.5)
10.0
>>> tnext_beta(100, beta=0.9)  # doctest: +ELLIPSIS
63.095...
>>> tnext_beta(1000)  # doctest: +ELLIPSIS
31.622...
PoliciesMultiPlayers.ALOHA.make_tnext_beta(beta=0.5)[source]

Returns the function \(t \mapsto t^{\beta}\).

>>> tnext = make_tnext_beta(0.5)
>>> tnext(100)
10.0
>>> tnext(1000)  # doctest: +ELLIPSIS
31.622...
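
For reference, both helpers are simple enough to sketch directly from the formula and the doctests above. The following is only an illustrative sketch (names kept for readability; details may differ from the library's actual code):

def tnext_beta(t, beta=0.5):
    """Power-law horizon t ** beta, matching the doctests above."""
    return t ** beta

def make_tnext_beta(beta=0.5):
    """Freeze beta and return the one-argument closure t -> t ** beta."""
    def tnext(t):
        return t ** beta
    return tnext
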
PoliciesMultiPlayers.ALOHA.tnext_log(t, scaling=1.0)[source]

Alternative function, not the one used in MEGA but our proposal: \(\mathrm{upper\_tnext}(t) = \text{scaling} \times \log(1 + t)\).

>>> tnext_log(100, scaling=1)  # doctest: +ELLIPSIS
4.615...
>>> tnext_log(100, scaling=10)  # doctest: +ELLIPSIS
46.151...
>>> tnext_log(100, scaling=100)  # doctest: +ELLIPSIS
461.512...
>>> tnext_log(1000)  # doctest: +ELLIPSIS
6.908...
PoliciesMultiPlayers.ALOHA.make_tnext_log_scaling(scaling=1.0)[source]

Returns the function \(t \mapsto \text{scaling} \times \log(1 + t)\).

>>> tnext = make_tnext_log_scaling(1)
>>> tnext(100)  # doctest: +ELLIPSIS
4.615...
>>> tnext(1000)  # doctest: +ELLIPSIS
6.908...
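
Similarly, a minimal sketch of the two logarithmic helpers, consistent with the doctests above (an illustration only, not necessarily the library's exact code):

from math import log

def tnext_log(t, scaling=1.0):
    """Logarithmic horizon scaling * log(1 + t), matching the doctests above."""
    return scaling * log(1 + t)

def make_tnext_log_scaling(scaling=1.0):
    """Freeze the scaling and return the one-argument closure t -> scaling * log(1 + t)."""
    def tnext(t):
        return scaling * log(1 + t)
    return tnext
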
class PoliciesMultiPlayers.ALOHA.oneALOHA(nbPlayers, mother, playerId, nbArms, p0=0.5, alpha_p0=0.5, ftnext=<function tnext_beta>, beta=None)[source]

Bases: PoliciesMultiPlayers.ChildPointer.ChildPointer

Class that acts as a child policy, but in fact it passes all its method calls to the mother class, which forwards them to its i-th player.

  • Except for the handleCollision method: the ALOHA collision avoidance protocol is implemented here.
__init__(nbPlayers, mother, playerId, nbArms, p0=0.5, alpha_p0=0.5, ftnext=<function tnext_beta>, beta=None)[source]

Initialize self. See help(type(self)) for accurate signature.

nbPlayers = None

Number of players

p0 = None

Initial probability, should not be modified

p = None

Current probability, can be modified

alpha_p0 = None

Parameter alpha for the recurrence equation for probability p(t)

beta = None

Parameter beta

tnext = None

Only store the delta times: for each arm, until when it is marked unavailable

t = None

Internal time

chosenArm = None

Last chosen arm

__str__()[source]

Return str(self).

startGame()[source]

Start game.

ftnext(t)[source]

Time until the arm is removed from the list of unavailable arms.

getReward(arm, reward)[source]

Receive a reward on arm of index ‘arm’, as described by the ALOHA protocol.

  • If there was no collision, a reward is received after pulling the arm.
handleCollision(arm, reward=None)[source]

Handle a collision, on arm of index ‘arm’.

Warning

This method has to be implemented in the collision model; it is NOT implemented in the EvaluatorMultiPlayers.

Note

We do not care on which arm the collision occurred.

choice()[source]

Identify the available arms, and use the underlying single-player policy (UCB, Thompson sampling, etc.) to choose an arm from this subset of arms.
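
Taken together, getReward, handleCollision and choice implement the ALOHA persistence rule: a collision-free reward pushes the persistence probability p towards 1 through the alpha_p0 recurrence, a collision keeps the arm only with probability p (otherwise the arm is marked unavailable until a random future time drawn with ftnext and p is reset to p0), and new arms are chosen only among the currently available ones. The standalone sketch below merely illustrates this mechanism under those assumptions; it is not the library's oneALOHA class, and in particular the class name OneALOHASketch, the single_player_choice callable and the exact sampling of the unavailability delay are illustrative choices, not part of the documented API.

import random

class OneALOHASketch:
    """Minimal illustration of the ALOHA collision-avoidance rule (not the real oneALOHA)."""

    def __init__(self, nbArms, p0=0.5, alpha_p0=0.5, ftnext=lambda t: t ** 0.5):
        self.nbArms = nbArms
        self.p0 = p0                # initial persistence probability p(0)
        self.p = p0                 # current persistence probability p(t)
        self.alpha_p0 = alpha_p0    # scaling in the recurrence for p
        self.ftnext = ftnext        # horizon used to sample the unavailability delay
        self.tnext = [0] * nbArms   # per-arm time until which the arm is unavailable
        self.t = 0                  # internal time
        self.chosenArm = None       # last chosen arm

    def getReward(self, arm, reward):
        # Collision-free pull: reinforce persistence on the current arm.
        self.p = self.alpha_p0 * self.p + (1 - self.alpha_p0)

    def handleCollision(self, arm, reward=None):
        # With probability p, persist; with probability 1 - p, give the arm up.
        if random.random() >= self.p:
            # Mark the arm unavailable until a random future time drawn with ftnext.
            delay = 1 + random.randrange(1 + int(self.ftnext(self.t)))
            self.tnext[arm] = self.t + delay
            self.p = self.p0        # restart from the initial probability
            self.chosenArm = None   # a new arm will be chosen at the next step

    def choice(self, single_player_choice):
        # single_player_choice is a hypothetical callable standing in for the
        # wrapped single-player policy, restricted to a list of available arms.
        self.t += 1
        if self.chosenArm is None:
            available = [k for k in range(self.nbArms) if self.tnext[k] <= self.t]
            self.chosenArm = single_player_choice(available)
        return self.chosenArm
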

__module__ = 'PoliciesMultiPlayers.ALOHA'
class PoliciesMultiPlayers.ALOHA.ALOHA(nbPlayers, nbArms, playerAlgo, p0=0.5, alpha_p0=0.5, ftnext=<function tnext_beta>, beta=None, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.BaseMPPolicy.BaseMPPolicy

ALOHA: implementation of the multi-player policy from [Concurrent bandits and cognitive radio networks, O. Avner & S. Mannor, 2014](https://arxiv.org/abs/1404.5421), for a generic single-player policy.

__init__(nbPlayers, nbArms, playerAlgo, p0=0.5, alpha_p0=0.5, ftnext=<function tnext_beta>, beta=None, *args, **kwargs)[source]
  • nbPlayers: number of players to create (in self._players).
  • playerAlgo: class to use for every player.
  • nbArms: number of arms, given as first argument to playerAlgo.
  • p0: initial probability p(0); p(t) is the probability of persistence on the chosenArm at time t.
  • alpha_p0: scaling in the update \(p(t+1) = \alpha_{p_0} \, p(t) + (1 - \alpha_{p_0})\) (see the short numerical illustration after this list).
  • ftnext: general function, defaulting to \(t \mapsto t^{\beta}\), used to sample a random time \(t_{\text{next}}(k)\) until which the chosenArm stays unavailable. \(t \mapsto \log(1 + t)\) is also possible.
  • (optional) beta: if present, overrides ftnext, which is then set to \(t \mapsto t^{\beta}\).
  • *args, **kwargs: arguments, named arguments, given to playerAlgo.
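
As a quick numerical illustration of the p(t) recurrence (plain arithmetic, using the same p0 = 0.6 and alpha_p0 = 0.5 as in the example below), each update moves p halfway towards 1:

>>> p, alpha_p0 = 0.6, 0.5
>>> for _ in range(4):
...     p = alpha_p0 * p + (1 - alpha_p0)
...     print(round(p, 4))
0.8
0.9
0.95
0.975
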

Example:

>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> p0, alpha_p0 = 0.6, 0.5
>>> s = ALOHA(nbPlayers, nbArms, Thompson, p0=p0, alpha_p0=alpha_p0, ftnext=tnext_log)
>>> [ child.choice() for child in s.children ]
[6, 11, 8, 4, 8, 8]
>>> s = ALOHA(nbPlayers, nbArms, UCBalpha, p0=p0, alpha_p0=alpha_p0, beta=0.5, alpha=1)
>>> [ child.choice() for child in s.children ]
[1, 0, 5, 2, 15, 3]
  • To get a list of usable players, use s.children.
  • Warning: s._players is for internal use ONLY!
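
The ftnext hook also accepts the closures built by make_tnext_log_scaling; for instance, continuing the doctest session above (a usage sketch only, output omitted because it depends on the random state):

>>> s = ALOHA(nbPlayers, nbArms, Thompson, p0=p0, alpha_p0=alpha_p0, ftnext=make_tnext_log_scaling(10.0))
>>> choices = [ child.choice() for child in s.children ]   # one arm per child player
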
__module__ = 'PoliciesMultiPlayers.ALOHA'
nbPlayers = None

Number of players

nbArms = None

Number of arms

children = None

List of children, fake algorithms

__str__()[source]

Return str(self).
