PoliciesMultiPlayers.rhoRandSticky module

rhoRandSticky: implementation of a variant of the multi-player policy rhoRand from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

  • Each child player is selfish and plays according to an index policy (any index policy, e.g., UCB, Thompson, KL-UCB, BayesUCB, etc.),
  • But instead of aiming at the best (the 1-st best) arm, player i aims at the rank_i-th best arm,
  • At first, every player has a random rank_i from 1 to M, and when a collision occurs, rank_i is resampled from a uniform distribution on [1, ..., M], where M is the number of players.
  • The only difference with rhoRand is that once a player has selected a rank and has not encountered a collision for STICKY_TIME time steps, he never changes his rank again. rhoRand corresponds to STICKY_TIME = +oo, MusicalChair is something like STICKY_TIME = 1, and this variant, rhoRandSticky, takes STICKY_TIME as a parameter.


This is not fully decentralized, as each child player needs to know the (fixed) number of players.
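As an illustration of "aiming at the rank_i-th best arm", here is a minimal sketch in plain Python. The function name `choice_with_rank` and the index values are hypothetical and chosen for this example only; ties are broken by arm order here, which is an assumption and may differ from what the library does.

```python
def choice_with_rank(indexes, rank=1):
    """Return the arm whose index is the rank-th largest (rank=1 is the best arm)."""
    if not 1 <= rank <= len(indexes):
        raise ValueError("rank must be between 1 and the number of arms")
    # Arms sorted by decreasing index; equal indexes keep their original order
    # (an assumption of this sketch, not necessarily the library's behavior).
    ordered_arms = sorted(range(len(indexes)), key=lambda arm: indexes[arm], reverse=True)
    return ordered_arms[rank - 1]


# Hypothetical index values for 4 arms (for instance UCB indexes at some time step).
indexes = [0.42, 0.91, 0.15, 0.67]
best_arm = choice_with_rank(indexes, rank=1)   # arm with the largest index
third_arm = choice_with_rank(indexes, rank=3)  # arm with the third largest index
```

A player with rank 1 behaves like a single-player index policy; players holding distinct ranks target distinct arms as long as they agree on the ordering of the arms.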

PoliciesMultiPlayers.rhoRandSticky.STICKY_TIME = 10

Default value for STICKY_TIME

class PoliciesMultiPlayers.rhoRandSticky.oneRhoRandSticky(maxRank, stickyTime, *args, **kwargs)

Bases: PoliciesMultiPlayers.rhoRand.oneRhoRand

Class that acts as a child policy, but in fact it passes all its method calls to the mother class, which passes them to its i-th player.

  • Except for the handleCollision method: a new random rank is sampled after observing a collision,
  • And the player does not aim at the best arm, but at the rank-th best arm, based on her index policy.
__init__(maxRank, stickyTime, *args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

maxRank = None

Max rank, usually nbPlayers but can be different

stickyTime = None

Number of time steps needed without collisions before sitting (never changing rank again)

rank = None

Current rank, starting at 1 by default

sitted = None

Whether the player is sitted (initially not). After stickyTime steps without collisions, the player sits and never changes rank again.

stepsWithoutCollisions = None

Number of steps since we chose that rank and did not see any collision. As soon as this gets greater than stickyTime, the player sits.


__str__()

Return str(self).


startGame()

Start game.

handleCollision(arm, reward=None)

Get a new fully random rank (unless the player is already sitted), and give the reward to the algorithm if it is not None.

getReward(arm, reward)

Pass the call to self.mother._getReward_one(playerId, arm, reward) with the player’s ID number.

  • Additionally, if the current rank was good enough to not bring any collision during the last stickyTime time steps, the player “sits” on that rank.
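The interplay between getReward and handleCollision can be sketched as a small state machine. This is a toy model of the rank bookkeeping only, not the library's code: the class name `StickyRank` and its method names are hypothetical, and the choice of `>=` for the sitting threshold is an assumption.

```python
import random


class StickyRank:
    """Toy model of the rank bookkeeping only (no index policy, no rewards)."""

    def __init__(self, max_rank, sticky_time, rng=None):
        self.max_rank = max_rank
        self.sticky_time = sticky_time
        self.rng = rng if rng is not None else random.Random()
        self.rank = self.rng.randint(1, max_rank)  # random initial rank in [1, max_rank]
        self.sitted = False
        self.steps_without_collisions = 0

    def on_reward(self):
        """Called after a step without collision."""
        self.steps_without_collisions += 1
        # Assumption: the player sits once the counter reaches sticky_time.
        if not self.sitted and self.steps_without_collisions >= self.sticky_time:
            self.sitted = True

    def on_collision(self):
        """Called after a collision: resample the rank unless already sitted."""
        if self.sitted:
            return
        self.steps_without_collisions = 0
        self.rank = self.rng.randint(1, self.max_rank)


player = StickyRank(max_rank=6, sticky_time=5, rng=random.Random(0))
for _ in range(5):
    player.on_reward()  # five collision-free steps in a row
```

After these five collision-free steps `player.sitted` is true in this sketch, and further calls to `on_collision()` leave `player.rank` unchanged.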
__module__ = 'PoliciesMultiPlayers.rhoRandSticky'
class PoliciesMultiPlayers.rhoRandSticky.rhoRandSticky(nbPlayers, nbArms, playerAlgo, stickyTime=10, maxRank=None, lower=0.0, amplitude=1.0, *args, **kwargs)

Bases: PoliciesMultiPlayers.rhoRand.rhoRand

rhoRandSticky: implementation of a variant of the multi-player policy rhoRand from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

__init__(nbPlayers, nbArms, playerAlgo, stickyTime=10, maxRank=None, lower=0.0, amplitude=1.0, *args, **kwargs)
  • nbPlayers: number of players to create (in self._players).
  • playerAlgo: class to use for every player.
  • nbArms: number of arms, given as first argument to playerAlgo.
  • stickyTime: given to the oneRhoRandSticky objects (see above).
  • maxRank: maximum rank allowed by the rhoRandSticky child (defaults to nbPlayers; for instance, if there are 2 × rhoRandSticky[UCB] + 2 × rhoRandSticky[klUCB], maxRank should be 4, not 2).
  • *args, **kwargs: arguments, named arguments, given to playerAlgo.


>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> stickyTime = 5
>>> s = rhoRandSticky(nbPlayers, nbArms, UCB, stickyTime=stickyTime)
>>> [ child.choice() for child in s.children ]
[12, 15, 0, 3, 3, 7]
>>> [ child.choice() for child in s.children ]
[9, 4, 6, 12, 1, 6]
  • To get a list of usable players, use s.children.


s._players is for internal use ONLY!
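The collision dynamics of the sticky ranks can also be explored without the library. The following toy simulation is a sketch under strong assumptions: all players know the true ordering of the arms, so a player with rank r always pulls the r-th best arm, and two players collide exactly when they hold the same rank. The function name `simulate_sticky_ranks` and its parameters are hypothetical.

```python
import random


def simulate_sticky_ranks(nb_players, sticky_time, horizon, seed=0):
    """Toy run with a known arm ordering: rank r always means the r-th best arm."""
    rng = random.Random(seed)
    ranks = [rng.randint(1, nb_players) for _ in range(nb_players)]
    sitted = [False] * nb_players
    calm_steps = [0] * nb_players  # steps without collision on the current rank

    for _ in range(horizon):
        # Count how many players target each rank (hence each arm) at this step.
        counts = {}
        for rank in ranks:
            counts[rank] = counts.get(rank, 0) + 1
        for i in range(nb_players):
            if counts[ranks[i]] > 1:
                # Collision: only players that are not sitted resample their rank.
                if not sitted[i]:
                    calm_steps[i] = 0
                    ranks[i] = rng.randint(1, nb_players)
            else:
                # No collision: count the step, and sit once sticky_time is reached.
                calm_steps[i] += 1
                if calm_steps[i] >= sticky_time:
                    sitted[i] = True
    return ranks, sitted


ranks, sitted = simulate_sticky_ranks(nb_players=6, sticky_time=5, horizon=2000)
sitted_ranks = [rank for rank, is_sitted in zip(ranks, sitted) if is_sitted]
```

In this model two sitted players can never share a rank, since sitting requires at least one collision-free step on that rank. With a long enough horizon one would expect every player to end up sitted, although this sketch does not prove it.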

maxRank = None

Max rank, usually nbPlayers but can be different

stickyTime = None

Number of time steps needed without collisions before sitting (never changing rank again)

nbPlayers = None

Number of players

children = None

List of children, fake algorithms

nbArms = None

Number of arms


__str__()

Return str(self).

__module__ = 'PoliciesMultiPlayers.rhoRandSticky'