PoliciesMultiPlayers.rhoCentralized module

rhoCentralized: implementation of the multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

  • Each child player is selfish, and plays according to an index policy (any index policy, e.g., UCB, Thompson, KL-UCB, BayesUCB, etc.),
  • But instead of aiming at the best (the 1st best) arm, player i aims at the rank_i-th best arm (a minimal sketch of this selection rule is given after this list),
  • Every player has rank_i = i + 1, as given by the base station.
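For intuition only, here is a minimal sketch (not the actual source code) of the selection rule such a child follows: given the index estimates computed by its underlying index policy, it picks the arm with the rank-th largest index. The function name choice_with_rank and the variable indexes are purely illustrative.

import numpy as np

def choice_with_rank(indexes, rank):
    """Pick the arm with the rank-th largest index (rank=1 means the best arm).

    indexes is the vector of index estimates (e.g., UCB indexes) of one player.
    This is only a sketch of the rule described above, not the library's code.
    """
    order = np.argsort(indexes)  # arms sorted by increasing index
    return order[-rank]          # position of the rank-th largest index

# For instance, with indexes [0.2, 0.9, 0.5] and rank 2, the chosen arm is 2 (index 0.5).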

Note

This is not fully decentralized: each child player needs to know the (fixed) number of players and an initial orthogonal configuration.

Warning

This policy is NOT efficient at ALL! Don’t use it! It seems like a smart idea, but it is not.

class PoliciesMultiPlayers.rhoCentralized.oneRhoCentralized(maxRank, mother, playerId, rank=None, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.ChildPointer.ChildPointer

Class that acts as a child policy, but in fact it passes all its method calls to the mother class, which forwards them to its i-th player.

  • The player does not aim at the best arm, but at the rank-th best arm, based on her index policy.
__init__(maxRank, mother, playerId, rank=None, *args, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

maxRank = None

Max rank, usually nbPlayers but can be different

keep_the_same_rank = None

If True, the rank is kept constant during the game, as if it were given by the Base Station

rank = None

Current rank, starting at 1 by default, or rank if given as an argument

__str__()[source]

Return str(self).

startGame()[source]

Start game.

handleCollision(arm, reward=None)[source]

Get a new fully random rank, and give the reward to the underlying algorithm if it is not None.
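A minimal sketch of what such a collision handler could do, under the assumption that the underlying index policy exposes a getReward(arm, reward) method (the attribute names mirror those documented above, but this is not the actual source code):

import numpy as np

def handle_collision_sketch(player, arm, reward=None):
    # Unless the rank is kept fixed (as if assigned by the Base Station),
    # draw a new rank uniformly at random in {1, ..., maxRank}.
    if not player.keep_the_same_rank:
        player.rank = 1 + np.random.randint(player.maxRank)
    # If a reward was observed despite the collision, still feed it to the
    # underlying index policy so it can update its estimates (assumed API).
    if reward is not None:
        player.getReward(arm, reward)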

choice()[source]

Choose with the current rank.

__module__ = 'PoliciesMultiPlayers.rhoCentralized'
class PoliciesMultiPlayers.rhoCentralized.rhoCentralized(nbPlayers, nbArms, playerAlgo, maxRank=None, orthogonalRanks=True, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.BaseMPPolicy.BaseMPPolicy

rhoCentralized: implementation of a variant of the multi-player rhoRand policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

__init__(nbPlayers, nbArms, playerAlgo, maxRank=None, orthogonalRanks=True, *args, **kwargs)[source]
  • nbPlayers: number of players to create (in self._players).
  • playerAlgo: class to use for every player.
  • nbArms: number of arms, given as first argument to playerAlgo.
  • maxRank: maximum rank allowed by the rhoCentralized child (defaults to nbPlayers, but for instance if there are 2 × rhoCentralized[UCB] + 2 × rhoCentralized[klUCB], maxRank should be 4, not 2).
  • orthogonalRanks: if True, orthogonal ranks 1..M are directly assigned to the players 1..M.
  • *args, **kwargs: arguments, named arguments, given to playerAlgo.

Example:

>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> s = rhoCentralized(nbPlayers, nbArms, UCB)
>>> [ child.choice() for child in s.children ]
[12, 15, 0, 3, 3, 7]
>>> [ child.choice() for child in s.children ]
[9, 4, 6, 12, 1, 6]
  • To get a list of usable players, use s.children.
  • Warning: s._players is for internal use ONLY!
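Following the remark on maxRank above, here is a hedged sketch of how two groups of players with different index policies might be combined (assuming klUCB is importable like UCB in the example above; outputs are omitted since they depend on the random state, and this snippet is an illustration, not taken from the library's documentation):

>>> # Two groups of 2 players each, so ranks must span 1..4, hence maxRank=4:
>>> group1 = rhoCentralized(2, nbArms, UCB, maxRank=4)
>>> group2 = rhoCentralized(2, nbArms, klUCB, maxRank=4)
>>> all_players = group1.children + group2.children  # 4 usable child players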
maxRank = None

Max rank, usually nbPlayers but can be different

nbPlayers = None

Number of players

orthogonalRanks = None

Whether orthogonal ranks are used from the start

children = None

List of children, fake algorithms

nbArms = None

Number of arms

__str__()[source]

Return str(self).

__module__ = 'PoliciesMultiPlayers.rhoCentralized'