PoliciesMultiPlayers.rhoLearnEst module
rhoLearnEst: implementation of the multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/), using a learning algorithm instead of a random exploration for choosing the rank, and without knowing the number of users.
- It generalizes PoliciesMultiPlayers.rhoLearn.rhoLearn simply by letting the ranks range over \(\{1,\dots,K\}\) instead of \(\{1,\dots,M\}\), hoping the learning algorithm will be "smart enough" and learn by itself that ranks should be \(\leq M\).
- Each child player is selfish, and plays according to an index policy (any index policy, e.g., UCB, Thompson, KL-UCB, BayesUCB etc.).
- But instead of aiming at the best (the 1-st best) arm, player i aims at the rank_i-th best arm.
- At first, every player has a random rank_i from 1 to K, and when a collision occurs, rank_i is given by a second learning algorithm, playing on arms = ranks from \(\{1,\dots,K\}\), since the number of players M is not known here (a minimal sketch of this loop is given after the warning below).
- If rankSelectionAlgo = Uniform, this is like rhoRand, but if it is a smarter policy, it might be better! Warning: no theoretical guarantees exist!
- Reference: [Proof-of-Concept System for Opportunistic Spectrum Access in Multi-user Decentralized Networks, S. J. Darak, C. Moy, J. Palicot, EAI 2016](https://doi.org/10.4108/eai.5-9-2016.151647), Algorithm 2 (for BayesUCB only).
Note
This is fully decentralized: each child player does not need to know the (fixed) number of players; it should learn to select ranks only in \(\{1,\dots,M\}\) instead of \(\{1,\dots,K\}\).
Warning
This policy does not work very well!
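To make the loop concrete, here is a minimal self-contained sketch of the idea (an illustration only, not the library's implementation): M players each run a plain UCB1 index policy over the K arms, target the arm of their current rank, and on collision redraw their rank from a second UCB1 learner acting on the K possible ranks. The UCB1 class, horizon, arm means, and the 0/1 reward fed to the rank learner are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, horizon = 17, 6, 1000      # arms, players, time steps (illustrative)
means = rng.uniform(size=K)      # hypothetical Bernoulli arm means

class UCB1:
    """Plain UCB1 index policy over n choices (stand-in for any index policy)."""
    def __init__(self, n):
        self.pulls = np.zeros(n)
        self.rewards = np.zeros(n)
        self.t = 0

    def indexes(self):
        with np.errstate(divide='ignore', invalid='ignore'):
            ucb = self.rewards / self.pulls + np.sqrt(2 * np.log(max(self.t, 1)) / self.pulls)
        ucb[self.pulls == 0] = np.inf   # force initial exploration
        return ucb

    def update(self, choice, reward):
        self.t += 1
        self.pulls[choice] += 1
        self.rewards[choice] += reward

arm_learners = [UCB1(K) for _ in range(M)]        # one index policy per player
rank_learners = [UCB1(K) for _ in range(M)]       # ranks live in {1, ..., K}, not {1, ..., M}
ranks = [int(rng.integers(K)) for _ in range(M)]  # (rank_i - 1) for each player

for t in range(horizon):
    # player i targets the (rank_i + 1)-th best arm according to its own indexes
    targets = [int(np.argsort(-p.indexes())[r]) for p, r in zip(arm_learners, ranks)]
    counts = np.bincount(targets, minlength=K)
    for i, arm in enumerate(targets):
        collided = counts[arm] > 1                # the environment detects collisions
        reward = 0.0 if collided else float(rng.random() < means[arm])
        arm_learners[i].update(arm, reward)
        # illustrative choice: the rank learner is rewarded iff its rank avoided a collision
        rank_learners[i].update(ranks[i], 0.0 if collided else 1.0)
        if collided:                              # redraw the rank from the second learner
            ranks[i] = int(np.argmax(rank_learners[i].indexes()))
```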
class PoliciesMultiPlayers.rhoLearnEst.oneRhoLearnEst(maxRank, rankSelectionAlgo, change_rank_each_step, *args, **kwargs)[source]
Bases: PoliciesMultiPlayers.rhoLearn.oneRhoLearn

- __module__ = 'PoliciesMultiPlayers.rhoLearnEst'
class PoliciesMultiPlayers.rhoLearnEst.rhoLearnEst(nbPlayers, nbArms, playerAlgo, rankSelectionAlgo=<class 'Policies.Uniform.Uniform'>, lower=0.0, amplitude=1.0, change_rank_each_step=False, *args, **kwargs)[source]
Bases: PoliciesMultiPlayers.rhoLearn.rhoLearn

rhoLearnEst: implementation of the multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/), using a learning algorithm instead of a random exploration for choosing the rank, and without knowing the number of users.
__init__(nbPlayers, nbArms, playerAlgo, rankSelectionAlgo=<class 'Policies.Uniform.Uniform'>, lower=0.0, amplitude=1.0, change_rank_each_step=False, *args, **kwargs)[source]

- nbPlayers: number of players to create (in self._players).
- playerAlgo: class to use for every player.
- nbArms: number of arms, given as first argument to playerAlgo.
- rankSelectionAlgo: algorithm to use for selecting the ranks.
- *args, **kwargs: arguments and named arguments, given to playerAlgo.
Difference with PoliciesMultiPlayers.rhoLearn.rhoLearn:

- maxRank, the maximum rank allowed by the rhoRand child, is not an argument here: it is always nbArms (= K).
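As a hedged illustration of this difference (assuming the maxRank attribute is exposed as in the rhoRand family, on which this statement relies):

>>> from Policies import *
>>> s = rhoLearnEst(6, 17, UCB, UCB)
>>> s.maxRank  # always nbArms = K; rhoLearn would instead default it to nbPlayers = M = 6
17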
Example:
>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> s = rhoLearnEst(nbPlayers, nbArms, UCB, UCB)
>>> [ child.choice() for child in s.children ]
[12, 15, 0, 3, 3, 7]
>>> [ child.choice() for child in s.children ]
[9, 4, 6, 12, 1, 6]
- To get a list of usable players, use s.children.
- Warning: s._players is for internal use ONLY!
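Continuing the example above, one simulated round with these children could look like this (a hedged sketch, assuming the usual SMPyBandits child interface choice() / getReward(arm, reward), with made-up Bernoulli arm means):

>>> import numpy as np
>>> means = np.linspace(0.1, 0.9, nbArms)  # hypothetical arm means
>>> choices = [child.choice() for child in s.children]
>>> for child, arm in zip(s.children, choices):
...     collided = choices.count(arm) > 1  # collision: several players picked this arm
...     reward = 0.0 if collided else float(np.random.random() < means[arm])
...     child.getReward(arm, reward)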
- nbPlayers = None: Number of players.
- children = None: List of children, fake algorithms.
- rankSelectionAlgo = None: Policy to use to choose the ranks.
- nbArms = None: Number of arms.
- change_rank_each_step = None: Change rank at every step?
- __module__ = 'PoliciesMultiPlayers.rhoLearnEst'