PoliciesMultiPlayers : contains various collision-avoidance protocols for the multi-player setting.
Selfish: a multi-player policy where every player is selfish: they do not try to handle collisions.
CentralizedNotFair: a multi-player policy which uses a centralized intelligence to assign each user to a FIXED arm.
CentralizedFair: a multi-player policy which uses a centralized intelligence to assign each user an offset; each one takes an orthogonal arm based on (offset + t) % nbArms.
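The fairness of this offset scheme is easy to check: as long as nbPlayers ≤ nbArms, the assigned arms are distinct at every time step, so no collision can ever occur. A minimal sketch (the variable names are assumptions, not the library's code):

```python
# Sketch of the round-robin assignment used by a CentralizedFair-style policy:
# player i gets offset i, and at time t plays arm (offset + t) % nbArms.
nbPlayers, nbArms, T = 3, 5, 10

for t in range(T):
    # one arm per player; they stay orthogonal because offsets are distinct
    arms = [(offset + t) % nbArms for offset in range(nbPlayers)]
    assert len(set(arms)) == nbPlayers  # never a collision
```

Each player cycles through all the arms, so over time every player gets the same share of the good arms, which is what makes the policy "fair".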
CentralizedIMP: a multi-player policy that uses centralized but non-omniscient learning to select K = nbPlayers arms at each time step.
OracleNotFair: a multi-player policy with full knowledge and centralized intelligence, which assigns each user to a FIXED arm among the best arms.
OracleFair: a multi-player policy which uses a centralized intelligence to assign each user an offset; each one takes an orthogonal arm based on (offset + t) % nbBestArms, among the best arms.
ALOHA: implementation of generic collision-avoidance algorithms, relying on a single-player bandit policy (e.g., Thompson sampling). And its variants:
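The ALOHA-style back-off idea can be sketched as follows. This is an illustrative assumption about the mechanism, not the library's actual ALOHA class: after a collision, a player abandons its current arm with some probability p and re-draws uniformly; otherwise it persists.

```python
import random

def aloha_step(current_arm, collided, nbArms, p=0.5, rng=random):
    """One ALOHA-style back-off step (hypothetical helper, for illustration).

    If a collision occurred, switch to a uniformly random arm with
    probability p; otherwise keep playing the current arm.
    """
    if collided and rng.random() < p:
        return rng.randrange(nbArms)  # back off to a random arm
    return current_arm                # persist on the current arm
```

In the real algorithm the arm choice comes from the underlying single-player bandit policy rather than a uniform draw; the sketch only shows the collision-triggered randomization.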
rhoCentralized is a semi-centralized version where orthogonal ranks 1..M are given to the players, instead of just giving them the value of M; a decentralized learning policy is still used to learn the best arms.
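The rank-based selection can be sketched like this (choice_with_rank is a hypothetical helper, not the library's API): a player holding rank r pulls the arm it currently estimates as r-th best, so players with distinct ranks and similar estimates target distinct arms.

```python
import numpy as np

def choice_with_rank(empirical_means, rank):
    """Return the arm estimated as `rank`-th best (rank is 1-based)."""
    order = np.argsort(empirical_means)[::-1]  # arm indices, best first
    return int(order[rank - 1])
```

With estimates [0.1, 0.9, 0.5], rank 1 picks arm 1, rank 2 picks arm 2, and rank 3 picks arm 0.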
RandTopM is another approach, similar to MusicalChair, which we hope performs better and which we were able to analyze more easily.
All policies have the same interface, as described in BaseMPPolicy for decentralized policies and in BaseCentralizedPolicy for centralized policies, in order to use them in any experiment with the following approach:
```python
my_policy_MP = Policy_MP(nbPlayers, nbArms)
children = my_policy_MP.children       # get a list of usable single-player policies
for one_policy in children:
    one_policy.startGame()             # start the game
for t in range(T):
    for i in range(nbPlayers):
        k_t[i] = children[i].choice()  # choose one arm, for each player
    for k in range(nbArms):
        players_who_played_k = [i for i in range(nbPlayers) if k_t[i] == k]
        reward = reward_t[k] = sampled from the arm k  # sample a reward
        if len(players_who_played_k) > 1:  # collision: the reward is lost
            reward = 0
        for i in players_who_played_k:
            children[i].getReward(k, reward)
```
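The loop above can be made fully runnable with a toy child policy standing in for a real single-player bandit. Everything below (RandomChild, the Bernoulli arm means) is a hypothetical stand-in chosen for the sketch, not the library's code:

```python
import random

random.seed(0)

class RandomChild:
    """Toy single-player policy with the interface used above."""
    def __init__(self, nbArms):
        self.nbArms = nbArms
    def startGame(self):
        pass
    def choice(self):
        return random.randrange(self.nbArms)
    def getReward(self, arm, reward):
        pass  # a real policy would update its estimates here

nbPlayers, nbArms, T = 2, 3, 100
means = [0.2, 0.5, 0.8]                # Bernoulli arm means (made up)
children = [RandomChild(nbArms) for _ in range(nbPlayers)]
for one_policy in children:
    one_policy.startGame()

k_t = [0] * nbPlayers
for t in range(T):
    for i in range(nbPlayers):
        k_t[i] = children[i].choice()
    for k in range(nbArms):
        players_who_played_k = [i for i in range(nbPlayers) if k_t[i] == k]
        reward = float(random.random() < means[k])  # sample a Bernoulli reward
        if len(players_who_played_k) > 1:           # collision: the reward is lost
            reward = 0
        for i in players_who_played_k:
            children[i].getReward(k, reward)
```

A real experiment would replace RandomChild with a learning policy (e.g., Thompson sampling) so that getReward actually updates the arm estimates.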
- PoliciesMultiPlayers.ALOHA module
- PoliciesMultiPlayers.BaseCentralizedPolicy module
- PoliciesMultiPlayers.BaseMPPolicy module
- PoliciesMultiPlayers.CentralizedCycling module
- PoliciesMultiPlayers.CentralizedFixed module
- PoliciesMultiPlayers.CentralizedIMP module
- PoliciesMultiPlayers.CentralizedMultiplePlay module
- PoliciesMultiPlayers.ChildPointer module
- PoliciesMultiPlayers.DepRound module
- PoliciesMultiPlayers.EstimateM module
- PoliciesMultiPlayers.OracleFair module
- PoliciesMultiPlayers.OracleNotFair module
- PoliciesMultiPlayers.RandTopM module
- PoliciesMultiPlayers.RandTopMEst module
- PoliciesMultiPlayers.Scenario1 module
- PoliciesMultiPlayers.Selfish module
- PoliciesMultiPlayers.rhoCentralized module
- PoliciesMultiPlayers.rhoEst module
- PoliciesMultiPlayers.rhoLearn module
- PoliciesMultiPlayers.rhoLearnEst module
- PoliciesMultiPlayers.rhoLearnExp3 module
- PoliciesMultiPlayers.rhoRand module
- PoliciesMultiPlayers.rhoRandALOHA module
- PoliciesMultiPlayers.rhoRandRand module
- PoliciesMultiPlayers.rhoRandRotating module
- PoliciesMultiPlayers.rhoRandSticky module
- PoliciesMultiPlayers.with_proba module