Environment.CollisionModels module

Define the different collision models.

Collision models are generic functions, taking:

  • the time: t
  • the arms of the current environment: arms
  • the list of players: players
  • the numpy array of their choices: choices
  • the numpy array to store their rewards: rewards
  • the numpy array to store their pulls: pulls
  • the numpy array to store their collisions: collisions
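The first step shared by most of these models is counting how many players chose each arm; with numpy this is a single `bincount` call. The following is an illustrative sketch, not the library's actual code, and the array values are made up:

```python
import numpy as np

# Hypothetical round with 4 players and 3 arms:
# players 0 and 2 collide on arm 1, players 1 and 3 are alone.
nbArms = 3
choices = np.array([1, 0, 1, 2])

# Number of players that chose each arm
occupancy = np.bincount(choices, minlength=nbArms)

# Mask of players that are alone on their arm (those are the only ones
# who receive a reward under the default collision model)
alone = occupancy[choices] == 1
```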

As of now, there are 4 different collision models implemented:

  • noCollision(): simple collision model where every player samples its chosen arm and receives the reward.
  • onlyUniqUserGetsReward(): simple collision model where only the players alone on an arm sample it and receive the reward (default).
  • rewardIsSharedUniformly(): in case of more than one player on one arm, only one player (chosen uniformly at random) can sample it and receive the reward.
  • closerUserGetsReward(): in case of more than one player on one arm, only the closest player can sample it and receive the reward. It can take, or create if not given, a random distance of each player to the base station (a random number in [0, 1]).
Environment.CollisionModels.onlyUniqUserGetsReward(t, arms, players, choices, rewards, pulls, collisions)[source]

Simple collision model where only the players alone on one arm sample it and receive the reward.

  • This is the default collision model, cf. [[Multi-Player Bandits Revisited, Lilian Besson and Emilie Kaufmann, 2017]](https://hal.inria.fr/hal-01629733).
  • The numpy array ‘collisions’ is increased according to the number of users who collided (it is NOT binary).
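As an illustration of these semantics, here is a minimal sketch of this collision model, assuming (as the parameter list above suggests) that arms expose a draw(t) method and players expose getReward(arm, reward). The stub classes are hypothetical and only serve the demo; this is not the library's exact implementation:

```python
import numpy as np

class DemoArm:
    """Hypothetical deterministic arm (real arms draw random rewards)."""
    def __init__(self, mean):
        self.mean = mean
    def draw(self, t):
        return self.mean

class DemoPlayer:
    """Hypothetical player stub that just records the rewards it receives."""
    def __init__(self):
        self.history = []
    def getReward(self, arm, reward):
        self.history.append((arm, reward))

def onlyUniqUserGetsRewardSketch(t, arms, players, choices, rewards, pulls, collisions):
    """Sketch: a player samples its arm and receives the reward only if it
    is alone on that arm; otherwise the arm's collision counter grows."""
    occupancy = np.bincount(choices, minlength=len(arms))
    for playerId, arm in enumerate(choices):
        if occupancy[arm] == 1:              # alone on this arm: sample it
            reward = arms[arm].draw(t)
            players[playerId].getReward(arm, reward)
            rewards[playerId] += reward
            pulls[playerId, arm] += 1
        else:                                # collision: no reward for anyone
            collisions[arm] += 1

# Demo: players 0 and 2 collide on arm 1, players 1 and 3 are alone
arms = [DemoArm(0.1), DemoArm(0.5), DemoArm(0.9)]
players = [DemoPlayer() for _ in range(4)]
choices = np.array([1, 0, 1, 2])
rewards = np.zeros(4)
pulls = np.zeros((4, 3), dtype=int)
collisions = np.zeros(3, dtype=int)
onlyUniqUserGetsRewardSketch(0, arms, players, choices, rewards, pulls, collisions)
```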
Environment.CollisionModels.defaultCollisionModel(t, arms, players, choices, rewards, pulls, collisions)

Simple collision model where only the players alone on one arm sample it and receive the reward (same behavior as onlyUniqUserGetsReward()).

  • This is the default collision model, cf. [[Multi-Player Bandits Revisited, Lilian Besson and Emilie Kaufmann, 2017]](https://hal.inria.fr/hal-01629733).
  • The numpy array ‘collisions’ is increased according to the number of users who collided (it is NOT binary).
Environment.CollisionModels.onlyUniqUserGetsRewardSparse(t, arms, players, choices, rewards, pulls, collisions)[source]

Simple collision model where only the players alone on one arm sample it and receive the reward.

  • This is the default collision model, cf. [[Multi-Player Bandits Revisited, Lilian Besson and Emilie Kaufmann, 2017]](https://hal.inria.fr/hal-01629733).
  • The numpy array ‘collisions’ is increased according to the number of users who collided (it is NOT binary).
  • Supports non-activated players: a player signals that it is inactive by choosing a negative arm index.
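The "negative index means inactive" convention can be handled by filtering the choices before counting arm occupancies; a hypothetical sketch (values made up):

```python
import numpy as np

# Hypothetical round: player 2 is not active this step (negative index)
nbArms = 3
choices = np.array([1, 0, -1, 1])

active = choices >= 0                              # players actually playing
occupancy = np.bincount(choices[active], minlength=nbArms)
```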
Environment.CollisionModels.allGetRewardsAndUseCollision(t, arms, players, choices, rewards, pulls, collisions)[source]

A variant of the first simple collision model where all players sample their arm, receive their rewards, and are informed of the collisions.

Note

This is NOT the model we consider, so our lower bound on centralized regret does not hold for it (players do not care about collisions for their internal rewards, so the regret does not take collisions into account!)

  • This is NOT the default collision model, cf. collision model 1 of [Liu & Zhao, 2009](https://arxiv.org/abs/0910.2065v3).
  • The numpy array ‘collisions’ is increased according to the number of users who collided (it is NOT binary).
Environment.CollisionModels.noCollision(t, arms, players, choices, rewards, pulls, collisions)[source]

Simple collision model where all players sample their chosen arm and receive the reward.

  • It corresponds to the single-player simulation: each player is a policy, compared without collision.
  • The numpy array ‘collisions’ is not modified.
Environment.CollisionModels.rewardIsSharedUniformly(t, arms, players, choices, rewards, pulls, collisions)[source]

Less simple collision model where:

  • The players alone on one arm sample it and receive the reward.
  • In case of more than one player on one arm, only one player, chosen uniformly at random by the base station, can sample it and receive the reward.

Note

It can also model a choice from the users' point of view: in a time frame (e.g. 1 second), when there is a collision, each colliding user chooses (uniformly) a random small time offset (e.g. 20 ms) and starts sensing and emitting again after that delay. The first one to sense finds the channel free and transmits; the later ones find the channel busy when sensing. So only one player transmits, and from the base station's point of view it is the same as if that player had been chosen uniformly among the colliding users.
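The base-station choice described above amounts to picking one index uniformly at random among the colliding players; a sketch (the round's numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(2017)

# Hypothetical round: players 0, 2 and 3 all chose arm 1
choices = np.array([1, 5, 1, 1])
arm = 1

colliders = np.nonzero(choices == arm)[0]   # indices of the colliding players
winner = rng.choice(colliders)              # one of them, uniformly at random
# Only `winner` samples arm 1 and receives its reward this round.
```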

Environment.CollisionModels.closerUserGetsReward(t, arms, players, choices, rewards, pulls, collisions, distances='uniform')[source]

Simple collision model where:

  • The players alone on one arm sample it and receive the reward.
  • In case of more than one player on one arm, only the closest player can sample it and receive the reward. It can take, or create if not given, a distance of each player to the base station (numbers in [0, 1]).
  • If distances is not given, it is either generated randomly (random numbers in [0, 1]) or built as a linspace of nbPlayers equally spaced values in (0, 1) (default).

Note

This kind of effect is known in telecommunications as the Near-Far effect or the Capture effect, cf. [Roberts, 1975](https://dl.acm.org/citation.cfm?id=1024920).
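The default distances and the "closest wins" rule can be sketched as follows. The linspace construction shown is one plausible way to get nbPlayers equally spaced values strictly inside (0, 1), not necessarily the library's exact formula:

```python
import numpy as np

nbPlayers = 4

# nbPlayers equally spaced values strictly inside (0, 1)
distances = np.linspace(0, 1, nbPlayers + 2)[1:-1]   # 0.2, 0.4, 0.6, 0.8

# Hypothetical round: players 0, 1 and 3 collide on arm 1
choices = np.array([1, 1, 0, 1])
arm = 1
colliders = np.nonzero(choices == arm)[0]

# The colliding player closest to the base station wins the arm
winner = colliders[np.argmin(distances[colliders])]
```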

Environment.CollisionModels.collision_models = [<function onlyUniqUserGetsReward>, <function onlyUniqUserGetsRewardSparse>, <function allGetRewardsAndUseCollision>, <function noCollision>, <function rewardIsSharedUniformly>, <function closerUserGetsReward>]

List of possible collision models

Environment.CollisionModels.full_lost_if_collision = {'allGetRewardsAndUseCollision': True, 'closerUserGetsReward': False, 'noCollision': False, 'onlyUniqUserGetsReward': True, 'onlyUniqUserGetsRewardSparse': True, 'rewardIsSharedUniformly': False}

Mapping from collision model names to True or False, to know whether a collision implies a lost communication in that model.
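Since the mapping is keyed by function name, it can be queried with a model's __name__ attribute; a sketch using the values listed above (the lookup helper and the stand-in function are hypothetical):

```python
# The mapping, copied from the values documented above
full_lost_if_collision = {
    'onlyUniqUserGetsReward': True,
    'onlyUniqUserGetsRewardSparse': True,
    'allGetRewardsAndUseCollision': True,
    'noCollision': False,
    'rewardIsSharedUniformly': False,
    'closerUserGetsReward': False,
}

def isTransmissionLost(collisionModel):
    """Hypothetical helper: does a collision mean a fully lost
    communication under this collision model?"""
    return full_lost_if_collision[collisionModel.__name__]

def noCollision():            # stand-in with the right __name__, for the demo
    pass
```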