Environment.EvaluatorSparseMultiPlayers module

The EvaluatorSparseMultiPlayers class wraps and runs the simulations for the multi-players case with sparsely activated players. It provides many plotting methods for various visualizations. See the documentation.


FIXME this environment is not as up-to-date as Environment.EvaluatorMultiPlayers.

Environment.EvaluatorSparseMultiPlayers.REPETITIONS = 1

Default number of repetitions

Environment.EvaluatorSparseMultiPlayers.ACTIVATION = 1

Default probability of activation

Environment.EvaluatorSparseMultiPlayers.DELTA_T_PLOT = 50

Default sampling rate for plotting

Environment.EvaluatorSparseMultiPlayers.MORE_ACCURATE = True

Use the count of selections instead of rewards for a more accurate mean/std reward measure.

Environment.EvaluatorSparseMultiPlayers.FINAL_RANKS_ON_AVERAGE = True

Default value for finalRanksOnAverage

Environment.EvaluatorSparseMultiPlayers.USE_JOBLIB_FOR_POLICIES = False

Default value for useJoblibForPolicies. Using it does not speed up the simulations (too many threads add too much overhead), so it should really be disabled.

Environment.EvaluatorSparseMultiPlayers.PICKLE_IT = True

Default value for pickleit for saving the figures. If True, then all plt.figure objects are saved (in pickle format).

class Environment.EvaluatorSparseMultiPlayers.EvaluatorSparseMultiPlayers(configuration, moreAccurate=True)[source]

Bases: Environment.EvaluatorMultiPlayers.EvaluatorMultiPlayers

Evaluator class to run the simulations, for the multi-players case.

__init__(configuration, moreAccurate=True)[source]

Initialize self. See help(type(self)) for accurate signature.

activations = None

Probabilities of activation

collisionModel = None

Which collision model should be used

full_lost_if_collision = None

Whether all rewards are lost in case of a collision. Used to compute the correct decomposition of the regret.

startOneEnv(envId, env)[source]

Simulate that env.


Compute the empirical centralized regret, based on accumulated rewards: the cumulative sum over time of the mean rewards of the M best arms, minus the cumulative sum over time of the empirical rewards obtained by the players.
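The definition above can be illustrated with a minimal sketch. It is not the library's implementation: the function name centralized_regret and its arguments are hypothetical, and rewards_per_step is assumed to hold, for each time step, the total reward collected by all players.

```python
from itertools import accumulate


def centralized_regret(means, nb_players, rewards_per_step):
    """Cumulated regret at each step t: t * (sum of the M best means) - cumulated rewards.

    Hypothetical helper, for illustration only.
    """
    # Sum of the means of the M best arms (M = nb_players).
    best_sum = sum(sorted(means, reverse=True)[:nb_players])
    # Cumulative sum over time of the rewards obtained by all the players.
    cumulated_rewards = accumulate(rewards_per_step)
    return [(t + 1) * best_sum - r for t, r in enumerate(cumulated_rewards)]


means = [0.1, 0.5, 0.9]
rewards = [1.0, 2.0, 1.0, 0.0]  # total reward of all players at each step
regret = centralized_regret(means, 2, rewards)
```

With empirical rewards the regret can be negative at some steps, which is one reason a count-based estimate may be less noisy.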


Extract and compute the first term \((a)\) in the centralized regret: losses due to pulling suboptimal arms.


Extract and compute the second term \((b)\) in the centralized regret: losses due to not pulling optimal arms.


Extract and compute the third term \((c)\) in the centralized regret: losses due to collisions.
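As a rough illustration of the three terms, here is a sketch under strong assumptions: every player is active at every step (the non-sparse setting), selections[k] counts all selections of arm k by any player, and collisions[k] counts the colliding players on arm k. The function regret_terms and its arguments are hypothetical and not part of the library.

```python
def regret_terms(means, nb_players, horizon, selections, collisions):
    """Return the terms (a), (b), (c) of the centralized regret (illustrative only)."""
    arms = range(len(means))
    order = sorted(arms, key=lambda k: means[k], reverse=True)
    best = set(order[:nb_players])           # indices of the M best arms
    mu_m = means[order[nb_players - 1]]      # mean of the M-th best arm
    # (a) losses due to pulling suboptimal arms
    term_a = sum((mu_m - means[k]) * selections[k] for k in arms if k not in best)
    # (b) losses due to not pulling optimal arms
    term_b = sum((means[k] - mu_m) * (horizon - selections[k]) for k in best)
    # (c) losses due to collisions
    term_c = sum(means[k] * collisions[k] for k in arms)
    return term_a, term_b, term_c


means = [0.1, 0.5, 0.9]
a, b, c = regret_terms(means, 2, 10, [2, 8, 10], [0, 2, 2])
```

Under these assumptions the three terms sum to the count-based regret: here the horizon times the sum of the two best means is 14, the collected mean reward is 10.4, and a + b + c is 3.6. With sparse activations the total number of selections is no longer M times the horizon, so this identity should not be assumed to hold as is.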


Compute the empirical centralized regret, based on counts of selections and not actual rewards.

getCentralizedRegret(envId=0, moreAccurate=None)[source]

Compute the centralized regret with either the more accurate or the less accurate method, depending on moreAccurate.


Extract last regrets, based on accumulated rewards.


Extract weighted count of selections.


Extract last regrets, based on counts of selections and not actual rewards.

getLastRegrets(envId=0, moreAccurate=None)[source]

Extract the last regrets with either the more accurate or the less accurate method, depending on moreAccurate.

strPlayers(short=False, latex=True)[source]

Get a string of the players and their activation probabilities for this environment.

__module__ = 'Environment.EvaluatorSparseMultiPlayers'

Environment.EvaluatorSparseMultiPlayers.delayed_play(env, players, horizon, collisionModel, activations, seed=None, repeatId=0)[source]

Helper function for the parallelization.
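To give an idea of what one repetition with sparse activations might look like, here is a simplified, self-contained sketch. It is not the library's delayed_play: the function play_sparse is hypothetical, players pick arms uniformly at random instead of following a policy, rewards are assumed to be Bernoulli, and a colliding player is assumed to get no reward.

```python
import random


def play_sparse(means, activations, horizon, seed=None):
    """Simulate one game with sparsely activated players (illustrative only)."""
    rng = random.Random(seed)
    nb_arms = len(means)
    rewards = [0.0] * len(activations)  # accumulated reward of each player
    collisions = [0] * nb_arms          # number of colliding players on each arm
    for _ in range(horizon):
        # Each player is activated independently with its own probability.
        active = [j for j, p in enumerate(activations) if rng.random() < p]
        # Placeholder policy: each active player picks an arm uniformly at random.
        choices = {j: rng.randrange(nb_arms) for j in active}
        for j, arm in choices.items():
            nb_on_arm = sum(1 for a in choices.values() if a == arm)
            if nb_on_arm > 1:
                collisions[arm] += 1    # assumed: no reward in case of collision
            elif rng.random() < means[arm]:
                rewards[j] += 1.0       # Bernoulli reward of mean means[arm]
    return rewards, collisions


rewards, collisions = play_sparse([0.2, 0.8], [0.5, 0.5], horizon=1000, seed=42)
```

Passing a seed to each repetition keeps runs reproducible when they are dispatched to parallel workers, which is presumably why the helper takes seed and repeatId arguments.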


random() -> x in the interval [0, 1).


Environment.EvaluatorSparseMultiPlayers.with_proba(proba)

True with probability = proba, False with probability = 1 - proba.


>>> import random; random.seed(0)
>>> tosses = [with_proba(0.6) for _ in range(10000)]; sum(tosses)
>>> tosses = [with_proba(0.111) for _ in range(100000)]; sum(tosses)
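A function with this behavior could be written as below. This is a sketch assuming a uniform draw in [0, 1) is compared to proba; the library's actual code may differ (for instance in the comparison operator).

```python
import random


def with_proba(proba):
    """Return True with probability proba, False with probability 1 - proba."""
    # random.random() is uniform in [0, 1), so the test succeeds with probability proba.
    return random.random() < proba


random.seed(0)
tosses = [with_proba(0.6) for _ in range(10000)]
frequency = sum(tosses) / len(tosses)  # should be close to 0.6
```

With a strict comparison, with_proba(0.0) is always False and with_proba(1.0) is always True.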