Environment.EvaluatorMultiPlayers module

The EvaluatorMultiPlayers class wraps and runs the simulations for the multi-player case. It also provides many plotting methods for various visualizations. See the documentation.

Environment.EvaluatorMultiPlayers.USE_PICKLE = False

Should we save the figure objects to a .pickle file at the end of the simulation?

Environment.EvaluatorMultiPlayers.REPETITIONS = 1

Default number of repetitions

Environment.EvaluatorMultiPlayers.DELTA_T_PLOT = 50

Default sampling rate for plotting
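To illustrate what a sampling rate for plotting means, here is a minimal sketch that keeps one point every `delta_t_plot` time steps. The helper `subsample_for_plot` is hypothetical, and this is only an assumption about how such a constant is typically used, not the module's actual code.

```python
DELTA_T_PLOT = 50  # same value as the module default shown above


def subsample_for_plot(values, delta_t_plot=DELTA_T_PLOT):
    """Return (times, sampled_values), keeping one point every delta_t_plot steps."""
    times = list(range(0, len(values), delta_t_plot))
    return times, [values[t] for t in times]


# Example: a cumulated curve over a horizon of 200 steps
curve = [0.5 * t for t in range(200)]
times, sampled = subsample_for_plot(curve)
```

Subsampling like this keeps figures light when the horizon is large, at the cost of hiding variations shorter than the sampling step.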

Environment.EvaluatorMultiPlayers.COUNT_RANKS_MARKOV_CHAIN = False

If true, count and then print a lot of statistics for the Markov Chain of the underlying configurations on ranks

Environment.EvaluatorMultiPlayers.MORE_ACCURATE = True

Use the count of selections instead of rewards for a more accurate mean/var reward measure.
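The sketch below shows one way to read this option, under the assumption that "count of selections" means weighting each arm's true mean by the number of times it was selected. The arm means, the random pulls and both helper names are hypothetical.

```python
import random


def estimate_from_rewards(rewards):
    """Noisy estimate: sum of the rewards actually observed."""
    return sum(rewards)


def estimate_from_counts(pulls, means):
    """Smoother estimate: each selection contributes the true mean of its arm."""
    return sum(means[arm] for arm in pulls)


rng = random.Random(0)
means = [0.2, 0.8]  # hypothetical Bernoulli arms
pulls = [rng.randrange(2) for _ in range(1000)]
rewards = [1.0 if rng.random() < means[arm] else 0.0 for arm in pulls]

noisy = estimate_from_rewards(rewards)
smooth = estimate_from_counts(pulls, means)
```

The count-based estimate removes the randomness of the reward draws, so only the randomness of the arm selections remains.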

Environment.EvaluatorMultiPlayers.plot_lowerbounds = True

Default is to plot the lower-bounds

Environment.EvaluatorMultiPlayers.USE_BOX_PLOT = True

True to use a box plot (the default), False to use a violin plot.

Environment.EvaluatorMultiPlayers.nb_break_points = 0

Default number of random events

Environment.EvaluatorMultiPlayers.FINAL_RANKS_ON_AVERAGE = True

Default value for finalRanksOnAverage

Environment.EvaluatorMultiPlayers.USE_JOBLIB_FOR_POLICIES = False

Default value for useJoblibForPolicies. Using it does not speed things up (too many threads add too much overhead), so it should really stay disabled.

class Environment.EvaluatorMultiPlayers.EvaluatorMultiPlayers(configuration, moreAccurate=True)[source]

Evaluator class to run the simulations, for the multi-players case.

__init__(configuration, moreAccurate=True)[source]

cfg = None

Configuration dictionary

nbPlayers = None

Number of players

repetitions = None

Number of repetitions

horizon = None

Horizon (number of time steps)

collisionModel = None

Which collision model should be used

full_lost_if_collision = None

Is the reward fully lost in case of a collision? Used to compute the correct decomposition of the regret.

moreAccurate = None

Use the count of selections instead of rewards for a more accurate mean/var reward measure.

finalRanksOnAverage = None

Is the final display of ranks done on average rewards?

averageOn = None

Number of last steps over which rewards are averaged for the final ranking

nb_break_points = None

How many random events?

plot_lowerbounds = None

Should we plot the lower-bounds?

useJoblib = None

Use joblib to parallelize the for loop over repetitions (useful)

showplot = None

Show the plot (interactive display or not)

use_box_plot = None

Use a box plot (or a violin plot if False). A box plot is forced if repetitions=1.

count_ranks_markov_chain = None

If true, count and then print a lot of statistics for the Markov Chain of the underlying configurations on ranks

change_labels = None

Possibly empty dictionary to map ‘playerId’ to new labels (overwrite their name).

append_labels = None

Possibly empty dictionary to map ‘playerId’ to new labels (by appending the result from ‘append_labels’).
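As a rough illustration of how two such dictionaries could be combined, here is a minimal sketch. The helper `relabel` and the order of operations (overwrite first, then append) are assumptions made for the example, not the class's documented behaviour.

```python
def relabel(player_id, default_label, change_labels=None, append_labels=None):
    """Build the display label of a player from two optional dictionaries."""
    change_labels = change_labels or {}
    append_labels = append_labels or {}
    # Overwrite the name if an entry exists for this player...
    label = change_labels.get(player_id, default_label)
    # ...then append an optional suffix.
    return label + append_labels.get(player_id, "")


overwritten = relabel(0, "UCB", change_labels={0: "My UCB"})
suffixed = relabel(1, "UCB", append_labels={1: " (tuned)"})
untouched = relabel(2, "UCB")
```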

envs = None

List of environments

players = None

List of players

rewards = None

For each env, history of rewards

pulls = None

For each env, keep the history of arm pulls (mean)

lastPulls = None

For each env, keep the distribution of arm pulls

allPulls = None

For each env, keep the full history of arm pulls

collisions = None

For each env, keep the history of collisions on all arms

lastCumCollisions = None

For each env, last count of collisions on all arms

nbSwitchs = None

For each env, keep the history of switches (change of configuration of players)
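A minimal sketch of counting switches, assuming a switch is a time step at which a player selects a different arm than at the previous step. The helper `count_switches` is hypothetical.

```python
def count_switches(choices):
    """Count how many times consecutive arm choices differ."""
    return sum(1 for prev, cur in zip(choices, choices[1:]) if prev != cur)


# One player's arm choices over six time steps
nb_switches = count_switches([0, 0, 1, 1, 2, 0])
```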

bestArmPulls = None

For each env, keep the history of best arm pulls

freeTransmissions = None

For each env, keep the history of successful transmission (1 - collisions, basically)

lastCumRewards = None

For each env, last accumulated rewards, to compute variance and histogram of whole regret R_T

runningTimes = None

For each env, keep the history of running times

memoryConsumption = None

For each env, keep the history of memory consumption


Create environments.


Create or initialize players.


Simulate all envs.

startOneEnv(envId, env)[source]

Simulate that env.


Save the content of the internal data into an HDF5 file on the disk.


Update the internal memory of the Evaluator object by loading data from the opened HDF5 file.


FIXME this is not YET implemented!

getPulls(playerId, envId=0)[source]

Extract mean pulls.

getAllPulls(playerId, armId, envId=0)[source]

Extract mean of all pulls.

getNbSwitchs(playerId, envId=0)[source]

Extract mean nb of switches.


Extract average of mean nb of switches.

getBestArmPulls(playerId, envId=0)[source]

Extract mean of best arms pulls.

getfreeTransmissions(playerId, envId=0)[source]

Extract mean of successful transmission.

getCollisions(armId, envId=0)[source]

Extract mean of number of collisions.

getRewards(playerId, envId=0)[source]

Extract mean of rewards.

getRegretMean(playerId, envId=0)[source]

Extract the mean regret, for one arm and one player (not meaningful).


This is the centralized regret for one arm; it does not make much sense in the multi-player setting!


Compute the empirical centralized regret: cumsum on time of the mean rewards of the M best arms - cumsum on time of the empirical rewards obtained by the players, based on accumulated rewards.
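The formula above can be sketched in a few lines of plain Python. The function name, its arguments and the toy numbers are hypothetical; `rewards_per_step` is assumed to hold, for each time step, the sum of the rewards obtained by all players.

```python
from itertools import accumulate


def centralized_regret(means, nb_players, rewards_per_step):
    """Cumulated oracle reward of the M best arms minus cumulated empirical reward."""
    best_means = sorted(means, reverse=True)[:nb_players]
    oracle_per_step = sum(best_means)
    cum_rewards = accumulate(rewards_per_step)
    return [oracle_per_step * (t + 1) - r for t, r in enumerate(cum_rewards)]


# Three arms, M = 2 players, four time steps
regret = centralized_regret([0.1, 0.5, 0.9], 2, [1.0, 2.0, 0.0, 1.0])
```

Because it is based on realized rewards, this empirical regret can be negative at some time steps.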


Extract and compute the first term \((a)\) in the centralized regret: losses due to pulling suboptimal arms.


Extract and compute the second term \((b)\) in the centralized regret: losses due to not pulling optimal arms.


Extract and compute the third term \((c)\) in the centralized regret: losses due to collisions.
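The sketch below computes three such terms from counts. It assumes the usual decomposition from the multi-player bandit literature, with M players, M-th best mean `mu_star`, `selections[k]` the total number of selections of arm k by all players, and `collisions[k]` the number of colliding selections on arm k. All names and numbers are hypothetical, and the class may compute these terms differently.

```python
def regret_terms(means, nb_players, horizon, selections, collisions):
    """Return the terms (a), (b), (c) of the centralized regret decomposition."""
    order = sorted(range(len(means)), key=lambda k: means[k], reverse=True)
    best, worst = order[:nb_players], order[nb_players:]
    mu_star = means[best[-1]]  # mean of the M-th best arm
    # (a) losses due to pulling suboptimal arms
    term_a = sum((mu_star - means[k]) * selections[k] for k in worst)
    # (b) losses due to not pulling optimal arms
    term_b = sum((means[k] - mu_star) * (horizon - selections[k]) for k in best)
    # (c) losses due to collisions
    term_c = sum(means[k] * collisions[k] for k in range(len(means)))
    return term_a, term_b, term_c


means = [0.1, 0.5, 0.9]
selections = [3, 8, 9]   # sums to nb_players * horizon = 20
collisions = [0, 2, 2]
term_a, term_b, term_c = regret_terms(means, 2, 10, selections, collisions)
```

Under these assumptions, and when a collision yields no reward, the three terms sum to the horizon times the sum of the M best means, minus the expected reward of the non-colliding selections.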


Compute the empirical centralized regret, based on counts of selections and not actual rewards.

getCentralizedRegret(envId=0, moreAccurate=None)[source]

Using either the more accurate or the less accurate regret count.


Extract last regrets, based on accumulated rewards.


Extract weighted count of selections.


Extract last regrets, based on counts of selections and not actual rewards.

getLastRegrets(envId=0, moreAccurate=None)[source]

Using either the more accurate or the less accurate regret count.


Get the means and stds and list of running time of the different players.


Get the means and stds and list of memory consumptions of the different players.

plotRewards(envId=0, savefig=None, semilogx=False, moreAccurate=None)[source]

Plot the decentralized (vectorial) rewards, for each player.

plotFairness(envId=0, savefig=None, semilogx=False, fairness='default', evaluators=())[source]

Plot a certain measure of “fairness”, computed from these personal rewards. Supports more than one environment (use evaluators to give a list of other environments).
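As one example of a fairness measure computed from per-player rewards, here is a sketch of Jain's fairness index. This is an illustration only: nothing here says which measure the `fairness='default'` option actually uses, and the helper name is hypothetical.

```python
def jain_index(rewards):
    """Jain's fairness index: 1 when all players get the same reward, 1/n at worst."""
    total = sum(rewards)
    squares = sum(r * r for r in rewards)
    if squares == 0:
        return 1.0  # convention chosen here: all-zero rewards count as perfectly fair
    return total * total / (len(rewards) * squares)


fair = jain_index([2.0, 2.0])
unfair = jain_index([1.0, 0.0, 0.0])
```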

plotRegretCentralized(envId=0, savefig=None, semilogx=False, semilogy=False, loglog=False, normalized=False, evaluators=(), subTerms=False, sumofthreeterms=False, moreAccurate=None)[source]

Plot the centralized cumulated regret. Supports more than one environment (use evaluators to give a list of other environments).

  • The lower bounds are also plotted (Besson & Kaufmann, and Anandkumar et al).
  • The three terms of the regret are also plotted if evaluators = () (that’s the default).
plotNbSwitchs(envId=0, savefig=None, semilogx=False, cumulated=False)[source]

Plot the cumulated number of switches (to evaluate the switching costs), comparing each player.

plotNbSwitchsCentralized(envId=0, savefig=None, semilogx=False, cumulated=False, evaluators=())[source]

Plot the centralized cumulated number of switches (to evaluate the switching costs). Supports more than one environment (use evaluators to give a list of other environments).

plotBestArmPulls(envId=0, savefig=None)[source]

Plot the frequency of pulls of the best channel.

  • Warning: does not adapt to dynamic settings!
plotAllPulls(envId=0, savefig=None, cumulated=True, normalized=False)[source]

Plot the frequency of use of every channel, one figure for each channel. Not so useful.

plotFreeTransmissions(envId=0, savefig=None, cumulated=False)[source]

Plot the frequency of free transmissions.

plotNbCollisions(envId=0, savefig=None, semilogx=False, semilogy=False, loglog=False, cumulated=False, upperbound=False, evaluators=())[source]

Plot the frequency or cumulated number of collisions. Supports more than one environment (use evaluators to give a list of other environments).

plotFrequencyCollisions(envId=0, savefig=None, piechart=True, semilogy=False)[source]

Plot the frequency of collision, in a pie chart (histogram not supported yet).

printRunningTimes(envId=0, precision=3, evaluators=())[source]

Print the average+-std running time of the different players.

printMemoryConsumption(envId=0, evaluators=())[source]

Print the average+-std memory consumption of the different players.

plotRunningTimes(envId=0, savefig=None, base=1, unit='seconds', evaluators=())[source]

Plot the running times of the different players, as a box plot for each evaluator.

plotMemoryConsumption(envId=0, savefig=None, base=1024, unit='KiB', evaluators=())[source]

Plot the memory consumption of the different players, as a box plot for each.

printFinalRanking(envId=0, verb=True)[source]

Compute and print the ranking of the different players.
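A ranking of this kind can be sketched as a simple sort. The helper `final_ranking` is hypothetical and assumes players are ranked by decreasing accumulated (or averaged) reward, which may differ from the method's exact criterion.

```python
def final_ranking(total_rewards):
    """Return player indices sorted from highest to lowest reward."""
    return sorted(range(len(total_rewards)),
                  key=lambda i: total_rewards[i], reverse=True)


# Rewards of players 0, 1 and 2
ranking = final_ranking([3.0, 7.5, 5.0])
for rank, player_id in enumerate(ranking, start=1):
    print("Rank {}: player #{}".format(rank, player_id))
```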

printFinalRankingAll(envId=0, evaluators=())[source]

Compute and print the ranking of the different players.

printLastRegrets(envId=0, evaluators=(), moreAccurate=None)[source]

Print the last regrets of the different evaluators.

printLastRegretsPM(envId=0, evaluators=(), moreAccurate=None)[source]

Print the average+-std last regret of the different players.
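A minimal sketch of an "average+-std" summary, using only the standard library. The helper name and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def format_mean_std(values, precision=3):
    """Format a list of last regrets as 'mean +- std'."""
    m, s = mean(values), pstdev(values)
    return f"{m:.{precision}g} +- {s:.{precision}g}"


# Last regrets of two repetitions
summary = format_mean_std([1.0, 3.0])
```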

plotLastRegrets(envId=0, normed=False, subplots=True, nbbins=15, log=False, all_on_separate_figures=False, sharex=False, sharey=False, boxplot=False, normalized_boxplot=True, savefig=None, moreAccurate=None, evaluators=())[source]

Plot histogram of the regrets R_T for all evaluators.

plotHistoryOfMeans(envId=0, horizon=None, savefig=None)[source]

Plot the history of means, as a plot with x axis being the time, y axis the mean rewards, and K curves one for each arm.

strPlayers(short=False, latex=True)[source]

Get a string of the players for this environment.

Environment.EvaluatorMultiPlayers.delayed_play(env, players, horizon, collisionModel, seed=None, repeatId=0, count_ranks_markov_chain=False, useJoblib=False)[source]

Helper function for the parallelization.
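To show the general pattern of a helper that runs one repetition so that repetitions can be dispatched in parallel, here is a sketch that deliberately swaps joblib for the standard library's `concurrent.futures`. The function `one_repetition` is a hypothetical stand-in and does not reproduce what `delayed_play` does.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def one_repetition(seed, horizon=100):
    """Hypothetical stand-in for one simulation run: returns a total reward."""
    rng = random.Random(seed)  # one seed per repetition keeps runs reproducible
    return sum(rng.random() < 0.5 for _ in range(horizon))


def run_repetitions(repetitions, max_workers=4):
    """Run all repetitions in a pool of workers and collect results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one_repetition, range(repetitions)))


results = run_repetitions(8)
```

Giving each repetition its own seed makes the parallel results identical to those of a sequential loop.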


Extract the str of a player, if it is a child, printed as ‘#[0-9]+<…>’ –> …
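The extraction described above can be sketched with a regular expression. The function name `extract_child_name` is hypothetical, and the pattern is taken literally from the description rather than from the module's source.

```python
import re

# '#' followed by digits, then the real name between angle brackets
_CHILD_PATTERN = re.compile(r"^#[0-9]+<(.*)>$")


def extract_child_name(text):
    """Return the name inside '#N<...>' if text has that form, else text unchanged."""
    match = _CHILD_PATTERN.match(text)
    return match.group(1) if match else text


child = extract_child_name("#1<UCB>")
plain = extract_child_name("UCB")
```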