Environment.EvaluatorMultiPlayers module

EvaluatorMultiPlayers class to wrap and run the simulations, for the multi-players case. It provides many plotting methods for various visualizations. See the documentation.
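A minimal usage sketch is given below. The constructor and the methods startAllEnv(), printFinalRanking() and plotRegretCentralized() are documented on this page; the configuration keys and the environments/players variables are assumptions following the usual SMPyBandits conventions, not something this page specifies.

    # Hedged sketch: the configuration keys are assumed, not documented here.
    from Environment.EvaluatorMultiPlayers import EvaluatorMultiPlayers

    configuration = {
        "horizon": 1000,              # number of time steps (assumed key)
        "repetitions": 4,             # number of independent repetitions (assumed key)
        "n_jobs": 1,                  # joblib parallelism, 1 = sequential (assumed key)
        "environment": environments,  # list of problem descriptions, defined elsewhere
        "players": players,           # list of M player policies, defined elsewhere
    }

    evaluation = EvaluatorMultiPlayers(configuration, moreAccurate=True)
    evaluation.startAllEnv()            # run every environment
    evaluation.printFinalRanking()      # textual ranking of the players
    evaluation.plotRegretCentralized()  # centralized regret figure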

Environment.EvaluatorMultiPlayers.USE_PICKLE = False

Should we save the figure objects to a .pickle file at the end of the simulation?

Environment.EvaluatorMultiPlayers._nbOfArgs(function)[source]
Environment.EvaluatorMultiPlayers.REPETITIONS = 1

Default nb of repetitions

Environment.EvaluatorMultiPlayers.DELTA_T_PLOT = 50

Default sampling rate for plotting

Environment.EvaluatorMultiPlayers.COUNT_RANKS_MARKOV_CHAIN = False

If True, count and then print a lot of statistics about the Markov chain of the underlying configurations on ranks

Environment.EvaluatorMultiPlayers.MORE_ACCURATE = True

Use the count of selections instead of rewards for a more accurate mean/var reward measure.

Environment.EvaluatorMultiPlayers.plot_lowerbounds = True

Default is to plot the lower-bounds

Environment.EvaluatorMultiPlayers.USE_BOX_PLOT = True

True (the default) to use a box plot, False to use a violin plot.

Environment.EvaluatorMultiPlayers.nb_break_points = 0

Default nb of random events

Environment.EvaluatorMultiPlayers.FINAL_RANKS_ON_AVERAGE = True

Default value for finalRanksOnAverage

Environment.EvaluatorMultiPlayers.USE_JOBLIB_FOR_POLICIES = False

Default value for useJoblibForPolicies. Using it does not speed things up (too much overhead from using too many threads), so it should really stay disabled.

class Environment.EvaluatorMultiPlayers.EvaluatorMultiPlayers(configuration, moreAccurate=True)[source]

Bases: object

Evaluator class to run the simulations, for the multi-players case.

__init__(configuration, moreAccurate=True)[source]

Initialize self. See help(type(self)) for accurate signature.

cfg = None

Configuration dictionary

nbPlayers = None

Number of players

repetitions = None

Number of repetitions

horizon = None

Horizon (number of time steps)

collisionModel = None

Which collision model should be used

full_lost_if_collision = None

Is there a full loss of rewards in case of a collision? Used to compute the correct decomposition of the regret

moreAccurate = None

Use the count of selections instead of rewards for a more accurate mean/var reward measure.

finalRanksOnAverage = None

Should the final ranking be computed on average rewards?

averageOn = None

Number of final steps over which rewards are averaged for the final ranking

nb_break_points = None

How many random events?

plot_lowerbounds = None

Should we plot the lower-bounds?

useJoblib = None

Use joblib to parallelize the for loop over repetitions (useful)

showplot = None

Show the plot (interactive display or not)

use_box_plot = None

Use a box plot (or a violin plot if False). Forced to a box plot if repetitions=1.

count_ranks_markov_chain = None

If True, count and then print a lot of statistics about the Markov chain of the underlying configurations on ranks

change_labels = None

Possibly empty dictionary mapping ‘playerId’ to new labels (overwriting their names).

append_labels = None

Possibly empty dictionary mapping ‘playerId’ to labels to append to their names.
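As an illustration, these two dictionaries could be supplied through the configuration; whether the keys below are read from the configuration dictionary is an assumption of this sketch, not something stated on this page:

    # Hypothetical sketch: relabel players in the final plots and rankings.
    configuration["change_labels"] = {0: "Leader", 1: "Follower"}   # replace names
    configuration["append_labels"] = {0: " (centralized)"}          # append to names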

envs = None

List of environments

players = None

List of players

rewards = None

For each env, history of rewards

pulls = None

For each env, keep the history of arm pulls (mean)

lastPulls = None

For each env, keep the distribution of arm pulls

allPulls = None

For each env, keep the full history of arm pulls

collisions = None

For each env, keep the history of collisions on all arms

lastCumCollisions = None

For each env, last count of collisions on all arms

nbSwitchs = None

For each env, keep the history of switches (change of configuration of players)

bestArmPulls = None

For each env, keep the history of best arm pulls

freeTransmissions = None

For each env, keep the history of successful transmissions (essentially 1 - collisions)

lastCumRewards = None

For each env, last accumulated rewards, to compute variance and histogram of whole regret R_T

runningTimes = None

For each env, keep the history of running times

memoryConsumption = None

For each env, keep the history of memory consumption

__initEnvironments__()[source]

Create environments.

__initPlayers__(env)[source]

Create or initialize players.

startAllEnv()[source]

Simulate all envs.

startOneEnv(envId, env)[source]

Simulate that env.

saveondisk(filepath='saveondisk_EvaluatorMultiPlayers.hdf5')[source]

Save the internal data to an HDF5 file on disk.

loadfromdisk(filepath)[source]

Update the internal memory of the Evaluator object by loading data from the opened HDF5 file.

Warning

FIXME this is not YET implemented!

getPulls(playerId, envId=0)[source]

Extract mean pulls.

getAllPulls(playerId, armId, envId=0)[source]

Extract mean of all pulls.

getNbSwitchs(playerId, envId=0)[source]

Extract mean nb of switches.

getCentralizedNbSwitchs(envId=0)[source]

Extract average of mean nb of switches.

getBestArmPulls(playerId, envId=0)[source]

Extract mean of best arms pulls.

getfreeTransmissions(playerId, envId=0)[source]

Extract mean of successful transmissions.

getCollisions(armId, envId=0)[source]

Extract mean of number of collisions.

getRewards(playerId, envId=0)[source]

Extract mean of rewards.

getRegretMean(playerId, envId=0)[source]

Extract mean of regret, for one arm and one player (see the warning below).

Warning

This is the centralized regret, for one arm, it does not make much sense in the multi-players setting!

getCentralizedRegret_LessAccurate(envId=0)[source]

Compute the empirical centralized regret: the cumulative sum over time of the mean rewards of the M best arms, minus the cumulative sum over time of the empirical rewards obtained by the players, based on accumulated rewards.
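In symbols, the quantity described above can be written as follows, with \(\mu^*_k\) the mean of the \(k\)-th best arm and \(r_j(t)\) the empirical reward obtained by player \(j\) at time \(t\) (notation introduced here, not taken from this page):

\[ \widehat{R}_T \;=\; \sum_{t=1}^{T} \sum_{k=1}^{M} \mu^*_k \;-\; \sum_{t=1}^{T} \sum_{j=1}^{M} r_j(t). \]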

getFirstRegretTerm(envId=0)[source]

Extract and compute the first term \((a)\) in the centralized regret: losses due to pulling suboptimal arms.

getSecondRegretTerm(envId=0)[source]

Extract and compute the second term \((b)\) in the centralized regret: losses due to not pulling optimal arms.

getThirdRegretTerm(envId=0)[source]

Extract and compute the third term \((c)\) in the centralized regret: losses due to collisions.
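These three terms are designed to sum to the centralized regret, as suggested by the subTerms and sumofthreeterms options of plotRegretCentralized() below; written out (notation introduced here):

\[ R_T = (a) + (b) + (c), \]

that is, losses due to pulling suboptimal arms, plus losses due to not pulling optimal arms, plus losses due to collisions.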

getCentralizedRegret_MoreAccurate(envId=0)[source]

Compute the empirical centralized regret, based on counts of selections and not actual rewards.

getCentralizedRegret(envId=0, moreAccurate=None)[source]

Using either the more accurate or the less accurate regret count.

getLastRegrets_LessAccurate(envId=0)[source]

Extract last regrets, based on accumulated rewards.

getAllLastWeightedSelections(envId=0)[source]

Extract weighted count of selections.

getLastRegrets_MoreAccurate(envId=0)[source]

Extract last regrets, based on counts of selections and not actual rewards.

getLastRegrets(envId=0, moreAccurate=None)[source]

Using either the more accurate or the less accurate regret count.

getRunningTimes(envId=0)[source]

Get the means, standard deviations, and lists of running times of the different players.

getMemoryConsumption(envId=0)[source]

Get the means, standard deviations, and lists of memory consumption of the different players.

plotRewards(envId=0, savefig=None, semilogx=False, moreAccurate=None)[source]

Plot the decentralized (vectorial) rewards, for each player.

plotFairness(envId=0, savefig=None, semilogx=False, fairness='default', evaluators=())[source]

Plot a certain measure of “fairness” from these personal rewards; supports more than one environment (use evaluators to give a list of other environments).

plotRegretCentralized(envId=0, savefig=None, semilogx=False, semilogy=False, loglog=False, normalized=False, evaluators=(), subTerms=False, sumofthreeterms=False, moreAccurate=None)[source]

Plot the centralized cumulated regret; supports more than one environment (use evaluators to give a list of other environments).

  • The lower bounds are also plotted (Besson & Kaufmann, and Anandkumar et al.).
  • The three terms of the regret are also plotted if evaluators = () (that’s the default).
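For instance, two already-run evaluators can be compared on the same figure. In the sketch below, eval_1 and eval_2 are hypothetical names for EvaluatorMultiPlayers instances on which startAllEnv() has already been called:

    # Hedged sketch: compare the centralized regret of two finished runs,
    # saving the figure to a (hypothetical) file name.
    eval_1.plotRegretCentralized(envId=0, evaluators=[eval_2],
                                 savefig="regret_comparison.png")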
plotNbSwitchs(envId=0, savefig=None, semilogx=False, cumulated=False)[source]

Plot the cumulated number of switches (to evaluate the switching costs), comparing each player.

plotNbSwitchsCentralized(envId=0, savefig=None, semilogx=False, cumulated=False, evaluators=())[source]

Plot the centralized cumulated number of switches (to evaluate the switching costs); supports more than one environment (use evaluators to give a list of other environments).

plotBestArmPulls(envId=0, savefig=None)[source]

Plot the frequency of pulls of the best channel.

  • Warning: does not adapt to dynamic settings!
plotAllPulls(envId=0, savefig=None, cumulated=True, normalized=False)[source]

Plot the frequency of use of every channel, with one figure per channel. Not so useful.

plotFreeTransmissions(envId=0, savefig=None, cumulated=False)[source]

Plot the frequency of free (successful) transmissions.

plotNbCollisions(envId=0, savefig=None, semilogx=False, semilogy=False, loglog=False, cumulated=False, upperbound=False, evaluators=())[source]

Plot the frequency or cumulated number of collisions; supports more than one environment (use evaluators to give a list of other environments).

plotFrequencyCollisions(envId=0, savefig=None, piechart=True, semilogy=False)[source]

Plot the frequency of collisions, in a pie chart (histogram not supported yet).

printRunningTimes(envId=0, precision=3, evaluators=())[source]

Print the average ± std running time of the different players.

printMemoryConsumption(envId=0, evaluators=())[source]

Print the average ± std memory consumption of the different players.

plotRunningTimes(envId=0, savefig=None, base=1, unit='seconds', evaluators=())[source]

Plot the running times of the different players, as a box plot for each evaluator.

plotMemoryConsumption(envId=0, savefig=None, base=1024, unit='KiB', evaluators=())[source]

Plot the memory consumption of the different players, as a box plot for each evaluator.

printFinalRanking(envId=0, verb=True)[source]

Compute and print the ranking of the different players.

printFinalRankingAll(envId=0, evaluators=())[source]

Compute and print the ranking of the different players, for each evaluator.

printLastRegrets(envId=0, evaluators=(), moreAccurate=None)[source]

Print the last regrets of the different evaluators.

printLastRegretsPM(envId=0, evaluators=(), moreAccurate=None)[source]

Print the average ± std last regret of the different players.

plotLastRegrets(envId=0, normed=False, subplots=True, nbbins=15, log=False, all_on_separate_figures=False, sharex=False, sharey=False, boxplot=False, normalized_boxplot=True, savefig=None, moreAccurate=None, evaluators=())[source]

Plot histogram of the regrets R_T for all evaluators.

plotHistoryOfMeans(envId=0, horizon=None, savefig=None)[source]

Plot the history of means: x axis is time, y axis is the mean reward, with K curves, one per arm.

strPlayers(short=False, latex=True)[source]

Get a string of the players for this environment.

__module__ = 'Environment.EvaluatorMultiPlayers'
__weakref__

list of weak references to the object (if defined)

Environment.EvaluatorMultiPlayers.delayed_play(env, players, horizon, collisionModel, seed=None, repeatId=0, count_ranks_markov_chain=False, useJoblib=False)[source]

Helper function for the parallelization.
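A rough sketch of how this helper can be dispatched with joblib is given below; env, players, horizon, collisionModel and repetitions are assumed to be already defined (as in the attributes above), and the exact call made inside startOneEnv() is not shown on this page:

    from joblib import Parallel, delayed

    from Environment.EvaluatorMultiPlayers import delayed_play

    # One delayed_play call per repetition, run in parallel across all cores.
    results = Parallel(n_jobs=-1, verbose=5)(
        delayed(delayed_play)(env, players, horizon, collisionModel,
                              seed=None, repeatId=repeatId, useJoblib=True)
        for repeatId in range(repetitions)
    )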

Environment.EvaluatorMultiPlayers._extract(text)[source]

Extract the str of a player, if it is a child, printed as ‘#[0-9]+<…>’ --> ‘…’