Environment.Evaluator module

Evaluator class to wrap and run the simulations, with many plotting methods to produce various visualizations.

Environment.Evaluator.USE_PICKLE = False

Should we save the figure objects to a .pickle file at the end of the simulation?

Environment.Evaluator._nbOfArgs(function)[source]

Internal helper: count the number of arguments that function accepts.

Environment.Evaluator.REPETITIONS = 1

Default number of repetitions

Environment.Evaluator.DELTA_T_PLOT = 50

Default sampling rate for plotting

Environment.Evaluator.plot_lowerbound = True

Default is to plot the lower-bound

Environment.Evaluator.USE_BOX_PLOT = True

True to use boxplot, False to use violinplot.

Environment.Evaluator.random_shuffle = False

Use basic random events that shuffle the arms?

Environment.Evaluator.random_invert = False

Use basic random events that invert the arms?

Environment.Evaluator.nb_break_points = 0

Default number of break points (random events)

Environment.Evaluator.STORE_ALL_REWARDS = False

Store all rewards?

Environment.Evaluator.STORE_REWARDS_SQUARED = False

Store rewards squared?

Environment.Evaluator.MORE_ACCURATE = True

Use the count of selections instead of rewards for a more accurate mean/var reward measure.

Environment.Evaluator.FINAL_RANKS_ON_AVERAGE = True

Final ranks are printed based on the average of the last 1% of rewards, not only on the very last rewards

Environment.Evaluator.USE_JOBLIB_FOR_POLICIES = False

Do not use joblib to parallelize the simulations over the various policies (joblib is used to parallelize the random Monte Carlo repetitions instead)
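
Most of these defaults can typically be overridden per experiment through the configuration dictionary given to Evaluator (see below). As a sketch only, and assuming the constants are read when the Evaluator is constructed, they can also be changed directly on the module:

>>> import Environment.Evaluator as EvaluatorModule
>>> EvaluatorModule.USE_PICKLE = True      # also dump the figure objects to a .pickle file
>>> EvaluatorModule.USE_BOX_PLOT = False   # prefer violin plots over box plots
>>> EvaluatorModule.DELTA_T_PLOT = 10      # denser sampling of the curves when plotting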

class Environment.Evaluator.Evaluator(configuration, finalRanksOnAverage=True, averageOn=0.005, useJoblibForPolicies=False, moreAccurate=True)[source]

Bases: object

Evaluator class to run the simulations.

__init__(configuration, finalRanksOnAverage=True, averageOn=0.005, useJoblibForPolicies=False, moreAccurate=True)[source]

Initialize self. See help(type(self)) for accurate signature.
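
For context, here is a minimal sketch of building an Evaluator. The configuration keys shown (horizon, repetitions, n_jobs, environment, policies) and the import paths follow the library's usual example scripts, but they are assumptions to be checked against your version:

>>> from Arms import Bernoulli
>>> from Policies import UCB, Thompson
>>> from Environment.Evaluator import Evaluator
>>> configuration = {
...     "horizon": 10000,        # number of time steps T
...     "repetitions": 10,       # Monte Carlo repetitions
...     "n_jobs": 4,             # joblib workers for the repetitions
...     "verbosity": 6,
...     "environment": [{"arm_type": Bernoulli, "params": [0.1, 0.5, 0.9]}],
...     "policies": [{"archtype": UCB, "params": {}},
...                  {"archtype": Thompson, "params": {}}],
... }
>>> evaluator = Evaluator(configuration)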

cfg = None

Configuration dictionary

nbPolicies = None

Number of policies

horizon = None

Horizon (number of time steps)

repetitions = None

Number of repetitions

delta_t_plot = None

Sampling rate for plotting

random_shuffle = None

Random shuffling of arms?

random_invert = None

Random inversion of arms?

nb_break_points = None

How many random events?

plot_lowerbound = None

Should we plot the lower-bound?

moreAccurate = None

Use the count of selections instead of rewards for a more accurate mean/var reward measure.

finalRanksOnAverage = None

Should the final ranking be computed on average rewards?

averageOn = None

Fraction of the last steps used to average rewards for the final ranking

useJoblibForPolicies = None

Use joblib to parallelize the for loop over policies (not useful)

useJoblib = None

Use joblib to parallelize the for loop over repetitions (useful)

cache_rewards = None

Should we cache and precompute the rewards?

environment_bayesian = None

Is the environment Bayesian?

showplot = None

Show the plot (interactive display or not)

use_box_plot = None

Use a box plot (or a violin plot if False). Forced to a box plot if repetitions=1.

change_labels = None

Possibly empty dictionary to map ‘policyId’ to new labels (to overwrite their names).

append_labels = None

Possibly empty dictionary to map ‘policyId’ to new labels (by appending the result from ‘append_labels’).
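
As an illustration, and assuming these dictionaries are supplied through the configuration under the keys 'change_labels' and 'append_labels', the mapping goes from the integer index of a policy in the policies list to a string:

>>> configuration["change_labels"] = {0: "UCB(alpha=1/2)"}   # rename the first policy
>>> configuration["append_labels"] = {1: " (tuned prior)"}   # append text to the second label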

envs = None

List of environments

policies = None

List of policies

rewards = None

For each env, history of rewards, i.e., accumulated rewards

lastCumRewards = None

For each env, last accumulated rewards, to compute variance and histogram of whole regret R_T

minCumRewards = None

For each env, history of the minimum of rewards, to compute the amplitude (± STD)

maxCumRewards = None

For each env, history of the maximum of rewards, to compute the amplitude (± STD)

rewardsSquared = None

For each env, history of rewards squared

allRewards = None

For each env, full history of rewards

bestArmPulls = None

For each env, keep the history of best arm pulls

pulls = None

For each env, keep cumulative counts of all arm pulls

allPulls = None

For each env, keep the counts of all arm pulls at every time step (full history of pulls)

lastPulls = None

For each env, keep the counts of arm pulls at the end of each repetition

runningTimes = None

For each env, keep the history of running times

memoryConsumption = None

For each env, keep the history of memory consumption

numberOfCPDetections = None

For each env, store the number of change-point detections by each algorithm, to print its average at the end (to check whether a certain change-point detector algorithm detects too few or too many changes).

__initEnvironments__()[source]

Create environments.

__initPolicies__(env)[source]

Create or initialize policies.

compute_cache_rewards(arms)[source]

Compute the rewards only once, then launch the experiments with the same reward matrix (r_{k,t}).

startAllEnv()[source]

Simulate all envs.

startOneEnv(envId, env)[source]

Simulate that env.
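
Putting these together, a typical driver loop, sketched after the library's example scripts, is shown below; startAllEnv() is expected to perform this loop for you:

>>> for envId, env in enumerate(evaluator.envs):   # evaluator.envs is the list of environments
...     evaluator.startOneEnv(envId, env)          # run every policy, every repetition, on this env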

saveondisk(filepath='saveondisk_Evaluator.hdf5')[source]

Save the internal data to an HDF5 file on disk.

getPulls(policyId, envId=0)[source]

Extract mean pulls.

getBestArmPulls(policyId, envId=0)[source]

Extract mean best arm pulls.

getRewards(policyId, envId=0)[source]

Extract mean rewards.

getAverageWeightedSelections(policyId, envId=0)[source]

Extract the average weighted count of selections (selection counts weighted by the arm means).

getMaxRewards(envId=0)[source]

Extract max mean rewards.

getCumulatedRegret_LessAccurate(policyId, envId=0)[source]

Compute cumulative regret, based on accumulated rewards.

getCumulatedRegret_MoreAccurate(policyId, envId=0)[source]

Compute cumulative regret, based on counts of selections and not actual rewards.

getCumulatedRegret(policyId, envId=0, moreAccurate=None)[source]

Using either the more accurate or the less accurate regret count.
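
To make the distinction concrete, here is an illustrative sketch (the helper name and array shapes are not the library's internals): the "less accurate" estimate subtracts the accumulated observed rewards from t * mu^*, whereas the "more accurate" estimate replaces the noisy observed rewards by the expected reward of each arm, weighted by how often that arm was selected.

import numpy as np

def cumulated_regret_from_selections(means, mean_selections):
    """Illustrative helper, not part of the library.

    means: shape (K,), true means of the arms.
    mean_selections: shape (K, T), average number of times each arm has
    been selected up to each time step (averaged over the repetitions).
    """
    gaps = np.max(means) - np.asarray(means)   # per-arm regret gaps (mu^* - mu_k)
    return gaps @ mean_selections              # shape (T,): sum_k gaps_k * N_k(t)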

getLastRegrets_LessAccurate(policyId, envId=0)[source]

Extract last regrets, based on accumulated rewards.

getAllLastWeightedSelections(policyId, envId=0)[source]

Extract the weighted count of selections at the end of each repetition.

getLastRegrets_MoreAccurate(policyId, envId=0)[source]

Extract last regrets, based on counts of selections and not actual rewards.

getLastRegrets(policyId, envId=0, moreAccurate=None)[source]

Using either the more accurate or the less accurate regret count.

getAverageRewards(policyId, envId=0)[source]

Extract mean rewards (not raw rewards, but cumsum(rewards)/cumsum(1)).
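
In other words, this is a one-line cumulative average (a sketch of the formula in the docstring, with rewards a 1D array of the mean reward obtained at each time step):

>>> import numpy as np
>>> rewards = np.array([0.0, 1.0, 1.0, 0.0])   # mean reward obtained at each time step
>>> average_rewards = np.cumsum(rewards) / np.cumsum(np.ones_like(rewards))
>>> average_rewards.tolist()                   # equals cumsum(rewards) / (1, 2, ..., T)
[0.0, 0.5, 0.6666666666666666, 0.5]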

getRewardsSquared(policyId, envId=0)[source]

Extract rewards squared.

getSTDRegret(policyId, envId=0, meanReward=False)[source]

Extract standard deviation of rewards.

Warning

FIXME experimental!

getMaxMinReward(policyId, envId=0)[source]

Extract amplitude of rewards as maxCumRewards - minCumRewards.

getRunningTimes(envId=0)[source]

Get the means, standard deviations and full list of running times of the different policies.

getMemoryConsumption(envId=0)[source]

Get the means, standard deviations and full list of memory consumptions of the different policies.

getNumberOfCPDetections(envId=0)[source]

Get the means, standard deviations and full list of the numbers of change-point detections of the different policies.

printFinalRanking(envId=0, moreAccurate=None)[source]

Print the final ranking of the different policies.

_xlabel(envId, *args, **kwargs)[source]

Add the xlabel to the plot, and if the environment has change points, draw vertical lines to clearly identify their locations.

plotRegrets(envId=0, savefig=None, meanReward=False, plotSTD=False, plotMaxMin=False, semilogx=False, semilogy=False, loglog=False, normalizedRegret=False, drawUpperBound=False, moreAccurate=None)[source]

Plot the centralized cumulated regret; supports more than one environment (use evaluators to give a list of other environments).
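
A minimal plotting session after the simulations, as a sketch (the savefig base name is arbitrary):

>>> evaluator.printFinalRanking(envId=0)
>>> evaluator.plotRegrets(envId=0)                       # cumulative regret curves
>>> evaluator.plotRegrets(envId=0, semilogx=True)        # same curves, log scale on time
>>> evaluator.plotRegrets(envId=0, normalizedRegret=True, savefig="regret_env0")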

plotBestArmPulls(envId, savefig=None)[source]

Plot the frequency of pulls of the best arm (channel).

Warning

Does not adapt to dynamic settings!

printRunningTimes(envId=0, precision=3)[source]

Print the average ± std running time of the different policies.

plotRunningTimes(envId=0, savefig=None, base=1, unit='seconds')[source]

Plot the running times of the different policies, as a box plot for each.

printMemoryConsumption(envId=0)[source]

Print the average ± std memory consumption of the different policies.

plotMemoryConsumption(envId=0, savefig=None, base=1024, unit='KiB')[source]

Plot the memory consumption of the different policies, as a box plot for each.

printNumberOfCPDetections(envId=0)[source]

Print the average ± std number of change-point detections of the different policies.

plotNumberOfCPDetections(envId=0, savefig=None)[source]

Plot the number of change-point detections of the different policies, as a box plot for each.

__module__ = 'Environment.Evaluator'

__weakref__

list of weak references to the object (if defined)

printLastRegrets(envId=0, moreAccurate=False)[source]

Print the last regrets of the different policies.

plotLastRegrets(envId=0, normed=False, subplots=True, nbbins=15, log=False, all_on_separate_figures=False, sharex=False, sharey=False, boxplot=False, normalized_boxplot=True, savefig=None, moreAccurate=False)[source]

Plot histogram of the regrets R_T for all policies.

plotHistoryOfMeans(envId=0, horizon=None, savefig=None)[source]

Plot the history of means, with time on the x axis, mean reward on the y axis, and one curve for each of the K arms.

Environment.Evaluator.delayed_play(env, policy, horizon, random_shuffle=False, random_invert=False, nb_break_points=0, seed=None, allrewards=None, repeatId=0, useJoblib=False)[source]

Helper function for the parallelization.

Environment.Evaluator.EvaluatorFromDisk(filepath='/tmp/saveondiskEvaluator.hdf5')[source]

Create a new Evaluator object from the HDF5 file given in argument.
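
A save/reload round trip, as a sketch (the file path is arbitrary; how much of the plotting machinery works on the reloaded object depends on which arrays were stored in the file):

>>> evaluator.saveondisk("my_results.hdf5")              # after the simulations are done
>>> from Environment.Evaluator import EvaluatorFromDisk
>>> reloaded = EvaluatorFromDisk("my_results.hdf5")      # rebuild an Evaluator from that file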

Environment.Evaluator.shuffled(mylist)[source]

Returns a shuffled copy of the input 1D list. sorted() exists as an out-of-place counterpart to list.sort(), but the standard library offers no shuffled() counterpart to random.shuffle(), hence this helper.

>>> from random import seed; seed(1234)  # reproducible results
>>> mylist = [ 0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9]
>>> shuffled(mylist)
[0.9, 0.4, 0.3, 0.6, 0.5, 0.7, 0.1, 0.2, 0.8]
>>> shuffled(mylist)
[0.4, 0.3, 0.7, 0.5, 0.8, 0.1, 0.9, 0.6, 0.2]
>>> shuffled(mylist)
[0.4, 0.6, 0.9, 0.5, 0.7, 0.2, 0.1, 0.3, 0.8]
>>> shuffled(mylist)
[0.8, 0.7, 0.3, 0.1, 0.9, 0.5, 0.6, 0.2, 0.4]
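
For reference, a minimal implementation matching this docstring could look as follows (a sketch, not necessarily the library's exact code):

from random import shuffle

def shuffled(mylist):
    """Return a shuffled copy of the input 1D list, leaving the input unchanged."""
    copy = list(mylist)   # copy first, so the input list is not modified in place
    shuffle(copy)         # random.shuffle shuffles the copy in place
    return copy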