# Short documentation of the API
This short document aims to document the API used in my SMPyBandits environment, and to close issue #3.
## Code organization
### Layout of the code
- Arms are defined in the folder `Arms/`; see for example `Arms.Bernoulli`.
- MAB algorithms (also called policies) are defined in the folder `Policies/`; see for example `Policies.Dummy` for a fully random policy, `Policies.EpsilonGreedy` for the epsilon-greedy policy, `Policies.UCB` for the "simple" UCB algorithm, `Policies.BayesUCB` and `Policies.klUCB` for two UCB-like algorithms, `Policies.AdBandits` for the AdBandits algorithm, and `Policies.Aggregator` for my aggregated bandits algorithm.
- Environments to encapsulate data are defined in the folder `Environment/`: a MAB problem uses the class `Environment.MAB`, simulation results are stored in an `Environment.Result`, and the class used to evaluate several policies on single-player problems over several environments is `Environment.Evaluator`.
- `very_simple_configuration.py` imports all the classes, and defines the simulation parameters as a dictionary (JSON-like).
- `main.py` runs the simulations, then displays the final ranking of the different policies and plots the results (saved to the folder `plots/`).
## UML diagrams
For more details, see these UML diagrams.
## Question: How to change the simulations?
### To customize the plots
- Change the default settings defined in `Environment/plotsettings.py`.
### To change the configuration of the simulations
- Change the config file, i.e., `configuration.py` for single-player simulations, or `configuration_multiplayers.py` for multi-players simulations.
- A good example of a very simple configuration file is given in `very_simple_configuration.py`; a sketch is shown below.
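As a rough sketch, a configuration dictionary can look like the following; the keys follow the project's example configuration files, but the specific arms, policies, and values here are illustrative assumptions (refer to `very_simple_configuration.py` for the authoritative example):

```python
# Sketch of a configuration dictionary; keys follow the example
# configuration files, values are purely illustrative.
from Arms import Bernoulli
from Policies import UCB, Thompson

configuration = {
    "horizon": 10000,      # Time horizon of each simulation
    "repetitions": 100,    # Number of repetitions (results are averaged)
    "n_jobs": 4,           # Number of parallel jobs
    "verbosity": 6,        # Verbosity level of the output
    # One MAB problem with three Bernoulli arms
    "environment": [{
        "arm_type": Bernoulli,
        "params": [0.1, 0.5, 0.9],
    }],
    # The policies to compare on this problem
    "policies": [
        {"archtype": UCB, "params": {}},
        {"archtype": Thompson, "params": {}},
    ],
}
```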
### To change how the results are exploited
- Change the main script, i.e., `main.py` for single-player simulations, or `main_multiplayers.py` for multi-players simulations. Some plots can be disabled or enabled by commenting out a few lines, and some options are given as flags (constants at the beginning of the file). A sketch of such a script is given after this list.
- If needed, change, improve, or add some methods to the simulation environment class, i.e., `Environment.Evaluator` for single-player simulations, and `Environment.EvaluatorMultiPlayers` for multi-players simulations. They store their simulation results with the classes `Environment.Result` and `Environment.ResultMultiPlayers`, respectively.
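For instance, a minimal single-player main script could look like the following sketch; the method names (`startOneEnv`, `printFinalRanking`, `plotRegrets`) are those I believe `main.py` uses, but treat them as assumptions and check the actual script:

```python
from Environment import Evaluator
# Hypothetical import: use whichever configuration file you edited
from very_simple_configuration import configuration

evaluation = Evaluator(configuration)

# Run the simulations on every environment of the configuration
for envId, env in enumerate(evaluation.envs):
    evaluation.startOneEnv(envId, env)

# Then print the final rankings and plot the regret curves
for envId in range(len(evaluation.envs)):
    evaluation.printFinalRanking(envId)
    evaluation.plotRegrets(envId)
```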
## Question: How to add something to this project?
In other words, what’s the API of this project?
### For a new arm
- Make a new file, e.g., `MyArm.py`.
- Save it in `Arms/`.
- The file should contain a class of the same name, inheriting from `Arms/Arm`, e.g., like this: `class MyArm(Arm): ...` (no need for any `super` call).
- This class `MyArm` has to have at least an `__init__(...)` method to create the arm object (with or without arguments, named or not); a `__str__` method to print it as a string; a `draw(t)` method to draw a reward from this arm (`t` is the time, which can be used or not); and it should have a `mean()` method that gives/computes the mean of the arm.
- Finally, add it to the `Arms/__init__.py` file: `from .MyArm import MyArm`.
- For examples, see `Arms.Bernoulli`, `Arms.Gaussian`, `Arms.Exponential`, `Arms.Poisson`.
- For example, use this template:
```python
from .Arm import Arm

class MyArm(Arm):
    def __init__(self, *args, **kwargs):
        # TODO Finish this method that initializes the arm MyArm
        pass

    def __str__(self):
        return "MyArm({})".format('...')  # TODO

    def draw(self, t=None):
        # TODO Simulate a pull of this arm; t might be used, but not necessarily
        pass

    def mean(self):
        # TODO Return the mean of this arm
        pass
```
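To make this concrete, here is a minimal sketch of a complete arm following this API, a Bernoulli-like arm; the class name `MyBernoulli` is hypothetical and just for illustration (the project already has a real `Arms.Bernoulli`):

```python
from random import random

from .Arm import Arm

class MyBernoulli(Arm):
    """Hypothetical illustration: an arm giving a reward of 1 with probability p."""

    def __init__(self, p):
        self.p = p  # Probability of a reward of 1

    def __str__(self):
        return "MyBernoulli({})".format(self.p)

    def draw(self, t=None):
        # The time t is not needed for a stationary arm
        return 1 if random() < self.p else 0

    def mean(self):
        return self.p
```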
### For a new (single-user) policy
- Make a new file, e.g., `MyPolicy.py`.
- Save it in `Policies/`.
- The file should contain a class of the same name; it can inherit from `Policies/IndexPolicy` if it is a simple index policy, e.g., like this: `class MyPolicy(IndexPolicy): ...` (no need for any `super` call), or simply be standalone: `class MyPolicy(object): ...`.
- This class `MyPolicy` has to have at least an `__init__(nbArms, ...)` method to create the policy object (with or without arguments, named or not), with at least the parameter `nbArms` (number of arms); a `__str__` method to print it as a string; a `choice()` method to choose an arm (an index among `0, ..., nbArms - 1`, e.g., chosen at random, or based on a maximum index if it is an index policy); a `getReward(arm, reward)` method, called when the arm `arm` gave the reward `reward`; and finally a `startGame()` method (possibly empty), which is called when a new simulation is run.
- Optionally, a policy class can have a `handleCollision(arm)` method to handle a collision after choosing the arm `arm` (e.g., update an internal index, change a fixed offset, etc.).
- Finally, add it to the `Policies/__init__.py` file: `from .MyPolicy import MyPolicy`.
- For examples, see `Policies.Uniform` for a fully randomized policy, `Policies.EpsilonGreedy` for a simple exploratory policy, `Policies.Softmax` for another simple approach, and `Policies.UCB` for the classic Upper Confidence Bounds policy (based on indexes, so inheriting from `Policies/IndexPolicy`). There are also `Policies.Thompson` and `Policies.BayesUCB` for Bayesian policies (using a posterior, e.g., a Beta posterior), and `Policies.klUCB` for a policy based on the Kullback-Leibler divergence.
- For less classical approaches, `Policies.AdBandits` combines Bayesian and frequentist points of view, and `Policies.Aggregator` is my aggregating policy.
- For example, use this template:
```python
import random

class MyPolicy(object):
    def __init__(self, nbArms, *args, **kwargs):
        self.nbArms = nbArms
        # TODO Finish this method that initializes the policy MyPolicy

    def __str__(self):
        return "MyPolicy({})".format('...')  # TODO

    def startGame(self):
        pass  # Can be non-trivial, TODO if needed

    def getReward(self, arm, reward):
        # TODO After the arm 'arm' has been pulled, it gave the reward 'reward'
        pass  # Can be non-trivial, TODO if needed

    def choice(self):
        # TODO Make a smart choice of arm
        return random.randint(0, self.nbArms - 1)

    def handleCollision(self, arm):
        pass  # Can be non-trivial, TODO if needed
```
Other `choice...()` methods can be added, if this policy `MyPolicy` has to be used for multiple plays, ranked plays, etc. A minimal complete example following this API is sketched below.
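For instance, here is a minimal, self-contained sketch of a policy respecting this API: a naive "follow the empirical leader" strategy. The class name `MyFollowTheLeader` is hypothetical and not part of the project:

```python
class MyFollowTheLeader(object):
    """Hypothetical illustration: pull each arm once, then always pull the
    arm with the best empirical mean (no exploration, so it can get stuck)."""

    def __init__(self, nbArms):
        self.nbArms = nbArms
        self.pulls = [0] * nbArms       # Number of pulls of each arm
        self.rewards = [0.0] * nbArms   # Cumulated reward of each arm

    def __str__(self):
        return "MyFollowTheLeader"

    def startGame(self):
        self.pulls = [0] * self.nbArms
        self.rewards = [0.0] * self.nbArms

    def getReward(self, arm, reward):
        self.pulls[arm] += 1
        self.rewards[arm] += reward

    def choice(self):
        # First, pull every arm once
        for arm in range(self.nbArms):
            if self.pulls[arm] == 0:
                return arm
        # Then, choose the arm with the best empirical mean
        return max(range(self.nbArms),
                   key=lambda arm: self.rewards[arm] / self.pulls[arm])
```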
### For a new multi-users policy
- Make a new file, e.g., `MyPoliciesMultiPlayers.py`.
- Save it in `PoliciesMultiPlayers/`.
- The file should contain a class of the same name, e.g., like this: `class MyPoliciesMultiPlayers(object): ...`.
- This class `MyPoliciesMultiPlayers` has to have at least an `__init__` method to create the policy object; a `__str__` method to print it as a string; and a `children` attribute that gives a list of players (single-player policies).
- Finally, add it to the `PoliciesMultiPlayers/__init__.py` file: `from .MyPoliciesMultiPlayers import MyPoliciesMultiPlayers`.
- For examples, see `PoliciesMultiPlayers.OracleNotFair` and `PoliciesMultiPlayers.OracleFair` for full-knowledge centralized policies (fair or not), and `PoliciesMultiPlayers.CentralizedFixed` and `PoliciesMultiPlayers.CentralizedCycling` for non-full-knowledge centralized policies (fair or not). There is also the `PoliciesMultiPlayers.Selfish` decentralized policy, where every player runs on its own, without any knowledge of the number of players and without communication.

`PoliciesMultiPlayers.Selfish` is the simplest possible example I could give as a template; a sketch following the same API is given below.
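Here is such a sketch, assuming a single-player policy class (e.g., `Policies.UCB`) is passed in as `playerAlgo`; the class name and its constructor signature are illustrative assumptions, not the project's actual `Selfish` implementation:

```python
class MyPoliciesMultiPlayers(object):
    """Hypothetical illustration: give each of the nbPlayers players an
    independent copy of the same single-player policy (Selfish-like)."""

    def __init__(self, nbPlayers, nbArms, playerAlgo, *args, **kwargs):
        self.nbPlayers = nbPlayers
        # The required 'children' attribute: one single-player policy per player
        self.children = [playerAlgo(nbArms, *args, **kwargs)
                         for _ in range(nbPlayers)]

    def __str__(self):
        return "MyPoliciesMultiPlayers({} x {})".format(
            self.nbPlayers, str(self.children[0]))
```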
## :scroll: License
MIT Licensed (file LICENSE).
© 2016-2018 Lilian Besson.