Short documentation of the API

This short document aims to document the API used in my SMPyBandits environment, and to close issue #3.

Code organization

Layout of the code:


UML diagrams

For more details, see these UML diagrams.

Question: How to change the simulations?

To customize the plots

  1. Change the default settings defined in Environment/plotsettings.py.

To change the configuration of the simulations

  1. Change the config file, i.e., configuration.py for single-player simulations, or configuration_multiplayers.py for multi-player simulations.
  2. A good example of a very simple configuration file is given in very_simple_configuration.py

To change how the results are exploited

  1. Change the main script, i.e., main.py for single-player simulations, or main_multiplayers.py for multi-player simulations. Some plots can be disabled or enabled by commenting out a few lines, and some options are given as flags (constants at the beginning of the file).
  2. If needed, change, improve or add some methods to the simulation environment class, i.e., Environment.Evaluator for single-player simulations, and Environment.EvaluatorMultiPlayers for multi-player simulations. They each use a class to store their simulation results, Environment.Result and Environment.ResultMultiPlayers respectively.

Question: How to add something to this project?

In other words, what’s the API of this project?

For a new arm

  1. Make a new file, e.g., MyArm.py
  2. Save it in Arms/
  3. The file should contain a class of the same name, inheriting from Arms/Arm, e.g., class MyArm(Arm): ... (no need for any super call)
  4. This class MyArm must have at least: an __init__(...) method to create the arm object (with or without arguments, named or not); a __str__ method to print it as a string; a draw(t) method to draw a reward from this arm (t is the time, which may or may not be used); and a mean() method that returns or computes the mean of the arm
  5. Finally, add it to the Arms/__init__.py file: from .MyArm import MyArm
  • For example, use this template:
from .Arm import Arm

class MyArm(Arm):
    def __init__(self, *args, **kwargs):
        # TODO Finish this method, which initializes the arm MyArm
        pass

    def __str__(self):
        return "MyArm(...)".format('...')  # TODO

    def draw(self, t=None):
        # TODO Simulate a pull of this arm. t might be used, but not necessarily
        raise NotImplementedError

    def mean(self):
        # TODO Return the mean of this arm
        raise NotImplementedError
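
As an illustration of the four required methods, here is a minimal sketch of a filled-in arm. The names CoinArm and p are hypothetical, and the empty Arm class is only a stand-in for the real base class in Arms/Arm.py so that the snippet runs on its own; inside the project, use from .Arm import Arm instead.

```python
import random


class Arm(object):
    """Stand-in (assumption) for the base class defined in Arms/Arm.py."""


class CoinArm(Arm):
    """Hypothetical arm giving reward 1 with probability p, and 0 otherwise."""

    def __init__(self, p=0.5):
        assert 0.0 <= p <= 1.0, "p must be a probability"
        self.p = p

    def __str__(self):
        return "CoinArm({})".format(self.p)

    def draw(self, t=None):
        # t is accepted to respect the API, but ignored: rewards are i.i.d.
        return 1.0 if random.random() < self.p else 0.0

    def mean(self):
        return self.p


arm = CoinArm(0.3)
rewards = [arm.draw(t) for t in range(1000)]
print(arm, "has mean", arm.mean(), "and empirical mean", sum(rewards) / len(rewards))
```

The empirical mean printed at the end should be close to arm.mean() when enough rewards are drawn, which is a quick sanity check for any new arm.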

For a new (single-user) policy

  1. Make a new file, e.g., MyPolicy.py
  2. Save it in Policies/
  3. The file should contain a class of the same name. It can inherit from Policies/IndexPolicy if it is a simple index policy, e.g., class MyPolicy(IndexPolicy): ... (no need for any super call), or simply be defined as class MyPolicy(object): ...
  4. This class MyPolicy must have at least: an __init__(nbArms, ...) method to create the policy object (with or without extra arguments, named or not), with at least the parameter nbArms (number of arms); a __str__ method to print it as a string; a choice() method to choose an arm (an index among 0, ..., nbArms - 1, e.g., at random, or based on a maximum index if it is an index policy); a getReward(arm, reward) method, called when the arm arm gave the reward reward; and a startGame() method (possibly empty), which is called when a new simulation is run.
  5. Optionally, a policy class can have a handleCollision(arm) method to handle a collision after choosing the arm arm (e.g., update an internal index, change a fixed offset, etc.).
  6. Finally, add it to the Policies/__init__.py file: from .MyPolicy import MyPolicy
  • For example, use this template:
import random

class MyPolicy(object):
    def __init__(self, nbArms, *args, **kwargs):
        self.nbArms = nbArms
        # TODO Finish this method, which initializes the policy MyPolicy

    def __str__(self):
        return "MyPolicy(...)".format('...')  # TODO

    def startGame(self):
        pass  # Can be non-trivial, TODO if needed

    def getReward(self, arm, reward):
        # TODO After the arm 'arm' has been pulled, it gave the reward 'reward'
        pass  # Can be non-trivial, TODO if needed

    def choice(self):
        # TODO Do a smart choice of arm; this placeholder picks one uniformly at random
        return random.randrange(self.nbArms)

    def handleCollision(self, arm):
        pass  # Can be non-trivial, TODO if needed
Other choice...() methods can be added if this policy MyPolicy has to be used for multiple plays, ranked plays, etc.
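
To make the policy interface more concrete, here is a minimal sketch of a complete policy written against the methods listed above. The class GreedyPolicy and its attributes pulls and sums are hypothetical names chosen for this example, and the loop at the end is only a toy stand-in for what the simulation environment does, not the actual code of Environment.Evaluator.

```python
import random


class GreedyPolicy(object):
    """Hypothetical policy: try every arm once, then play the best empirical mean."""

    def __init__(self, nbArms):
        self.nbArms = nbArms
        self.startGame()

    def __str__(self):
        return "GreedyPolicy(nbArms={})".format(self.nbArms)

    def startGame(self):
        # Reset the statistics at the start of every new simulation
        self.pulls = [0] * self.nbArms
        self.sums = [0.0] * self.nbArms

    def getReward(self, arm, reward):
        self.pulls[arm] += 1
        self.sums[arm] += reward

    def choice(self):
        # First play each arm once, in order
        for arm in range(self.nbArms):
            if self.pulls[arm] == 0:
                return arm
        # Then pick the arm with the highest empirical mean
        means = [s / n for s, n in zip(self.sums, self.pulls)]
        return means.index(max(means))


# Toy interaction loop (assumption: two coin-flip arms with these probabilities)
random.seed(0)
probabilities = [0.1, 0.9]
policy = GreedyPolicy(len(probabilities))
policy.startGame()
for t in range(100):
    arm = policy.choice()
    reward = 1.0 if random.random() < probabilities[arm] else 0.0
    policy.getReward(arm, reward)
print(policy, "pulled the arms", policy.pulls, "times")
```

Keeping all the state reset inside startGame() means the same policy object can be reused across repetitions of a simulation without leaking statistics from one run to the next.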

For a new multi-users policy

  1. Make a new file, e.g., MyPoliciesMultiPlayers.py
  2. Save it in PoliciesMultiPlayers/
  3. The file should contain a class of the same name, e.g., class MyPoliciesMultiPlayers(object): ...
  4. This class MyPoliciesMultiPlayers must have at least: an __init__ method to create the multi-player policy object; a __str__ method to print it as a string; and a children attribute that gives a list of players (single-player policies).
  5. Finally, add it to the PoliciesMultiPlayers/__init__.py file: from .MyPoliciesMultiPlayers import MyPoliciesMultiPlayers
For examples, see PoliciesMultiPlayers.OracleNotFair and PoliciesMultiPlayers.OracleFair for full-knowledge centralized policies (fair or not), and PoliciesMultiPlayers.CentralizedFixed and PoliciesMultiPlayers.CentralizedCycling for non-full-knowledge centralized policies (fair or not). There is also the PoliciesMultiPlayers.Selfish decentralized policy, where all players run without any knowledge of the number of players and without any communication.
PoliciesMultiPlayers.Selfish is the simplest possible example I could give as a template.
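
The requirements listed above (an __init__ method, a __str__ method and a children attribute) can be sketched as follows. This is only an illustration under stated assumptions: the classes UniformPlayer and IndependentPlayers, and the constructor signature (nbPlayers, playerClass, nbArms), are hypothetical and are not claimed to match the actual code of PoliciesMultiPlayers.Selfish.

```python
import random


class UniformPlayer(object):
    """Hypothetical single-player policy choosing an arm uniformly at random."""

    def __init__(self, nbArms):
        self.nbArms = nbArms

    def __str__(self):
        return "UniformPlayer(nbArms={})".format(self.nbArms)

    def startGame(self):
        pass

    def getReward(self, arm, reward):
        pass

    def choice(self):
        return random.randrange(self.nbArms)

    def handleCollision(self, arm):
        pass


class IndependentPlayers(object):
    """Hypothetical multi-player policy: independent copies of one policy, no communication."""

    def __init__(self, nbPlayers, playerClass, nbArms, *args, **kwargs):
        assert nbPlayers >= 1, "at least one player is needed"
        self.nbPlayers = nbPlayers
        self.nbArms = nbArms
        # The children attribute is the list of players (single-player policies)
        self.children = [playerClass(nbArms, *args, **kwargs) for _ in range(nbPlayers)]

    def __str__(self):
        return "IndependentPlayers({} x {})".format(self.nbPlayers, self.children[0])


players = IndependentPlayers(3, UniformPlayer, 5)
choices = [child.choice() for child in players.children]
print(players, "chose the arms", choices)
```

Each child is a separate object, so the players share no state; a centralized policy would instead make the children coordinate through the parent object.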