# SMPyBandits¶

Open-Source Python package for Single- and Multi-Players multi-armed Bandits algorithms.

This repository contains the code of Lilian Besson’s numerical environment, written in Python (2 or 3), for numerical simulations on :slot_machine: single-player and multi-players Multi-Armed Bandits (MAB) algorithms.

## Quick presentation¶

It contains the most complete collection of single-player (classical) bandit algorithms on the Internet (over 65!), as well as implementation of all the state-of-the-art multi-player algorithms.

I follow very actively the latest publications related to Multi-Armed Bandits (MAB) research, and usually implement quite quickly the new algorithms (see for instance, Exp3++, CORRAL and SparseUCB were each introduced by articles (for Exp3++, for CORRAL, for SparseUCB) presented at COLT in July 2017, LearnExp comes from a NIPS 2017 paper, and kl-UCB++ from an ALT 2017 paper.). More recent examples are klUCBswitch from a paper from May 2018, and also MusicalChairNoSensing from a paper from August 2018.

With this numerical framework, simulations can run on a single CPU or a multi-core machine, and summary plots are automatically saved as high-quality PNG, PDF and EPS (ready for being used in research article). Making new simulations is very easy, one only needs to write a configuration script and basically no code! See these examples (files named configuration_*.py).

A complete Sphinx documentation for each algorithms and every piece of code (included constants in the configurations!) is available here: SMPyBandits.GitHub.io. (I will use ReadTheDocs for this project, but I won’t use any continuous integration, don’t even think of it!)

I (Lilian Besson) have started my PhD in October 2016, and this is a part of my on going research since December 2016.

I launched the documentation on March 2017, I wrote my first research articles using this framework in 2017 and decided to (finally) open-source my project in February 2018. Commits of https://github.com/SMPyBandits/SMPyBandits/ / Date of last commit of https://github.com/SMPyBandits/SMPyBandits/ Issues of https://github.com/SMPyBandits/SMPyBandits/ : Open issues of https://github.com/SMPyBandits/SMPyBandits/ / Closed issues of https://github.com/SMPyBandits/SMPyBandits/

## How to cite this work?¶

If you use this package for your own work, please consider citing it with this piece of BibTeX:

@misc{SMPyBandits,
title =   {{SMPyBandits: an Open-Source Research Framework for Single and Multi-Players Multi-Arms Bandits (MAB) Algorithms in Python}},
author =  {Lilian Besson},
year =    {2018},
url =     {https://github.com/SMPyBandits/SMPyBandits/},
howpublished = {Online at: \url{github.com/SMPyBandits/SMPyBandits}},
note =    {Code at https://github.com/SMPyBandits/SMPyBandits/, documentation at https://smpybandits.github.io/}
}


I also wrote a small paper to present SMPyBandits, and I will send it to JMLR MLOSS. The paper can be consulted here on my website.

A DOI will arrive as soon as possible! I tried to publish a paper on both JOSS and MLOSS.

## List of research publications using SMPyBandits¶

### 1st article, about policy aggregation algorithm (aka model selection)¶

I designed and added the Aggregator policy, in order to test its validity and performance.

It is a “simple” voting algorithm to combine multiple bandit algorithms into one. Basically, it behaves like a simple MAB bandit just based on empirical means (even simpler than UCB), where arms are the child algorithms A_1 .. A_N, each running in “parallel”.

For more details, refer to this file: Aggregation.md and this research article.

### 2nd article, about Multi-players Multi-Armed Bandits¶

There is another point of view: instead of comparing different single-player policies on the same problem, we can make them play against each other, in a multi-player setting. The basic difference is about collisions : at each time t, if two or more user chose to sense the same channel, there is a collision. Collisions can be handled in different way from the base station point of view, and from each player point of view.

For more details, refer to this file: MultiPlayers.md and this research article.

### 3rd article, using Doubling Trick for Multi-Armed Bandits¶

I studied what Doubling Trick can and can’t do to obtain efficient anytime version of non-anytime optimal Multi-Armed Bandits algorithms.

For more details, refer to this file: DoublingTrick.md and this research article.

### 4th article, about Piece-Wise Stationary Multi-Armed Bandits¶

With Emilie Kaufmann, we studied the Generalized Likelihood Ratio Test (GLRT) for sub-Bernoulli distributions, and proposed the B-GLRT algorithm for change-point detection for piece-wise stationary one-armed bandit problems. We combined the B-GLRT with the kl-UCB multi-armed bandit algorithm and proposed the GLR-klUCB algorithm for piece-wise stationary multi-armed bandit problems. We prove finite-time guarantees for the B-GLRT and the GLR-klUCB algorithm, and we illustrate its performance with extensive numerical experiments.

For more details, refer to this file: NonStationaryBandits.md and this research article.

## Other remarks¶

• Everything here is done in an imperative, object oriented style. The API of the Arms, Policy and MultiPlayersPolicy classes is documented in this file (API.md).
• The code is clean, valid for both Python 2 and Python 3.
• Some piece of code come from the pymaBandits project, but most of them were refactored. Thanks to the initial project!
• G.Varoquaux’s joblib is used for the Evaluator and EvaluatorMultiPlayers classes, so the simulations are easily parallelized on multi-core machines. (Put n_jobs = -1 or PARALLEL = True in the config file to use all your CPU cores, as it is by default).

## How to run the experiments ?¶

See this document: How_to_run_the_code.md for more details (or this documentation page).

TL;DR: this short bash snippet shows how to clone the code, install the requirements for Python 3 (in a virtualenv, and starts some simulation for N=100 repetitions of the default non-Bayesian Bernoulli-distributed problem, for K=9 arms, an horizon of T=10000 and on 4 CPUs (it should take about 20 minutes for each simulations):

cd /tmp/  # or wherever you want
cd SMPyBandits
# just be sure you have the latest virtualenv from Python 3
sudo pip3 install --upgrade --force-reinstall virtualenv
# create and active the virtualenv
virtualenv venv
. venv/bin/activate
type pip  # check it is /tmp/SMPyBandits/venv/bin/pip
type python  # check it is /tmp/SMPyBandits/venv/bin/python
# install the requirements in the virtualenv
pip install -r requirements_full.txt
# run a single-player simulation!
N=100 T=10000 K=9 N_JOBS=4 make single
# run a multi-player simulation!
N=100 T=10000 M=3 K=9 N_JOBS=4 make moremulti


You can also install it directly with pip and from GitHub:

cd /tmp/ ; mkdir SMPyBandits ; cd SMPyBandits/
virtualenv venv
. venv/bin/activate
type pip  # check it is /tmp/SMPyBandits/venv/bin/pip
type python  # check it is /tmp/SMPyBandits/venv/bin/python
pip install git+https://github.com/SMPyBandits/SMPyBandits.git#egg=SMPyBandits[full]

• If speed matters to you and you want to use algorithms based on kl-UCB, you should take the time to build and install the fast C implementation of the utilities KL functions. Default is to use kullback.py, but using the C version from Policies/C/ really speeds up the computations. Just follow the instructions, it should work well (you need gcc to be installed).
• And if speed matters, be sure that you have a working version of Numba, it is used by many small functions to (try to automatically) speed up the computations.

### Nix¶

A pinned Nix environment is available for this experimental setup in the nix/pkgs/ directory. From the root of the project:

$nix-shell nix-shell$ jupyter_notebook
nix-shell$N=100 T=10000 K=9 N_JOBS=4 make single  The following one-liner lets you explore one of the example notebooks from any Nix-enabled machine, without cloning the repository: $ nix-shell https://github.com/SMPYBandits/SMPyBandits/archive/master.tar.gz --run 'jupyter-notebook \$EXAMPLE_NOTEBOOKS/Example_of_a_small_Multi-Player_Simulation__with_Centralized_Algorithms.ipynb'


## Contributing?¶

I don’t except issues or pull requests on this project, but you are welcome to.

Contributions (issues, questions, pull requests) are of course welcome, but this project is and will stay a personal environment designed for quick research experiments, and will never try to be an industry-ready module for applications of Multi-Armed Bandits algorithms. If you want to contribute, please have a look to the CONTRIBUTING.md file, and if you want to be more seriously involved, read the CODE_OF_CONDUCT.md file.

## :boom: TODO¶

See this file TODO.md, and the issues on GitHub.