complete_tree_exploration_for_MP_bandits module¶
Experimental code to perform complete tree exploration for MultiPlayer bandits.
Algorithms:
 Support Selfish 0greedy, UCB, and klUCB in 3 different variants.
 Support also RhoRand, RandTopM and MCTopM, even though they are not memoryless, by using another state representation (inlining the memory of each player, eg the ranks for RhoRand).
Features:
 For the means of each arm, \(\mu_1, \dots, \mu_K\), this script can use exact formal computations with sympy, or fractions with Fraction, or float number.
 The graph can contain all nodes from root to leafs, or only leafs (with summed probabilities), and possibly only the absorbing nodes are showed.
 Support export of the tree to a GraphViz dot graph, and can save it to SVG/PNG and LaTeX (with Tikz) and PDF etc.
 By default, the root is highlighted in green and the absorbing nodes are in red.
Warning
I still have to fix these issues:
 TODO : right now, it is not so efficient, could it be improved? I don’t think I can do anything in a smarter way, in pure Python.
Requirements:
 ‘sympy’ module to use formal means \(\mu_1, \dots, \mu_K\) instead of numbers,
 ‘numpy’ module for computations on indexes (e.g.,
np.where
),  ‘graphviz’ module to generate the graph and save it,
 ‘dot2tex’ module to generate nice LaTeX (with Tikz) graph and save it to PDF.
Note
To use the ‘dot2tex’ module, only Python2 is supported. However, I maintain an unpublished port of ‘dot2tex’ for Python3, see [here](https://github.com/Naereen/dot2tex), that you can download, and install manually (sudo python3 setup.py install) to have ‘dot2tex’ for Python3 also.
About:
 Date: 16/09/2017.
 Author: Lilian Besson, (C) 2017
 Licence: MIT Licence (http://lbesson.mitlicense.org).

complete_tree_exploration_for_MP_bandits.
oo
= inf¶ Shortcut for float(‘+inf’).

complete_tree_exploration_for_MP_bandits.
PLOT_DIR
= 'plots/trees'¶ Directory for the plots

complete_tree_exploration_for_MP_bandits.
tupleit1
(anarray)[source]¶ Convert a nonhashable 1D numpy array to a hashable tuple.

complete_tree_exploration_for_MP_bandits.
tupleit2
(anarray)[source]¶ Convert a nonhashable 2D numpy array to a hashable tupleoftuples.

complete_tree_exploration_for_MP_bandits.
prod
(iterator)[source]¶ Product of the values in this iterator.

complete_tree_exploration_for_MP_bandits.
WIDTH
= 200¶ Default value for the
width
parameter forwraptext()
andwraplatex()
.

complete_tree_exploration_for_MP_bandits.
wraptext
(text, width=200)[source]¶ Wrap the text, using
textwrap
module, andwidth
.

complete_tree_exploration_for_MP_bandits.
ONLYLEAFS
= True¶ By default, aim at the most concise graph representation by only showing the leafs.

complete_tree_exploration_for_MP_bandits.
ONLYABSORBING
= False¶ By default, don’t aim at the most concise graph representation by only showing the absorbing leafs.

complete_tree_exploration_for_MP_bandits.
CONCISE
= True¶ By default, only show \(\tilde{S}\) and \(N\) in the graph representations, not all the 4 vectors.

complete_tree_exploration_for_MP_bandits.
FULLHASH
= False¶ Use only Stilde, N for hashing the states.

complete_tree_exploration_for_MP_bandits.
FORMAT
= 'svg'¶ Format used to save the graphs.

complete_tree_exploration_for_MP_bandits.
FixedArm
(j, state)[source]¶ Fake player j that always targets at arm j.

complete_tree_exploration_for_MP_bandits.
UniformExploration
(j, state)[source]¶ Fake player j that always targets all arms.

complete_tree_exploration_for_MP_bandits.
ConstantRank
(j, state, decision, collision)[source]¶ Constant rank no matter what.

complete_tree_exploration_for_MP_bandits.
choices_from_indexes
(indexes)[source]¶ For deterministic index policies, if more than one index is maximum, return the list of positions attaining this maximum (ties), or only one position.

complete_tree_exploration_for_MP_bandits.
Selfish_0Greedy_U
(j, state)[source]¶ Selfish policy + 0Greedy index + U feedback.

complete_tree_exploration_for_MP_bandits.
Selfish_0Greedy_Utilde
(j, state)[source]¶ Selfish policy + 0Greedy index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.
Selfish_0Greedy_Ubar
(j, state)[source]¶ Selfish policy + 0Greedy index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.
Selfish_UCB_U
(j, state)[source]¶ Selfish policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.
Selfish_UCB
(j, state)[source]¶ Selfish policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.
Selfish_UCB_Utilde
(j, state)¶ Selfish policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.
Selfish_UCB_Ubar
(j, state)[source]¶ Selfish policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.
Selfish_KLUCB_U
(j, state)[source]¶ Selfish policy + Bernoulli KLUCB index + U feedback.

complete_tree_exploration_for_MP_bandits.
Selfish_KLUCB
(j, state)[source]¶ Selfish policy + Bernoulli KLUCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.
Selfish_KLUCB_Utilde
(j, state)¶ Selfish policy + Bernoulli KLUCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.
Selfish_KLUCB_Ubar
(j, state)[source]¶ Selfish policy + Bernoulli KLUCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.
choices_from_indexes_with_rank
(indexes, rank=1)[source]¶ For deterministic index policies, if more than one index is maximum, return the list of positions attaining the rankth largest index (with more than one if ties, or only one position).

complete_tree_exploration_for_MP_bandits.
RhoRand_UCB_U
(j, state)[source]¶ RhoRand policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.
RhoRand_UCB_Utilde
(j, state)[source]¶ RhoRand policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.
RhoRand_UCB_Ubar
(j, state)[source]¶ RhoRand policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.
RhoRand_KLUCB_U
(j, state)[source]¶ RhoRand policy + Bernoulli KLUCB index + U feedback.

complete_tree_exploration_for_MP_bandits.
RhoRand_KLUCB_Utilde
(j, state)[source]¶ RhoRand policy + Bernoulli KLUCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.
RhoRand_KLUCB_Ubar
(j, state)[source]¶ RhoRand policy + Bernoulli KLUCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.
RandomNewRank
(j, state, decision, collision)[source]¶ RhoRand chooses a new uniform rank in {1,..,M} in case of collision, or keep the same.

complete_tree_exploration_for_MP_bandits.
default_policy
(j, state)¶ RhoRand policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.
default_update_memory
(j, state, decision, collision)¶ RhoRand chooses a new uniform rank in {1,..,M} in case of collision, or keep the same.

complete_tree_exploration_for_MP_bandits.
RandTopM_UCB_U
(j, state, collision=False)[source]¶ RandTopM policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.
RandTopM_UCB_Utilde
(j, state, collision=False)[source]¶ RandTopM policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.
RandTopM_UCB_Ubar
(j, state, collision=False)[source]¶ RandTopM policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.
RandTopM_KLUCB_U
(j, state, collision=False)[source]¶ RandTopM policy + Bernoulli KLUCB index + U feedback.

complete_tree_exploration_for_MP_bandits.
RandTopM_KLUCB_Utilde
(j, state, collision=False)[source]¶ RandTopM policy + Bernoulli KLUCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.
RandTopM_KLUCB_Ubar
(j, state, collision=False)[source]¶ RandTopM policy + Bernoulli KLUCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.
RandTopM_RandomNewChosenArm
(j, state, decision, collision)[source]¶ RandTopM chooses a new arm after a collision or if the chosen arm lies outside of its estimatedBestArms set, uniformly from the set of estimated M best arms, or keep the same.

complete_tree_exploration_for_MP_bandits.
write_to_tuple
(this_tuple, index, value)[source]¶ Tuple cannot be written, this hack fixes that.

complete_tree_exploration_for_MP_bandits.
MCTopM_UCB_U
(j, state, collision=False)[source]¶ MCTopM policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.
MCTopM_UCB_Utilde
(j, state, collision=False)[source]¶ MCTopM policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.
MCTopM_UCB_Ubar
(j, state, collision=False)[source]¶ MCTopM policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.
MCTopM_KLUCB_U
(j, state, collision=False)[source]¶ MCTopM policy + Bernoulli KLUCB index + U feedback.

complete_tree_exploration_for_MP_bandits.
MCTopM_KLUCB_Utilde
(j, state, collision=False)[source]¶ MCTopM policy + Bernoulli KLUCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.
MCTopM_KLUCB_Ubar
(j, state, collision=False)[source]¶ MCTopM policy + Bernoulli KLUCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.
MCTopM_RandomNewChosenArm
(j, state, decision, collision)[source]¶ RandTopMC chooses a new arm after if the chosen arm lies outside of its estimatedBestArms set, uniformly from the set of estimated M best arms, or keep the same.

complete_tree_exploration_for_MP_bandits.
symbol_means
(K)[source]¶ Better to work directly with symbols and instantiate the results after.

complete_tree_exploration_for_MP_bandits.
random_uniform_means
(K)[source]¶ If needed, generate an array of K (numerical) uniform means in [0, 1].

complete_tree_exploration_for_MP_bandits.
uniform_means
(nbArms=3, delta=0.1, lower=0.0, amplitude=1.0)[source]¶ Return a list of means of arms, well spaced:
 in [lower, lower + amplitude],
 sorted in increasing order,
 starting from lower + amplitude * delta, up to lower + amplitude * (1  delta),
 and there is nbArms arms.
>>> np.array(uniform_means(2, 0.1)) array([ 0.1, 0.9]) >>> np.array(uniform_means(3, 0.1)) array([ 0.1, 0.5, 0.9]) >>> np.array(uniform_means(9, 1 / (1. + 9))) array([ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

complete_tree_exploration_for_MP_bandits.
proba2float
(proba, values=None, K=None, names=None)[source]¶ Replace mu_k by a numerical value and evaluation the formula.

complete_tree_exploration_for_MP_bandits.
simplify
(proba)[source]¶ Try to simplify the expression of the probability.

complete_tree_exploration_for_MP_bandits.
proba2str
(proba, latex=False, html_in_var_names=False)[source]¶ Pretty print a proba, either a number, a Fraction, or a sympy expression.

complete_tree_exploration_for_MP_bandits.
tex2pdf
(filename)[source]¶ Naive call to command line pdflatex, twice.

class
complete_tree_exploration_for_MP_bandits.
State
(S, Stilde, N, Ntilde, mus, players, depth=0)[source]¶ Bases:
object
Not spaceefficient representation of a state in the system we model.
 S, Stilde, N, Ntilde: are arrays of size (M, K),
 depth, t, M, K: integers, to avoid recomputing them,
 mus: the problem parameters (only for Bernoulli arms),
 players: is a list of algorithms,
 probas: list of transition probabilities,
 children: list of all possible next states (transitions).

__init__
(S, Stilde, N, Ntilde, mus, players, depth=0)[source]¶ Create a new state. Arrays S, Stilde, N, Ntilde are copied to avoid modify previous values!

S
= None¶ sensing feedback

Stilde
= None¶ number of sensing trials

N
= None¶ number of succesful transmissions

Ntilde
= None¶ number of trials without collisions

depth
= None¶ current depth of the exploration tree

t
= None¶ current time step. Simply = sum(N[0]) = sum(N[i]) for all player i, but easier to compute it once and store it

M
= None¶ number of players

K
= None¶ number of arms (channels)

children
= None¶ list of next state, representing all the possible transitions

probas
= None¶ probabilities of transitions

to_dot
(title='', name='', comment='', latex=False, html_in_var_names=False, ext='svg', onlyleafs=True, onlyabsorbing=False, concise=True)[source]¶ Convert the state to a .dot graph, using GraphViz. See http://graphviz.readthedocs.io/ for more details.
 onlyleafs: only print the root and the leafs, to see a concise representation of the tree.
 onlyabsorbing: only print the absorbing leafs, to see a really concise representation of the tree.
 concise: weather to use the short representation of states (using \(\tilde{S}\) and \(N\)) or the long one (using the 4 variables).
 html_in_var_names: experimental use of
<SUB>..</SUB>
and<SUP>..</SUP>
in the label for the tree.  latex: experimental use of
_{..}
and^{..}
in the label for the tree, to use with dot2tex.

saveto
(filename, view=True, title='', name='', comment='', latex=False, html_in_var_names=False, ext='svg', onlyleafs=True, onlyabsorbing=False, concise=True)[source]¶

copy
()[source]¶ Get a new copy of that state with same S, Stilde, N, Ntilde but no probas and no children (and depth=0).

is_absorbing
()[source]¶ Try to detect if this state is absorbing, ie only one transition is possible, and again infinitely for the only child.
Warning
Still very experimental!

has_absorbing_child_whole_subtree
()[source]¶ Try to detect if this state has an absorbing child in the whole subtree.

explore_from_node_to_depth
(depth=1)[source]¶ Compute recursively the one_depth children of the root and its children.

compute_one_depth
()[source]¶ Use all_deltas to store all the possible transitions and their probabilities. Increase depth by 1 at the end.

all_absorbing_states
(depth=1)[source]¶ Generator that yields all the absorbing nodes of the tree, one by one.
 It might not find any,
 It does so without merging common nodes, in order to find the first absorbing node as quick as possible.

absorbing_states_one_depth
()[source]¶ Use all_deltas to yield all the absorbing onedepth child and their probabilities.

find_N_absorbing_states
(N=1, maxdepth=8)[source]¶ Find at least N absorbing states, by considering a large depth.

all_deltas
()[source]¶ Generator that yields functions transforming state to another state.
 It is memory efficient as it is a generator.
 Do not convert that to a list or it might use all your system memory: each returned value is a function with code and variables inside!

get_all_leafs
()[source]¶ Recurse and get all the leafs. Many different state can be present in the list of leafs, with possibly different probabilities (each correspond to a trajectory).

get_unique_leafs
()[source]¶ Compute all the leafs (deepest children) and merge the common one to compute their full probabilities.

proba_reaching_absorbing_state
()[source]¶ Compute the probability of reaching a leaf that is an absorbing state.

__dict__
= mappingproxy({'__module__': 'complete_tree_exploration_for_MP_bandits', '__doc__': 'Not spaceefficient representation of a state in the system we model.\n\n  S, Stilde, N, Ntilde: are arrays of size (M, K),\n  depth, t, M, K: integers, to avoid recomputing them,\n  mus: the problem parameters (only for Bernoulli arms),\n  players: is a list of algorithms,\n  probas: list of transition probabilities,\n  children: list of all possible next states (transitions).\n ', '__init__': <function State.__init__>, '__str__': <function State.__str__>, 'to_node': <function State.to_node>, 'to_dot': <function State.to_dot>, 'saveto': <function State.saveto>, 'copy': <function State.copy>, '__hash__': <function State.__hash__>, 'is_absorbing': <function State.is_absorbing>, 'has_absorbing_child_whole_subtree': <function State.has_absorbing_child_whole_subtree>, 'explore_from_node_to_depth': <function State.explore_from_node_to_depth>, 'compute_one_depth': <function State.compute_one_depth>, 'all_absorbing_states': <function State.all_absorbing_states>, 'absorbing_states_one_depth': <function State.absorbing_states_one_depth>, 'find_N_absorbing_states': <function State.find_N_absorbing_states>, 'all_deltas': <function State.all_deltas>, 'pretty_print_result_recursively': <function State.pretty_print_result_recursively>, 'get_all_leafs': <function State.get_all_leafs>, 'get_unique_leafs': <function State.get_unique_leafs>, 'proba_reaching_absorbing_state': <function State.proba_reaching_absorbing_state>, '__dict__': <attribute '__dict__' of 'State' objects>, '__weakref__': <attribute '__weakref__' of 'State' objects>})¶

__module__
= 'complete_tree_exploration_for_MP_bandits'¶

__weakref__
¶ list of weak references to the object (if defined)

class
complete_tree_exploration_for_MP_bandits.
StateWithMemory
(S, Stilde, N, Ntilde, mus, players, update_memories, memories=None, depth=0)[source]¶ Bases:
complete_tree_exploration_for_MP_bandits.State
State with a memory for each player, to represent and play with RhoRand etc.

__init__
(S, Stilde, N, Ntilde, mus, players, update_memories, memories=None, depth=0)[source]¶ Create a new state. Arrays S, Stilde, N, Ntilde are copied to avoid modify previous values!

memories
= None¶ Personal memory for all players, can be a rank in {1,..,M} for rhoRand, or anything else.

copy
()[source]¶ Get a new copy of that state with same S, Stilde, N, Ntilde but no probas and no children (and depth=0).

__hash__
(full=False)[source]¶ Hash the matrix Stilde and N of the state and memories of the players (ie. ranks for RhoRand).

is_absorbing
()[source]¶ Try to detect if this state is absorbing, ie only one transition is possible, and again infinitely for the only child.
Warning
Still very experimental!

all_deltas
()[source]¶ Generator that yields functions transforming state to another state.
 It is memory efficient as it is a generator.
 Do not convert that to a list or it might use all your system memory: each returned value is a function with code and variables inside!

__module__
= 'complete_tree_exploration_for_MP_bandits'¶
