complete_tree_exploration_for_MP_bandits module
Experimental code to perform complete tree exploration for Multi-Player bandits.
Algorithms:
- Supports Selfish 0-Greedy, UCB, and KL-UCB indexes, each in 3 different variants (U, Utilde, and Ubar feedback).
- Also supports RhoRand, RandTopM and MCTopM, even though they are not memory-less, by using another state representation (inlining the memory of each player, e.g., the ranks for RhoRand).
Features:
- For the arm means \(\mu_1, \dots, \mu_K\), this script can use exact symbolic computations with sympy, exact rational arithmetic with Fraction, or plain floating-point numbers.
- The graph can contain all nodes from the root to the leafs, or only the leafs (with summed probabilities), and optionally only the absorbing nodes are shown.
- Supports exporting the tree to a GraphViz dot graph, which can be saved to SVG/PNG, or to LaTeX (with TikZ) and compiled to PDF, etc.
- By default, the root is highlighted in green and the absorbing nodes are in red.
Warning
I still have to fix these issues:
- TODO: right now it is not very efficient; could it be improved? I don't think anything smarter can be done in pure Python.
Requirements:
- ‘sympy’ module to use formal means \(\mu_1, \dots, \mu_K\) instead of numbers,
- ‘numpy’ module for computations on indexes (e.g., np.where),
- ‘graphviz’ module to generate the graph and save it,
- ‘dot2tex’ module to generate nice LaTeX (with TikZ) graphs and save them to PDF.
Note
The ‘dot2tex’ module only supports Python 2. However, I maintain an unpublished port of ‘dot2tex’ to Python 3, see [here](https://github.com/Naereen/dot2tex), which you can download and install manually (sudo python3 setup.py install) to have ‘dot2tex’ for Python 3 as well.
About:
- Date: 16/09/2017.
- Author: Lilian Besson, (C) 2017
- Licence: MIT Licence (http://lbesson.mit-license.org).
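Example:
For orientation, here is a minimal usage sketch assembled only from the signatures documented below (the way the players list is built and the output filename are illustrative assumptions, not taken from the source):

```python
import numpy as np
from complete_tree_exploration_for_MP_bandits import (
    State, Selfish_UCB_U, symbol_means, simplify
)

M, K = 2, 2                          # 2 players, 2 arms
mus = symbol_means(K)                # formal means mu_1, mu_2 (sympy symbols)
players = [Selfish_UCB_U] * M        # one policy function per player
zeros = np.zeros((M, K), dtype=int)  # all counters start at zero

root = State(zeros, zeros, zeros, zeros, mus, players)
root.explore_from_node_to_depth(depth=2)       # complete exploration for 2 steps
proba = root.proba_reaching_absorbing_state()  # symbolic probability
print(simplify(proba))
root.saveto("plots/trees/example.gv", view=False)  # export the tree (SVG by default)
```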
- complete_tree_exploration_for_MP_bandits.oo = inf
  Shortcut for float(‘+inf’).
- complete_tree_exploration_for_MP_bandits.PLOT_DIR = 'plots/trees'
  Directory for the plots.
- complete_tree_exploration_for_MP_bandits.tupleit1(anarray)
  Convert a non-hashable 1D numpy array to a hashable tuple.
- complete_tree_exploration_for_MP_bandits.tupleit2(anarray)
  Convert a non-hashable 2D numpy array to a hashable tuple-of-tuples.
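Both helpers above are presumably thin wrappers around tuple(); a minimal sketch of what they compute:

```python
import numpy as np

def tupleit1(anarray):
    """1D numpy array -> hashable tuple."""
    return tuple(anarray)

def tupleit2(anarray):
    """2D numpy array -> hashable tuple-of-tuples."""
    return tuple(tuple(row) for row in anarray)

assert tupleit2(np.zeros((2, 2), dtype=int)) == ((0, 0), (0, 0))
```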
- complete_tree_exploration_for_MP_bandits.prod(iterator)
  Product of the values in this iterator.
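Likely equivalent to this one-liner (a sketch, not the actual source):

```python
from functools import reduce
from operator import mul

def prod(iterator):
    """Product of the values in this iterator."""
    return reduce(mul, iterator, 1)

assert prod([1, 2, 3, 4]) == 24
```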
- complete_tree_exploration_for_MP_bandits.WIDTH = 200
  Default value for the width parameter of wraptext() and wraplatex().
- complete_tree_exploration_for_MP_bandits.wraptext(text, width=200)
  Wrap the text, using the textwrap module and width.
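Presumably a thin wrapper around the standard textwrap module, along these lines:

```python
import textwrap

def wraptext(text, width=200):
    """Wrap the text, using the textwrap module and width."""
    return "\n".join(textwrap.wrap(text, width=width))
```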
- complete_tree_exploration_for_MP_bandits.ONLYLEAFS = True
  By default, aim at the most concise graph representation by only showing the leafs.
- complete_tree_exploration_for_MP_bandits.ONLYABSORBING = False
  By default, do not restrict the graph to the absorbing leafs only (the most concise representation).
- complete_tree_exploration_for_MP_bandits.CONCISE = True
  By default, only show \(\tilde{S}\) and \(N\) in the graph representations, not all 4 vectors.
- complete_tree_exploration_for_MP_bandits.FULLHASH = False
  By default, use only Stilde and N for hashing the states.
- complete_tree_exploration_for_MP_bandits.FORMAT = 'svg'
  Format used to save the graphs.
- complete_tree_exploration_for_MP_bandits.FixedArm(j, state)
  Fake player j that always targets arm j.
- complete_tree_exploration_for_MP_bandits.UniformExploration(j, state)
  Fake player j that always targets all arms.
- complete_tree_exploration_for_MP_bandits.ConstantRank(j, state, decision, collision)
  Constant rank, no matter what.
- complete_tree_exploration_for_MP_bandits.choices_from_indexes(indexes)
  For deterministic index policies: return the list of all positions attaining the maximal index (more than one in case of ties, otherwise a single position).
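A sketch of this tie-handling, using np.nonzero on the positions of the maximal value:

```python
import numpy as np

def choices_from_indexes(indexes):
    """All positions attaining the maximal index (one or more, in case of ties)."""
    indexes = np.asarray(indexes)
    return list(np.nonzero(indexes == np.max(indexes))[0])

assert choices_from_indexes([0.1, 0.5, 0.5]) == [1, 2]
```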
- complete_tree_exploration_for_MP_bandits.Selfish_0Greedy_U(j, state)
  Selfish policy + 0-Greedy index + U feedback.
- complete_tree_exploration_for_MP_bandits.Selfish_0Greedy_Utilde(j, state)
  Selfish policy + 0-Greedy index + Utilde feedback.
- complete_tree_exploration_for_MP_bandits.Selfish_0Greedy_Ubar(j, state)
  Selfish policy + 0-Greedy index + Ubar feedback.
- complete_tree_exploration_for_MP_bandits.Selfish_UCB_U(j, state)
  Selfish policy + UCB_0.5 index + U feedback.
- complete_tree_exploration_for_MP_bandits.Selfish_UCB(j, state)
  Selfish policy + UCB_0.5 index + Utilde feedback.
- complete_tree_exploration_for_MP_bandits.Selfish_UCB_Utilde(j, state)
  Selfish policy + UCB_0.5 index + Utilde feedback.
- complete_tree_exploration_for_MP_bandits.Selfish_UCB_Ubar(j, state)
  Selfish policy + UCB_0.5 index + Ubar feedback.
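The UCB_0.5 index used by these players is, under the usual convention of this project (an assumption here, it is not spelled out on this page), \(\hat{\mu}_k(t) + \sqrt{\log(t) / (2 N_k(t))}\). A numerical sketch for one player:

```python
import numpy as np

def ucb_half_index(S_j, N_j, t):
    """UCB_0.5 index: empirical mean + exploration bonus, for t >= 1.

    S_j, N_j: 1D arrays of size K with cumulated feedback and pull counts.
    Unexplored arms (N_j == 0) get an infinite index (also an assumption).
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        index = S_j / N_j + np.sqrt(np.log(t) / (2 * N_j))
    index[N_j == 0] = np.inf
    return index
```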
- complete_tree_exploration_for_MP_bandits.Selfish_KLUCB_U(j, state)
  Selfish policy + Bernoulli KL-UCB index + U feedback.
- complete_tree_exploration_for_MP_bandits.Selfish_KLUCB(j, state)
  Selfish policy + Bernoulli KL-UCB index + Utilde feedback.
- complete_tree_exploration_for_MP_bandits.Selfish_KLUCB_Utilde(j, state)
  Selfish policy + Bernoulli KL-UCB index + Utilde feedback.
- complete_tree_exploration_for_MP_bandits.Selfish_KLUCB_Ubar(j, state)
  Selfish policy + Bernoulli KL-UCB index + Ubar feedback.
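The Bernoulli KL-UCB index of an arm with empirical mean \(\hat{\mu}_k\) is the largest \(q \in [\hat{\mu}_k, 1]\) such that \(N_k \, \mathrm{kl}(\hat{\mu}_k, q) \le \log(t)\), where kl is the binary relative entropy (the exact exploration rate is an assumption). A bisection sketch:

```python
import numpy as np

def klBern(x, y, eps=1e-15):
    """Binary relative entropy kl(x, y), with x and y clipped away from 0 and 1."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

def klucb_bern(mean, bound, precision=1e-6):
    """Largest q in [mean, 1] such that kl(mean, q) <= bound, by bisection."""
    lo, hi = mean, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2.0
        if klBern(mean, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Index of arm k for one player: klucb_bern(S[k] / N[k], log(t) / N[k])
```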
- complete_tree_exploration_for_MP_bandits.choices_from_indexes_with_rank(indexes, rank=1)
  For deterministic index policies: return the list of positions attaining the rank-th largest index (more than one in case of ties, otherwise a single position).
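One plausible reading of this rank-th-largest selection (the exact tie-breaking convention is an assumption):

```python
import numpy as np

def choices_from_indexes_with_rank(indexes, rank=1):
    """All positions attaining the rank-th largest index value."""
    indexes = np.asarray(indexes)
    kth_value = np.sort(indexes)[::-1][rank - 1]  # rank-th largest, with multiplicity
    return list(np.nonzero(indexes == kth_value)[0])

assert choices_from_indexes_with_rank([0.9, 0.5, 0.1], rank=2) == [1]
```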
- complete_tree_exploration_for_MP_bandits.RhoRand_UCB_U(j, state)
  RhoRand policy + UCB_0.5 index + U feedback.
- complete_tree_exploration_for_MP_bandits.RhoRand_UCB_Utilde(j, state)
  RhoRand policy + UCB_0.5 index + Utilde feedback.
- complete_tree_exploration_for_MP_bandits.RhoRand_UCB_Ubar(j, state)
  RhoRand policy + UCB_0.5 index + Ubar feedback.
- complete_tree_exploration_for_MP_bandits.RhoRand_KLUCB_U(j, state)
  RhoRand policy + Bernoulli KL-UCB index + U feedback.
- complete_tree_exploration_for_MP_bandits.RhoRand_KLUCB_Utilde(j, state)
  RhoRand policy + Bernoulli KL-UCB index + Utilde feedback.
- complete_tree_exploration_for_MP_bandits.RhoRand_KLUCB_Ubar(j, state)
  RhoRand policy + Bernoulli KL-UCB index + Ubar feedback.
- complete_tree_exploration_for_MP_bandits.RandomNewRank(j, state, decision, collision)
  RhoRand chooses a new uniform rank in {1,..,M} in case of collision, or keeps the same rank.
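In a sampled (Monte Carlo) simulation the rule would read as below; in this module the randomness is enumerated instead, each of the M possible new ranks having probability 1/M, so this is only a sketch of the rule, not of the actual return convention:

```python
import random

def random_new_rank(current_rank, M, collision):
    """rhoRand rule: draw a new uniform rank in {1, ..., M} after a collision,
    otherwise keep the current rank."""
    return random.randint(1, M) if collision else current_rank
```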
- complete_tree_exploration_for_MP_bandits.default_policy(j, state)
  RhoRand policy + UCB_0.5 index + U feedback.
- complete_tree_exploration_for_MP_bandits.default_update_memory(j, state, decision, collision)
  RhoRand chooses a new uniform rank in {1,..,M} in case of collision, or keeps the same rank.
- complete_tree_exploration_for_MP_bandits.RandTopM_UCB_U(j, state, collision=False)
  RandTopM policy + UCB_0.5 index + U feedback.
- complete_tree_exploration_for_MP_bandits.RandTopM_UCB_Utilde(j, state, collision=False)
  RandTopM policy + UCB_0.5 index + Utilde feedback.
- complete_tree_exploration_for_MP_bandits.RandTopM_UCB_Ubar(j, state, collision=False)
  RandTopM policy + UCB_0.5 index + Ubar feedback.
- complete_tree_exploration_for_MP_bandits.RandTopM_KLUCB_U(j, state, collision=False)
  RandTopM policy + Bernoulli KL-UCB index + U feedback.
- complete_tree_exploration_for_MP_bandits.RandTopM_KLUCB_Utilde(j, state, collision=False)
  RandTopM policy + Bernoulli KL-UCB index + Utilde feedback.
- complete_tree_exploration_for_MP_bandits.RandTopM_KLUCB_Ubar(j, state, collision=False)
  RandTopM policy + Bernoulli KL-UCB index + Ubar feedback.
- complete_tree_exploration_for_MP_bandits.RandTopM_RandomNewChosenArm(j, state, decision, collision)
  RandTopM chooses a new arm, uniformly from the set of estimated M best arms, after a collision or when its chosen arm lies outside of its estimatedBestArms set; otherwise it keeps the same arm.
- complete_tree_exploration_for_MP_bandits.write_to_tuple(this_tuple, index, value)
  Tuples cannot be modified in place; this hack works around that.
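Presumably implemented by rebuilding the tuple around the new value:

```python
def write_to_tuple(this_tuple, index, value):
    """Return a copy of this_tuple with this_tuple[index] replaced by value."""
    return this_tuple[:index] + (value,) + this_tuple[index + 1:]

assert write_to_tuple((1, 2, 3), 1, 9) == (1, 9, 3)
```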
- complete_tree_exploration_for_MP_bandits.MCTopM_UCB_U(j, state, collision=False)
  MCTopM policy + UCB_0.5 index + U feedback.
- complete_tree_exploration_for_MP_bandits.MCTopM_UCB_Utilde(j, state, collision=False)
  MCTopM policy + UCB_0.5 index + Utilde feedback.
- complete_tree_exploration_for_MP_bandits.MCTopM_UCB_Ubar(j, state, collision=False)
  MCTopM policy + UCB_0.5 index + Ubar feedback.
- complete_tree_exploration_for_MP_bandits.MCTopM_KLUCB_U(j, state, collision=False)
  MCTopM policy + Bernoulli KL-UCB index + U feedback.
- complete_tree_exploration_for_MP_bandits.MCTopM_KLUCB_Utilde(j, state, collision=False)
  MCTopM policy + Bernoulli KL-UCB index + Utilde feedback.
- complete_tree_exploration_for_MP_bandits.MCTopM_KLUCB_Ubar(j, state, collision=False)
  MCTopM policy + Bernoulli KL-UCB index + Ubar feedback.
- complete_tree_exploration_for_MP_bandits.MCTopM_RandomNewChosenArm(j, state, decision, collision)
  MCTopM chooses a new arm, uniformly from the set of estimated M best arms, when its chosen arm lies outside of its estimatedBestArms set; otherwise it keeps the same arm.
- complete_tree_exploration_for_MP_bandits.symbol_means(K)
  Better to work directly with symbols and instantiate the results afterwards.
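Presumably a call to sympy.symbols, along these lines (the source may add assumptions on the symbols; this is a sketch):

```python
import sympy

def symbol_means(K):
    """K formal symbols mu_1, ..., mu_K."""
    return sympy.symbols(" ".join("mu_{}".format(k) for k in range(1, K + 1)))

mu_1, mu_2 = symbol_means(2)
```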
- complete_tree_exploration_for_MP_bandits.random_uniform_means(K)
  If needed, generate an array of K (numerical) uniform means in [0, 1].
- complete_tree_exploration_for_MP_bandits.uniform_means(nbArms=3, delta=0.1, lower=0.0, amplitude=1.0)
  Return a list of means of arms, well spaced:
  - in [lower, lower + amplitude],
  - sorted in increasing order,
  - starting from lower + amplitude * delta, up to lower + amplitude * (1 - delta),
  - with nbArms arms in total.

  >>> np.array(uniform_means(2, 0.1))
  array([ 0.1, 0.9])
  >>> np.array(uniform_means(3, 0.1))
  array([ 0.1, 0.5, 0.9])
  >>> np.array(uniform_means(9, 1 / (1. + 9)))
  array([ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
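A sketch matching the doctest above, using np.linspace:

```python
import numpy as np

def uniform_means(nbArms=3, delta=0.1, lower=0.0, amplitude=1.0):
    """nbArms well-spaced means, from lower + amplitude * delta
    up to lower + amplitude * (1 - delta)."""
    return list(lower + amplitude * np.linspace(delta, 1 - delta, nbArms))
```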
- complete_tree_exploration_for_MP_bandits.proba2float(proba, values=None, K=None, names=None)
  Replace each mu_k by a numerical value and evaluate the formula.
- complete_tree_exploration_for_MP_bandits.simplify(proba)
  Try to simplify the expression of the probability.
- complete_tree_exploration_for_MP_bandits.proba2str(proba, latex=False, html_in_var_names=False)
  Pretty-print a proba: either a number, a Fraction, or a sympy expression.
- complete_tree_exploration_for_MP_bandits.tex2pdf(filename)
  Naive call to the command-line pdflatex, twice.
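Presumably something along these lines (the exact pdflatex invocation is an assumption):

```python
import subprocess

def tex2pdf(filename):
    """Naive call to the command-line pdflatex, twice (to resolve references)."""
    for _ in range(2):
        subprocess.call(["pdflatex", "-interaction=nonstopmode", filename])
```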
- class complete_tree_exploration_for_MP_bandits.State(S, Stilde, N, Ntilde, mus, players, depth=0)
  Bases: object
  Not space-efficient representation of a state in the system we model.
  - S, Stilde, N, Ntilde: arrays of size (M, K),
  - depth, t, M, K: integers, stored to avoid recomputing them,
  - mus: the problem parameters (only for Bernoulli arms),
  - players: a list of algorithms,
  - probas: list of transition probabilities,
  - children: list of all possible next states (transitions).
  - __init__(S, Stilde, N, Ntilde, mus, players, depth=0)
    Create a new state. Arrays S, Stilde, N, Ntilde are copied, to avoid modifying the previous values!
  - S = None
    Sensing feedback.
  - Stilde = None
    Number of sensing trials.
  - N = None
    Number of successful transmissions.
  - Ntilde = None
    Number of trials without collisions.
  - depth = None
    Current depth of the exploration tree.
  - t = None
    Current time step. Simply t = sum(N[0]) = sum(N[i]) for any player i, but easier to compute once and store.
  - M = None
    Number of players.
  - K = None
    Number of arms (channels).
  - children = None
    List of next states, representing all the possible transitions.
  - probas = None
    Probabilities of the transitions.
  - to_dot(title='', name='', comment='', latex=False, html_in_var_names=False, ext='svg', onlyleafs=True, onlyabsorbing=False, concise=True)
    Convert the state to a .dot graph, using GraphViz. See http://graphviz.readthedocs.io/ for more details.
    - onlyleafs: only print the root and the leafs, for a concise representation of the tree.
    - onlyabsorbing: only print the absorbing leafs, for a really concise representation of the tree.
    - concise: whether to use the short representation of states (using \(\tilde{S}\) and \(N\)) or the long one (using the 4 variables).
    - html_in_var_names: experimental use of <SUB>..</SUB> and <SUP>..</SUP> in the labels of the tree.
    - latex: experimental use of _{..} and ^{..} in the labels of the tree, to use with dot2tex.
  - saveto(filename, view=True, title='', name='', comment='', latex=False, html_in_var_names=False, ext='svg', onlyleafs=True, onlyabsorbing=False, concise=True)
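    saveto has no docstring; judging from its signature, it renders the tree via to_dot and writes it to filename in the given format. A hedged usage example (the filename is hypothetical):

    ```python
    # `root` is an explored State, as in the sketch at the top of this page:
    root.saveto("plots/trees/example_tree.gv", view=False, title="Example tree",
                ext="svg", onlyleafs=True, onlyabsorbing=False, concise=True)
    ```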
  - copy()
    Get a new copy of this state, with the same S, Stilde, N, Ntilde but no probas, no children, and depth=0.
  - is_absorbing()
    Try to detect whether this state is absorbing, i.e., only one transition is possible, and the same holds recursively for its only child.
    Warning: still very experimental!
  - has_absorbing_child_whole_subtree()
    Try to detect whether this state has an absorbing child anywhere in its whole subtree.
  - explore_from_node_to_depth(depth=1)
    Recursively compute the one-depth children of the root and of its children, down to the given depth.
  - compute_one_depth()
    Use all_deltas to store all the possible transitions and their probabilities. Increases depth by 1 at the end.
  - all_absorbing_states(depth=1)
    Generator that yields all the absorbing nodes of the tree, one by one.
    - It might not find any,
    - It does so without merging common nodes, in order to find the first absorbing node as quickly as possible.
  - absorbing_states_one_depth()
    Use all_deltas to yield all the absorbing one-depth children and their probabilities.
  - find_N_absorbing_states(N=1, maxdepth=8)
    Find at least N absorbing states, by considering a large depth.
  - all_deltas()
    Generator that yields functions transforming a state into another state.
    - It is memory-efficient, as it is a generator.
    - Do not convert it to a list, or it might use all your system memory: each returned value is a function, with code and variables inside!
  - get_all_leafs()
    Recurse and get all the leafs. Many different states can be present in the list of leafs, possibly with different probabilities (each corresponds to one trajectory).
  - get_unique_leafs()
    Compute all the leafs (deepest children) and merge the common ones to compute their full probabilities.
  - proba_reaching_absorbing_state()
    Compute the probability of reaching a leaf that is an absorbing state.
- class complete_tree_exploration_for_MP_bandits.StateWithMemory(S, Stilde, N, Ntilde, mus, players, update_memories, memories=None, depth=0)
  Bases: complete_tree_exploration_for_MP_bandits.State
  State with a memory for each player, to represent and play with RhoRand etc.
  - __init__(S, Stilde, N, Ntilde, mus, players, update_memories, memories=None, depth=0)
    Create a new state. Arrays S, Stilde, N, Ntilde are copied, to avoid modifying the previous values!
  - memories = None
    Personal memory for each player; can be a rank in {1,..,M} for RhoRand, or anything else.
  - copy()
    Get a new copy of this state, with the same S, Stilde, N, Ntilde but no probas, no children, and depth=0.
  - __hash__(full=False)
    Hash the matrices Stilde and N of the state, and the memories of the players (i.e., the ranks for RhoRand).
  - is_absorbing()
    Try to detect whether this state is absorbing, i.e., only one transition is possible, and the same holds recursively for its only child.
    Warning: still very experimental!
  - all_deltas()
    Generator that yields functions transforming a state into another state.
    - It is memory-efficient, as it is a generator.
    - Do not convert it to a list, or it might use all your system memory: each returned value is a function, with code and variables inside!