Policies.BESA module

The Best Empirical Sampled Average (BESA) algorithm.

Warning

This algorithm works VERY well but it looks weird at first sight. It sounds “too easy”, so take a look at the article before wondering why it should work.

Warning

Right now, it is between 10 and 25 times slower than Policies.klUCB and other single-player policies.

Policies.BESA.subsample_deterministic(n, m)[source]

Returns \(\{1,\dots,n\}\) if \(n < m\), or \(\{1,\dots,m\}\) if \(n \geq m\) (i.e., it is \(\{1,\dots,\min(n,m)\}\)). Note that the indices in the code are 0-based, as the doctests below show.

Warning

The BESA algorithm is efficient only with the random sub-sampling; don't use this deterministic version except for comparisons.

>>> subsample_deterministic(5, 3)  # doctest: +ELLIPSIS
array([0, 1, 2, 3])
>>> subsample_deterministic(10, 20)  # doctest: +ELLIPSIS
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
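A minimal sketch consistent with the doctests above (indices are 0-based and there are min(n, m) + 1 of them); the actual implementation may differ:

import numpy as np

def subsample_deterministic(n, m):
    """Deterministic 'sub-sample': simply the first min(n, m) + 1 indices."""
    return np.arange(min(n, m) + 1)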
Policies.BESA.subsample_uniform(n, m)[source]

Returns a uniformly random subset of size \(n\) of \(\{1,\dots,m\}\).

  • Fails if n > m.

Note

The BESA algorithm is efficient only with the random sub-sampling.

>>> np.random.seed(1234)  # reproducible results
>>> subsample_uniform(3, 5)  # doctest: +ELLIPSIS
array([4, 0, 1])
>>> subsample_uniform(10, 20)  # doctest: +ELLIPSIS
array([ 7, 16,  2,  3,  1, 18,  5,  4,  0,  8])
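One natural way to implement this is sampling without replacement, which also fails when n > m; this is only a sketch, and the exact doctest outputs above depend on the actual implementation:

import numpy as np

def subsample_uniform(n, m):
    """Uniform random sub-sample of size n among the indices 0, ..., m - 1, without replacement."""
    return np.random.choice(m, size=n, replace=False)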
Policies.BESA.TOLERANCE = 1e-06

Numerical tolerance when comparing two means. Should not be zero!
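As an illustration, a tolerance-based comparison of two empirical means could look like this (hypothetical helper, not part of the module):

TOLERANCE = 1e-06

def means_are_equal(mean_a, mean_b, tolerance=TOLERANCE):
    """Treat the two means as equal (a tie) if they differ by less than the tolerance."""
    return abs(mean_a - mean_b) < tolerance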

Policies.BESA.inverse_permutation(permutation, j)[source]

Inverts the permutation for the given input j, that is, it finds the index i such that permutation[i] = j.

>>> permutation = [1, 0, 3, 2]
>>> inverse_permutation(permutation, 1)
0
>>> inverse_permutation(permutation, 0)
1
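A straightforward linear-scan sketch that matches the doctests above:

def inverse_permutation(permutation, j):
    """Find the index i such that permutation[i] == j."""
    for i, p_i in enumerate(permutation):
        if p_i == j:
            return i
    raise ValueError("{} is not a value of the permutation".format(j))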
Policies.BESA.besa_two_actions(rewards, pulls, a, b, subsample_function=<function subsample_uniform>)[source]

Core algorithm for the BESA selection, for two actions a and b:

  • N = min(Na, Nb),
  • Sub-sample N values from rewards of arm a, and N values from rewards of arm b,
  • Compute mean of both samples of size N, call them m_a, m_b,
  • If m_a > m_b, choose a,
  • Else if m_a < m_b, choose b,
  • And in case of a tie, break it by choosing the arm i with minimal Ni (or uniformly at random among [a, b] if Na = Nb); see the sketch after the note below.

Note

rewards can be a numpy array of shape (at least) (nbArms, max(Na, Nb)), or a dictionary mapping a and b to lists (or iterables) of length >= max(Na, Nb).
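A minimal sketch of these steps (an illustration only, not the actual implementation; it assumes rewards is indexable as rewards[arm][index] and reuses the module's subsample_uniform() and TOLERANCE):

import numpy as np
from Policies.BESA import subsample_uniform, TOLERANCE

def besa_two_actions_sketch(rewards, pulls, a, b, subsample_function=subsample_uniform):
    """Illustrative version of the two-action duel described above."""
    Na, Nb = pulls[a], pulls[b]
    N = min(Na, Nb)
    # Sub-sample N reward indices for each arm, among its own history
    m_a = np.mean(np.asarray(rewards[a])[subsample_function(N, Na)])
    m_b = np.mean(np.asarray(rewards[b])[subsample_function(N, Nb)])
    if abs(m_a - m_b) < TOLERANCE:
        # Tie: prefer the least pulled arm, or pick uniformly at random if Na == Nb
        if Na != Nb:
            return a if Na < Nb else b
        return np.random.choice([a, b])
    return a if m_a > m_b else b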

>>> np.random.seed(2345)  # reproducible results
>>> pulls = [6, 10]; K = len(pulls); N = max(pulls)
>>> rewards = np.random.randn(K, N)
>>> np.mean(rewards, axis=1)  # arm 1 is better  # doctest: +ELLIPSIS
array([0.154..., 0.158...])
>>> np.mean(rewards[:, :min(pulls)], axis=1)  # arm 0 is better in the first 6 samples  # doctest: +ELLIPSIS
array([0.341..., 0.019...])
>>> besa_two_actions(rewards, pulls, 0, 1, subsample_function=subsample_deterministic)  # doctest: +ELLIPSIS
0
>>> [besa_two_actions(rewards, pulls, 0, 1, subsample_function=subsample_uniform) for _ in range(10)]  # doctest: +ELLIPSIS
[0, 0, 1, 1, 0, 0, 1, 0, 0, 0]
Policies.BESA.besa_K_actions__non_randomized(rewards, pulls, left, right, subsample_function=<function subsample_uniform>, depth=0)[source]

BESA recursive selection algorithm for an action set of size \(\mathcal{K} \geq 1\).

  • I prefer to implement it for a discrete action set \(\{\text{left}, \dots, \text{right}\}\) (right end included) instead of a generic vector of actions, to speed up the code, but it is less readable.
  • The depth argument is only used to pretty-print debugging information (otherwise useless).

Warning

The binary tournament is NOT RANDOMIZED here; this version is only for testing.

>>> np.random.seed(1234)  # reproducible results
>>> pulls = [5, 6, 7, 8]; K = len(pulls); N = max(pulls)
>>> rewards = np.random.randn(K, N)
>>> np.mean(rewards, axis=1)  # arm 0 is better
array([ 0.09876921, -0.18561207,  0.04463033,  0.0653539 ])
>>> np.mean(rewards[:, :min(pulls)], axis=1)  # arm 1 is better in the first 5 samples
array([-0.06401484,  0.17366346,  0.05323033, -0.09514708])
>>> besa_K_actions__non_randomized(rewards, pulls, 0, K-1, subsample_function=subsample_deterministic)  # doctest: +ELLIPSIS
3
>>> [besa_K_actions__non_randomized(rewards, pulls, 0, K-1, subsample_function=subsample_uniform) for _ in range(10)]  # doctest: +ELLIPSIS
[3, 3, 2, 3, 3, 0, 0, 0, 2, 3]
Policies.BESA.besa_K_actions__smart_divideandconquer(rewards, pulls, left, right, random_permutation_of_arm=None, subsample_function=<function subsample_uniform>, depth=0)[source]

BESA recursive selection algorithm for an action set of size \(\mathcal{K} \geq 1\).

  • I prefer to implement it for a discrete action set \(\{\text{left}, \dots, \text{right}\}\) (right end included) instead of a generic vector of actions, to speed up the code, but it is less readable.
  • The depth argument is only used to pretty-print debugging information (otherwise useless).

Note

The binary tournament is RANDOMIZED here, as it should be.

>>> np.random.seed(1234)  # reproducible results
>>> pulls = [5, 6, 7, 8]; K = len(pulls); N = max(pulls)
>>> rewards = np.random.randn(K, N)
>>> np.mean(rewards, axis=1)  # arm 0 is better
array([ 0.09876921, -0.18561207,  0.04463033,  0.0653539 ])
>>> np.mean(rewards[:, :min(pulls)], axis=1)  # arm 1 is better in the first 5 samples
array([-0.06401484,  0.17366346,  0.05323033, -0.09514708])
>>> besa_K_actions__smart_divideandconquer(rewards, pulls, 0, K-1, subsample_function=subsample_deterministic)  # doctest: +ELLIPSIS
3
>>> [besa_K_actions__smart_divideandconquer(rewards, pulls, 0, K-1, subsample_function=subsample_uniform) for _ in range(10)]  # doctest: +ELLIPSIS
[3, 3, 2, 3, 3, 0, 0, 0, 2, 3]
Policies.BESA.besa_K_actions(rewards, pulls, actions, subsample_function=<function subsample_uniform>, depth=0)[source]

BESA recursive selection algorithm for an action set of size \(\mathcal{K} \geq 1\).

  • The divide-and-conquer is implemented for a generic list of actions; it's slower but simpler to write! The left and right divisions are just actions[:len(actions)//2] and actions[len(actions)//2:] (see the sketch after the note below).
  • Actions is assumed to be shuffled before calling this function!
  • The depth argument is only used to pretty-print debugging information (otherwise useless).

Note

The binary tournament is RANDOMIZED here, as it should be.
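A compact sketch of this recursion, reusing besa_two_actions() on the winner of each half (illustrative only; the actual implementation may handle ties and edge cases differently):

from Policies.BESA import besa_two_actions, subsample_uniform

def besa_K_actions_sketch(rewards, pulls, actions, subsample_function=subsample_uniform):
    """Binary tournament over a (pre-shuffled) list of actions, by divide and conquer."""
    if len(actions) == 1:
        return actions[0]
    half = len(actions) // 2
    # Find the winner of each half, then let the two winners play the two-action duel
    winner_left = besa_K_actions_sketch(rewards, pulls, actions[:half], subsample_function)
    winner_right = besa_K_actions_sketch(rewards, pulls, actions[half:], subsample_function)
    return besa_two_actions(rewards, pulls, winner_left, winner_right,
                            subsample_function=subsample_function)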

>>> np.random.seed(1234)  # reproducible results
>>> pulls = [5, 6, 7, 8]; K = len(pulls); N = max(pulls)
>>> actions = np.arange(K)
>>> rewards = np.random.randn(K, N)
>>> np.mean(rewards, axis=1)  # arm 0 is better
array([ 0.09876921, -0.18561207,  0.04463033,  0.0653539 ])
>>> np.mean(rewards[:, :min(pulls)], axis=1)  # arm 1 is better in the first 5 samples
array([-0.06401484,  0.17366346,  0.05323033, -0.09514708])
>>> besa_K_actions(rewards, pulls, actions, subsample_function=subsample_deterministic)  # doctest: +ELLIPSIS
3
>>> [besa_K_actions(rewards, pulls, actions, subsample_function=subsample_uniform) for _ in range(10)]  # doctest: +ELLIPSIS
[3, 3, 2, 3, 3, 0, 0, 0, 2, 3]
Policies.BESA.besa_K_actions__non_binary(rewards, pulls, actions, subsample_function=<function subsample_uniform>, depth=0)[source]

BESA recursive selection algorithm for an action set of size \(\mathcal{K} \geq 1\).

  • Instead of doing the binary tree tournament (which results in \(\mathcal{O}(K^2)\) calls to the 2-arm procedure), we can do a line tournament: 1 vs 2, winner vs 3, winner vs 4, etc., winner vs the last arm (which results in \(\mathcal{O}(K)\) calls); see the sketch after this list.
  • Actions is assumed to be shuffled before calling this function!
  • The depth argument is only used to pretty-print debugging information (otherwise useless).
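A minimal sketch of such a line tournament (illustrative only):

from Policies.BESA import besa_two_actions, subsample_uniform

def besa_K_actions_line_sketch(rewards, pulls, actions, subsample_function=subsample_uniform):
    """Line tournament: the current winner plays against every remaining action, in order."""
    winner = actions[0]
    for challenger in actions[1:]:
        winner = besa_two_actions(rewards, pulls, winner, challenger,
                                  subsample_function=subsample_function)
    return winner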
>>> np.random.seed(1234)  # reproducible results
>>> pulls = [5, 6, 7, 8]; K = len(pulls); N = max(pulls)
>>> actions = np.arange(K)
>>> rewards = np.random.randn(K, N)
>>> np.mean(rewards, axis=1)  # arm 0 is better
array([ 0.09876921, -0.18561207,  0.04463033,  0.0653539 ])
>>> np.mean(rewards[:, :min(pulls)], axis=1)  # arm 1 is better in the first 5 samples
array([-0.06401484,  0.17366346,  0.05323033, -0.09514708])
>>> besa_K_actions__non_binary(rewards, pulls, actions, subsample_function=subsample_deterministic)  # doctest: +ELLIPSIS
3
>>> [besa_K_actions__non_binary(rewards, pulls, actions, subsample_function=subsample_uniform) for _ in range(10)]  # doctest: +ELLIPSIS
[3, 3, 3, 2, 0, 3, 3, 3, 3, 3]
Policies.BESA.besa_K_actions__non_recursive(rewards, pulls, actions, subsample_function=<function subsample_uniform>, depth=0)[source]

BESA non-recursive selection algorithm for an action set of size \(\mathcal{K} \geq 1\).

  • No calls to besa_two_actions(): it directly generalizes the two-action duel to K actions (see the sketch after this list).
  • Actions is assumed to be shuffled before calling this function!
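A sketch of this K-way generalization (illustrative only: it ignores the actions argument, as the doctests pass None, and its tie-breaking may differ from the real code):

import numpy as np
from Policies.BESA import subsample_uniform

def besa_K_actions_non_recursive_sketch(rewards, pulls, subsample_function=subsample_uniform):
    """One K-way duel: sub-sample min(pulls) rewards from each arm, keep the best empirical mean."""
    N = min(pulls)
    means = [np.mean(np.asarray(rewards[arm])[subsample_function(N, pulls[arm])])
             for arm in range(len(pulls))]
    # Ties could be broken as in besa_two_actions() (least pulled arm, then at random);
    # for simplicity, argmax keeps the first best arm here.
    return int(np.argmax(means))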
>>> np.random.seed(1234)  # reproducible results
>>> pulls = [5, 6, 7, 8]; K = len(pulls); N = max(pulls)
>>> rewards = np.random.randn(K, N)
>>> np.mean(rewards, axis=1)  # arm 0 is better
array([ 0.09876921, -0.18561207,  0.04463033,  0.0653539 ])
>>> np.mean(rewards[:, :min(pulls)], axis=1)  # arm 1 is better in the first 5 samples
array([-0.06401484,  0.17366346,  0.05323033, -0.09514708])
>>> besa_K_actions__non_recursive(rewards, pulls, None, subsample_function=subsample_deterministic)  # doctest: +ELLIPSIS
3
>>> [besa_K_actions__non_recursive(rewards, pulls, None, subsample_function=subsample_uniform) for _ in range(10)]  # doctest: +ELLIPSIS
[1, 3, 0, 2, 2, 3, 1, 1, 3, 1]
class Policies.BESA.BESA(nbArms, horizon=None, minPullsOfEachArm=1, randomized_tournament=True, random_subsample=True, non_binary=False, non_recursive=False, lower=0.0, amplitude=1.0)[source]

Bases: Policies.IndexPolicy.IndexPolicy

The Best Empirical Sampled Average (BESA) algorithm.

Warning

The BESA algorithm requires storing the whole history of rewards, so its memory usage for \(T\) rounds with \(K\) arms is \(\mathcal{O}(K T)\), which is huge for large \(T\): be careful! Aggregating different BESA instances is probably a bad idea because of this limitation!

__init__(nbArms, horizon=None, minPullsOfEachArm=1, randomized_tournament=True, random_subsample=True, non_binary=False, non_recursive=False, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,
  • lower, amplitude: lower value and known amplitude of the rewards.
horizon = None

Known horizon, used only to know how much memory to allocate for the rewards. It could be implemented without knowing the horizon, by using lists to keep the whole reward history, but this would be way slower!
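For instance, a fixed-size buffer of shape (nbArms, horizon) can be pre-allocated once (a sketch of the idea only; the actual attribute layout may differ, and nbArms and horizon are hypothetical values here):

import numpy as np

nbArms, horizon = 3, 1000                   # hypothetical values for the example
all_rewards = np.zeros((nbArms, horizon))   # O(K * T) memory, filled as rewards arrive
pulls = np.zeros(nbArms, dtype=int)

def store_reward(arm, reward):
    """Sketch of what getReward() has to do: append the reward to the arm's history."""
    all_rewards[arm, pulls[arm]] = reward
    pulls[arm] += 1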

minPullsOfEachArm = None

Minimum number of pulls of each arm before using the BESA algorithm. Using 1 might not be the best choice.

randomized_tournament = None

Whether to use a deterministic or random tournament.

random_subsample = None

Whether to use a deterministic or random sub-sampling procedure.

non_binary = None

Whether to use besa_K_actions() or besa_K_actions__non_binary() for the selection of K arms.

non_recursive = None

Whether to use besa_K_actions() or besa_K_actions__non_recursive() for the selection of K arms.

all_rewards = None

Keeps all the rewards of each arm. It consumes \(\mathcal{O}(K T)\) memory, which is really bad!

__str__()[source]

-> str

getReward(arm, reward)[source]

Add the current reward to the global history.

Note

There is no need to normalize the reward in [0, 1]; that's one of the strong points of the BESA algorithm.

choice()[source]

Applies the BESA procedure with the current data history.

choiceFromSubSet(availableArms='all')[source]

Applies the BESA procedure with the current data history, restricted to the given subset of arms.

choiceMultiple(nb=1)[source]

Applies the multiple-choice BESA procedure with the current data history:

  1. select a first arm with the basic BESA procedure on the full action set,
  2. remove it from the set of actions,
  3. restart step 1 with the new, smaller set of actions, until nb arms have been chosen by the basic BESA procedure (see the sketch below).
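A sketch of this loop, assuming the policy exposes nbArms and that choiceFromSubSet() accepts a list of arm indices (illustrative only):

def choice_multiple_sketch(policy, nb=1):
    """Select nb distinct arms by repeated BESA choices on a shrinking action set."""
    available = list(range(policy.nbArms))
    chosen = []
    for _ in range(nb):
        arm = policy.choiceFromSubSet(availableArms=available)
        chosen.append(arm)
        available.remove(arm)  # the chosen arm cannot be selected again
    return chosen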

Note

This was not studied or published before, and there are no theoretical results about it!

Warning

This is very inefficient! The BESA procedure is already quite slow (with my current naive implementation); this one is crazily slow!

choiceWithRank(rank=1)[source]

Applies the ranked BESA procedure with the current data history:

  1. use choiceMultiple() to select rank actions,
  2. then take the rank-th chosen action (the last one).

Note

This was not studied or published before, and there are no theoretical results about it!

Warning

This is very inefficient! The BESA procedure is already quite slow (with my current naive implementation); this one is crazily slow!

computeIndex(arm)[source]

Compute the current index of arm ‘arm’.

Warning

This index is not the one used for the choice of an arm (which uses sub-sampling); it is just the empirical mean of the arm.
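As an illustration, under the assumption that all_rewards[arm] holds the reward history of the arm and pulls[arm] its number of pulls, the index is simply the empirical mean:

import numpy as np

def compute_index_sketch(all_rewards, pulls, arm):
    """Empirical mean of one arm (for monitoring only, not used by choice())."""
    return np.mean(np.asarray(all_rewards[arm])[:pulls[arm]])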

computeAllIndex()[source]

Compute the current index of every arm (vectorized).

Warning

These indexes are not the ones used for the choice of an arm (which uses sub-sampling); they are just the empirical means of the arms.

handleCollision(arm, reward=None)[source]

Nothing special to do.