Policies.Posterior.Beta module

Manipulate posteriors of Bernoulli/Beta experiments.

Rewards not in \(\{0, 1\}\) are handled with a “random binarization” trick (see bernoulliBinarization()), cf. [Agrawal12] (Algorithm 2). When a reward \(r_t \in [0, 1]\) is observed, the player instead receives a Bernoulli sample with mean \(r_t\): \(\tilde{r}_t \sim \mathrm{Bernoulli}(r_t)\), which always lies in \(\{0, 1\}\).

[Agrawal12]

http://jmlr.org/proceedings/papers/v23/agrawal12/agrawal12.pdf

Policies.Posterior.Beta.bernoulliBinarization(r_t)

Return a (random) binarization of a reward \(r_t\) in the continuous interval \([0, 1]\), as an observation in the discrete set \(\{0, 1\}\).

Useful for applying a Beta posterior to non-Bernoulli experiments.
That way, Thompson sampling can be used for any continuous-valued bounded rewards.

Examples:

>>> import random
>>> random.seed(0)

>>> bernoulliBinarization(0.3)
1
>>> bernoulliBinarization(0.3)
0
>>> bernoulliBinarization(0.3)
0
>>> bernoulliBinarization(0.3)
0

>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
0
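The binarization idea can be sketched in a few lines. This is only an illustration under assumptions: `binarize_reward` is a hypothetical stand-in, not the module's own bernoulliBinarization(), and it takes an explicit generator so the check is reproducible.

```python
import random


def binarize_reward(r_t, rng=random):
    """Hypothetical stand-in: return a Bernoulli(r_t) draw for r_t in [0, 1]."""
    if not 0.0 <= r_t <= 1.0:
        raise ValueError("reward must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so this is 1 with probability r_t.
    return int(rng.random() < r_t)


# Averaging many binarized draws should recover the original reward.
rng = random.Random(12345)
draws = [binarize_reward(0.3, rng) for _ in range(100_000)]
empirical_mean = sum(draws) / len(draws)
```

Because the binarized value has expectation \(r_t\), feeding it to a Beta posterior keeps the estimated mean reward unbiased, which is what makes the trick usable for rewards in \([0, 1]\).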

class Policies.Posterior.Beta.Beta(a=1, b=1)

Bases: Policies.Posterior.Posterior.Posterior

Manipulate posteriors of Bernoulli/Beta experiments.

__init__(a=1, b=1): Create a Beta posterior \(\mathrm{Beta}(\alpha, \beta)\) with no observation, i.e., \(\alpha = 1\) and \(\beta = 1\) by default.

N = None: List of two parameters [a, b]

__str__(): Return str(self).

reset(a=None, b=None): Reset alpha and beta, both to 1, as when creating a new default Beta.

sample()

Get a random sample from the Beta posterior (using numpy.random.beta(), available in this module as betavariate()).

Used only by Thompson Sampling and AdBandits so far.
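As a rough illustration of how a posterior sample drives Thompson Sampling, the sketch below draws one value per arm and plays the arm with the largest draw. The `(a, b)` tuple layout and the name `thompson_choose` are assumptions made for this example, and it uses the standard library's `random.betavariate` rather than this module's classes.

```python
import random


def thompson_choose(posteriors, rng=random):
    """Pick an arm index given one (a, b) Beta parameter pair per arm.

    The list-of-pairs layout is hypothetical, chosen only for this sketch.
    """
    # One posterior sample per arm.
    samples = [rng.betavariate(a, b) for a, b in posteriors]
    # Play the arm whose sampled mean is largest.
    return max(range(len(samples)), key=samples.__getitem__)


rng = random.Random(0)
posteriors = [(2, 8), (50, 50), (90, 10)]
picks = [thompson_choose(posteriors, rng) for _ in range(1000)]
```

With posteriors this well separated, the third arm is chosen almost every time; with flatter posteriors the draws overlap more and the other arms get explored.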

quantile(p)

Return the p quantile of the Beta posterior (using scipy.special.btdtri()).

Used only by BayesUCB and AdBandits so far.
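To make the meaning of the p quantile concrete, here is a dependency-free sketch that inverts the Beta CDF by bisection. It is a numerical stand-in, not what the module does (the module delegates to SciPy); the names `beta_cdf` and `beta_quantile` are hypothetical, and the midpoint integration assumes \(a, b \geq 1\).

```python
import math


def beta_cdf(x, a, b, steps=2000):
    """Midpoint-rule approximation of the Beta(a, b) CDF at x (assumes a, b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # log of 1 / B(a, b), computed with log-gamma for numerical stability.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint, strictly inside (0, 1)
        total += math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t))
    return total * h


def beta_quantile(p, a, b, tol=1e-9):
    """Return x such that the approximate CDF at x equals p, by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, Beta(2, 1) has CDF \(x^2\), so its 0.25 quantile is 0.5, which `beta_quantile(0.25, 2, 1)` reproduces up to the tolerance.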

mean(): Compute the mean of the Beta posterior (should be useless).

forget(obs): Forget the last observation.

update(obs)

Add an observation.

If obs is 1, increment \(\alpha\), the count of positive observations;
if it is 0, increment \(\beta\), the count of negative observations.

Note

Otherwise (for a reward strictly between 0 and 1), the bernoulliBinarization() trick has to be used.
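The counting behaviour described for update() and forget() can be illustrated with a small stand-alone class. `BetaCounts` is a hypothetical name and a simplified sketch, not the module's Beta class; in particular, how the real class stores its counts internally is not assumed here.

```python
class BetaCounts:
    """Hypothetical minimal Beta posterior tracking alpha and beta directly."""

    def __init__(self, a=1, b=1):
        self.a, self.b = a, b

    def update(self, obs):
        # obs is assumed to be binary already (0 or 1).
        if obs == 1:
            self.a += 1  # one more positive observation
        else:
            self.b += 1  # one more negative observation

    def forget(self, obs):
        # Undo a previous update with the same observation.
        if obs == 1:
            self.a -= 1
        else:
            self.b -= 1

    def mean(self):
        # Mean of Beta(a, b) is a / (a + b).
        return self.a / (self.a + self.b)


posterior = BetaCounts()
for obs in (1, 1, 0):
    posterior.update(obs)
mean_after_updates = posterior.mean()   # a = 3, b = 2
posterior.forget(0)
mean_after_forget = posterior.mean()    # a = 3, b = 1
```

Starting from the default Beta(1, 1), two successes and one failure give Beta(3, 2); forgetting the failure returns Beta(3, 1).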


Policies.Posterior.Beta.betavariate()

beta(a, b, size=None)

Draw samples from a Beta distribution.

The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. With \(\alpha = a\) and \(\beta = b\), it has the probability density function

\[f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]

where the normalization, B, is the beta function,

\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]
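As a quick numerical check of the two formulas above, the normalization can be evaluated through the standard identity \(B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta) / \Gamma(\alpha + \beta)\). The helper names below are hypothetical and use only the standard library.

```python
import math


def beta_function(alpha, beta):
    """B(alpha, beta) via the Gamma-function identity, in log space for stability."""
    return math.exp(math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta))


def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at x, for 0 < x < 1."""
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_function(alpha, beta)


b_2_3 = beta_function(2, 3)        # integral of t (1 - t)^2 over [0, 1]
density_mid = beta_pdf(0.5, 2, 2)  # 6 * 0.5 * 0.5
```

Here `b_2_3` is 1/12 and `density_mid` is 1.5, up to floating-point rounding.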

It is often seen in Bayesian inference and order statistics.

Note

New code should use the beta method of a default_rng() instance instead; please see the Quick Start.

Parameters:

a : float or array_like of floats: Alpha, positive (>0).
b : float or array_like of floats: Beta, positive (>0).
size : int or tuple of ints, optional: Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a and b are both scalars. Otherwise, np.broadcast(a, b).size samples are drawn.

Returns:

out : ndarray or scalar: Drawn samples from the parameterized beta distribution.

See also:

Generator.beta: which should be used for new code.

Policies.Posterior.Beta.random() → x in the interval [0, 1).