Policies.Posterior.Beta module¶
Manipulate posteriors of Bernoulli/Beta experiments.
Rewards not in \(\{0, 1\}\) are handled with a “random binarization” trick, see bernoulliBinarization() and [Agrawal12] (algorithm 2).
When a reward \(r_t \in [0, 1]\) is observed, the player instead receives a Bernoulli sample of mean \(r_t\): \(\tilde{r}_t \sim \mathrm{Bernoulli}(r_t)\), so the value actually used lies in \(\{0, 1\}\).
- See https://en.wikipedia.org/wiki/Bernoulli_distribution#Related_distributions
- And https://en.wikipedia.org/wiki/Conjugate_prior#Discrete_distributions
| [Agrawal12] | http://jmlr.org/proceedings/papers/v23/agrawal12/agrawal12.pdf |
-
Policies.Posterior.Beta.bernoulliBinarization(r_t)[source]¶
Return a (random) binarization of a reward \(r_t\) in the continuous interval \([0, 1]\), as an observation in the discrete set \(\{0, 1\}\).
- Useful to allow a Beta posterior to be used for non-Bernoulli experiments.
- That way, Thompson sampling can be used for any continuous-valued bounded rewards.
Examples:
>>> import random
>>> random.seed(0)
>>> bernoulliBinarization(0.3)
1
>>> bernoulliBinarization(0.3)
0
>>> bernoulliBinarization(0.3)
0
>>> bernoulliBinarization(0.3)
0
>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
0
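The random binarization described above can be sketched as follows. This is a minimal standalone illustration, not the module's own code: the name `binarize_reward` and the `rng` argument are assumptions introduced here. It relies only on the fact that a uniform draw on [0, 1) is below \(r_t\) with probability \(r_t\), so the binarized reward keeps the same mean.

```python
import random


def binarize_reward(r_t, rng=random):
    """Hypothetical stand-in for bernoulliBinarization(): map r_t in [0, 1] to 0 or 1."""
    if not 0.0 <= r_t <= 1.0:
        raise ValueError("reward must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so the comparison is True with probability r_t.
    return int(rng.random() < r_t)


# Private generator with an arbitrary seed, so the global random state is untouched.
rng = random.Random(12345)
draws = [binarize_reward(0.3, rng) for _ in range(20000)]

# The empirical mean of the binary draws should be close to the original reward 0.3.
empirical_mean = sum(draws) / len(draws)
```

Because the mean is preserved, a posterior updated with the binarized values still targets the mean of the original bounded reward.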
-
class Policies.Posterior.Beta.Beta(a=1, b=1)[source]¶
Bases: Policies.Posterior.Posterior.Posterior
Manipulate posteriors of Bernoulli/Beta experiments.
-
__init__(a=1, b=1)[source]¶
Create a Beta posterior \(\mathrm{Beta}(\alpha, \beta)\) with no observation, i.e., \(\alpha = 1\) and \(\beta = 1\) by default.
-
N = None¶
List of two parameters [a, b].
-
sample()[source]¶
Get a random sample from the Beta posterior (using numpy.random.beta(), exposed in this module as betavariate()).
- Used only by ThompsonSampling and AdBandits so far.
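To illustrate how a Thompson-sampling style policy can consume such samples, here is a minimal sketch. It is not the library's policy code: the function `thompson_choose` and the example posteriors are hypothetical, and it swaps in the standard library's `random.betavariate` rather than NumPy so that it runs without dependencies.

```python
import random


def thompson_choose(posteriors, rng):
    """Return the index of the arm whose Beta posterior gives the largest sample."""
    # One independent draw per arm from Beta(a, b).
    samples = [rng.betavariate(a, b) for (a, b) in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


rng = random.Random(2024)  # arbitrary seed

# Hypothetical posterior parameters (a, b) for three arms.
posteriors = [(2, 8), (5, 5), (90, 10)]

# Repeat the choice many times: the arm with the highest, most concentrated
# posterior (index 2) should be selected almost always.
choices = [thompson_choose(posteriors, rng) for _ in range(1000)]
counts = [choices.count(k) for k in range(len(posteriors))]
```

Arms with wide posteriors still get picked occasionally, which is how sampling from the posterior produces exploration.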
-
quantile(p)[source]¶
Return the p quantile of the Beta posterior (using scipy.special.btdtri()).
- Used only by BayesUCB and AdBandits so far.
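For intuition about what the p quantile is, the following sketch computes it without SciPy. It is an illustration under a stated restriction, not the method used by the module: it assumes integer parameters (as obtained when starting from \(\alpha = \beta = 1\) and counting binary observations), uses the closed-form binomial expression of the Beta CDF, and inverts it by bisection. The function names are hypothetical.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, valid only for positive integers a and b."""
    n = a + b - 1
    # Identity: I_x(a, b) = sum_{j=a}^{n} C(n, j) x^j (1 - x)^(n - j).
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def beta_quantile_int(p, a, b, tol=1e-10):
    """Smallest x in [0, 1] with CDF(x) >= p, found by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Beta(1, 1) is uniform, so its p quantile is p itself.
q_uniform = beta_quantile_int(0.3, 1, 1)
# Beta(2, 1) has CDF x^2, so its 0.25 quantile is 0.5.
q_skewed = beta_quantile_int(0.25, 2, 1)
```

For non-integer parameters this closed form does not apply, and a library routine for the inverse regularized incomplete beta function is the appropriate tool.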
-
update(obs)[source]¶
Add an observation.
- If obs is 1, update \(\alpha\), the count of positive observations,
- If it is 0, update \(\beta\), the count of negative observations.
Note
Otherwise, a trick with bernoulliBinarization() has to be used.
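The update rule above amounts to incrementing one of two counters. The sketch below shows this with a hypothetical class `BetaCounts` that mirrors the documented `N = [a, b]` attribute. It is an assumption-laden illustration, not the class documented here; in particular, rejecting non-binary observations with an error is a choice made for this sketch, where the real workflow would binarize them first.

```python
class BetaCounts:
    """Hypothetical minimal Beta posterior storing its two parameters as N = [a, b]."""

    def __init__(self, a=1, b=1):
        self.N = [a, b]

    def update(self, obs):
        """Count a binary observation: 1 increments a, 0 increments b."""
        if obs == 1:
            self.N[0] += 1
        elif obs == 0:
            self.N[1] += 1
        else:
            # Sketch-only behaviour: non-binary rewards must be binarized beforehand.
            raise ValueError("obs must be 0 or 1; binarize other rewards first")

    def mean(self):
        """Posterior mean a / (a + b)."""
        a, b = self.N
        return a / (a + b)


posterior = BetaCounts()       # Beta(1, 1): no observation yet
for obs in [1, 1, 0, 1]:       # three successes, one failure
    posterior.update(obs)
# posterior.N is now [4, 2], so the posterior mean is 4 / 6.
```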
-
__module__= 'Policies.Posterior.Beta'¶
-
-
Policies.Posterior.Beta.betavariate()¶
beta(a, b, size=None)
Draw samples from a Beta distribution.
The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability distribution function
\[f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]
where the normalization, B, is the beta function,
\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]
It is often seen in Bayesian inference and order statistics.
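As a quick numerical check of the density and its normalization, the sketch below evaluates both with the standard library only. It relies on the classical identity \(B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta) / \Gamma(\alpha + \beta)\), which is not stated above, and the function names are hypothetical.

```python
from math import gamma


def beta_function(a, b):
    """Beta function via the Gamma identity B(a, b) = G(a) G(b) / G(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1)."""
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)


# Midpoint-rule integration of the density over [0, 1]; the result should be close to 1.
n = 10000
area = sum(beta_pdf((i + 0.5) / n, 2.0, 5.0) for i in range(n)) / n
```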
Note
New code should use the beta method of a default_rng() instance instead; please see the Quick Start.
Parameters:
- a : float or array_like of floats. Alpha, positive (>0).
- b : float or array_like of floats. Beta, positive (>0).
- size : int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a and b are both scalars. Otherwise, np.broadcast(a, b).size samples are drawn.
Returns:
- out : ndarray or scalar. Drawn samples from the parameterized beta distribution.
See also:
- Generator.beta: which should be used for new code.
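The recommended Generator-based call and the effect of the size parameter can be illustrated as follows. This is a small usage sketch with arbitrary parameter values and seed, assuming NumPy is installed.

```python
import numpy as np

# A Generator instance, as recommended for new code (the seed is arbitrary).
rng = np.random.default_rng(0)

# Scalar a and b with size=None: a single value is returned.
single = rng.beta(2.0, 5.0)

# An explicit shape (m, n) yields m * n samples arranged in that shape.
grid = rng.beta(2.0, 5.0, size=(2, 3))

# Array-like a with scalar b: the parameters are broadcast, one sample per entry.
per_arm = rng.beta([1.0, 2.0, 3.0], 1.0)
```

The last form is convenient for bandit code, since one call draws a sample for every arm from its own parameters.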
-
Policies.Posterior.Beta.random() → x in the interval [0, 1).¶