Policies.Posterior.DiscountedBeta module¶

Manipulate posteriors of Bernoulli/Beta experiments, for discounted Bayesian policies (Policies.DiscountedBayesianIndexPolicy).

Policies.Posterior.DiscountedBeta.GAMMA = 0.95¶: Default value for the discount factor \(\gamma\in(0,1)\). 0.95 is empirically a reasonable value for short-term non-stationary experiments.
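
As a rough guide to what this default means (a standard geometric-series reading, not a statement taken from the module itself): an observation made \(k\) steps ago carries weight \(\gamma^k\) in the discounted counts, so the total weight they can accumulate is bounded by

\[\sum_{k=0}^{\infty} \gamma^{k} = \frac{1}{1-\gamma}, \qquad \text{which equals } 20 \text{ for } \gamma = 0.95.\]

Under this reading, with \(\gamma = 0.95\) the posterior behaves roughly as if it remembered only the last 20 observations.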

class Policies.Posterior.DiscountedBeta.DiscountedBeta(gamma=0.95, a=1, b=1)[source]¶

Bases: Policies.Posterior.Beta.Beta

Manipulate posteriors of Bernoulli/Beta experiments, for discounted Bayesian policies (Policies.DiscountedBayesianIndexPolicy).

It keeps \(\tilde{S}(t)\) and \(\tilde{F}(t)\), the discounted counts of successes and failures (S and F being the undiscounted counts).

__init__(gamma=0.95, a=1, b=1)[source]¶: Create a Beta posterior \(\mathrm{Beta}(\alpha, \beta)\) with no observation, i.e., \(\alpha = 1\) and \(\beta = 1\) by default.

N = None¶: List of two parameters [a, b]

gamma = None¶: Discount factor \(\gamma\in(0,1)\).

__str__()[source]¶: Return str(self).

reset(a=None, b=None)[source]¶: Reset alpha and beta, both to 0, as when creating a new default DiscountedBeta.

sample()[source]¶

Get a random sample from the DiscountedBeta posterior (using numpy.random.beta(), exposed in this module as betavariate()).

Used only by Thompson Sampling and AdBandits so far.
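
To illustrate how such samples are typically consumed, here is a minimal sketch of a Thompson Sampling arm choice. It is not the library's implementation: the helper `thompson_choice` is hypothetical, the standard library's `random.betavariate` stands in for the NumPy sampler, and the posterior parameters are assumed to be the prior `(a, b)` plus the discounted counts.

```python
import random


def thompson_choice(counts, a=1.0, b=1.0, rng=random):
    """Pick the arm whose posterior sample is largest.

    counts: list of (S_tilde, F_tilde) discounted success/failure counts.
    a, b:   prior parameters, added to the counts (an assumption of this sketch).
    """
    samples = [rng.betavariate(a + s, b + f) for (s, f) in counts]
    best = max(range(len(samples)), key=samples.__getitem__)
    return best, samples


rng = random.Random(0)               # seeded for reproducibility
counts = [(0.5, 50.0), (50.0, 0.5)]  # arm 1 has seen far more successes
arm, samples = thompson_choice(counts, rng=rng)
print(arm, samples)
```

Because each arm is sampled rather than summarized by its mean, an arm with small discounted counts keeps a wide posterior and is still explored from time to time.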

quantile(p)[source]¶

Return the p quantile of the DiscountedBeta posterior (using scipy.special.btdtri()).

Used only by BayesUCB and AdBandits so far.
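
The quantile computation can be sketched as follows. This is an illustration under assumptions, not the module's code: the helper `beta_quantile` is hypothetical, the posterior parameters are assumed to be the prior plus the discounted counts, and the sketch calls `scipy.special.betaincinv`, which, as far as we are aware, is the replacement SciPy recommends for `btdtri` in recent releases.

```python
from scipy.special import betaincinv


def beta_quantile(p, s_tilde, f_tilde, a=1.0, b=1.0):
    """Return the p-quantile of Beta(a + s_tilde, b + f_tilde)."""
    return float(betaincinv(a + s_tilde, b + f_tilde, p))


# With no observations the posterior is the uniform prior Beta(1, 1),
# whose p-quantile is simply p.
q_uniform = beta_quantile(0.5, 0.0, 0.0)

# With discounted counts (8 successes, 2 failures), compare a lower
# and an upper quantile of the same posterior.
q_low = beta_quantile(0.05, 8.0, 2.0)
q_high = beta_quantile(0.95, 8.0, 2.0)
print(q_uniform, q_low, q_high)
```

An index policy such as BayesUCB would use an upper quantile like `q_high` as an optimistic estimate of the arm's mean.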

forget(obs)[source]¶: Forget the last observation, and undiscount the count of observations.
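
One plausible reading of this description is that forget() is the exact inverse of the update() rule documented in this class: subtract the observation, then divide by \(\gamma\). The sketch below assumes that reading; the standalone functions `update` and `forget` are hypothetical stand-ins, not the class methods themselves.

```python
GAMMA = 0.95  # same default as the module-level constant


def update(counts, obs, gamma=GAMMA):
    """Discount both counts, then add the new binary observation."""
    s, f = counts
    return (gamma * s + obs, gamma * f + (1 - obs))


def forget(counts, obs, gamma=GAMMA):
    """Undo update(): remove the observation, then divide by gamma."""
    s, f = counts
    return ((s - obs) / gamma, (f - (1 - obs)) / gamma)


before = (3.2, 1.7)            # (S_tilde, F_tilde)
after = update(before, 1)      # observe a success
restored = forget(after, 1)    # forget that same success
print(before, after, restored)
```

Up to floating-point rounding, `restored` matches `before`.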

update(obs)[source]¶

Add an observation, and discount the previous observations.

If obs is 1, update \(\alpha\), the count of positive observations.
If it is 0, update \(\beta\), the count of negative observations.
However, instead of using the plain counts \(\tilde{S}(t) = S(t)\) and \(\tilde{F}(t) = F(t)\), both are updated at each time step using the discount factor \(\gamma\), where \(r(t)\) is the observation at time \(t\):

\[\begin{aligned}\tilde{S}(t+1) &= \gamma \tilde{S}(t) + r(t), \\ \tilde{F}(t+1) &= \gamma \tilde{F}(t) + (1 - r(t)).\end{aligned}\]

Note

If obs is neither 0 nor 1, a trick with bernoulliBinarization() has to be used.
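
The effect of the discounted update on a non-stationary stream can be sketched as follows. The function `discounted_update` is a hypothetical standalone version of the rule above, and the reward sequence is made up for illustration.

```python
def discounted_update(s_tilde, f_tilde, obs, gamma=0.95):
    """Apply one step of the discounted update for a binary observation."""
    r = int(obs)
    return gamma * s_tilde + r, gamma * f_tilde + (1 - r)


# 50 successes followed by 50 failures: the underlying arm has changed.
rewards = [1] * 50 + [0] * 50

s_tilde, f_tilde = 0.0, 0.0   # discounted counts
s_plain, f_plain = 0, 0       # ordinary counts, for comparison
for r in rewards:
    s_tilde, f_tilde = discounted_update(s_tilde, f_tilde, r)
    s_plain, f_plain = s_plain + r, f_plain + (1 - r)

discounted_mean = s_tilde / (s_tilde + f_tilde)
plain_mean = s_plain / (s_plain + f_plain)
print(discounted_mean, plain_mean)
```

The undiscounted success rate stays at 0.5, while the discounted one drops to about 0.07, reflecting the recent run of failures.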

discount()[source]¶: Simply discount the old observation, when no observation is given at this time.

\[\begin{aligned}\tilde{S}(t+1) &= \gamma \tilde{S}(t), \\ \tilde{F}(t+1) &= \gamma \tilde{F}(t).\end{aligned}\]

undiscount()[source]¶: Simply cancel the discount on the old observation, when no observation is given at this time.

\[\begin{aligned}\tilde{S}(t+1) &= \frac{1}{\gamma} \tilde{S}(t), \\ \tilde{F}(t+1) &= \frac{1}{\gamma} \tilde{F}(t).\end{aligned}\]
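
A small sketch of how discount() and undiscount() interact, using hypothetical standalone functions on a list of two counts rather than the class methods: repeated discounting shrinks the total weight of the counts but leaves their ratio unchanged, and the same number of undiscount steps restores them.

```python
def discount(counts, gamma=0.95):
    """Multiply every count by gamma (no new observation)."""
    return [gamma * c for c in counts]


def undiscount(counts, gamma=0.95):
    """Divide every count by gamma, cancelling one discount step."""
    return [c / gamma for c in counts]


counts = [12.0, 4.0]  # [S_tilde, F_tilde]

faded = counts
for _ in range(10):   # ten time steps without any observation
    faded = discount(faded)

restored = faded
for _ in range(10):   # cancel those ten steps
    restored = undiscount(restored)

ratio_before = counts[0] / sum(counts)
ratio_after = faded[0] / sum(faded)
print(sum(counts), sum(faded), ratio_before, ratio_after, restored)
```

If the prior parameters are added to the counts when forming the posterior (an assumption here), the shrinking total means an unobserved arm's posterior gradually widens back toward the prior.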

__module__ = 'Policies.Posterior.DiscountedBeta'¶

Policies.Posterior.DiscountedBeta.betavariate()¶

beta(a, b, size=None)

Draw samples from a Beta distribution.

The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability distribution function

\[f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]

where the normalization, B, is the beta function,

\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]

It is often seen in Bayesian inference and order statistics.

Note

New code should use the beta method of a default_rng() instance instead; please see the Quick Start.

a : float or array_like of floats: Alpha, positive (>0).
b : float or array_like of floats: Beta, positive (>0).
size : int or tuple of ints, optional: Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a and b are both scalars. Otherwise, np.broadcast(a, b).size samples are drawn.

out : ndarray or scalar: Drawn samples from the parameterized beta distribution.

See also: Generator.beta, which should be used for new code.
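
A minimal sketch of the Generator-based interface recommended for new code. The seed and parameter values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Scalar parameters, no size: a single draw.
single = rng.beta(2.0, 5.0)

# Scalar parameters with an explicit output shape.
batch = rng.beta(2.0, 5.0, size=(3, 4))

# Array parameters: one draw per broadcast (a, b) pair.
per_arm = rng.beta([1.0, 10.0], [10.0, 1.0])

print(single, batch.shape, per_arm)
```

Passing arrays for `a` and `b` is a convenient way to draw one sample per arm in a single call.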