Policies.Posterior.DiscountedBeta module¶
Manipulate posteriors of Bernoulli/Beta experiments, for discounted Bayesian policies (Policies.DiscountedBayesianIndexPolicy).
-
Policies.Posterior.DiscountedBeta.
GAMMA
= 0.95¶ Default value for the discount factor \(\gamma\in(0,1)\).
0.95 is empirically a reasonable value for short-term non-stationary experiments.
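One way to read this default: if one observation arrives per time step, the total discounted count follows the geometric series \(1 + \gamma + \gamma^2 + \dots\), which converges to \(1/(1-\gamma)\). The sketch below is only an illustration (the helper `discounted_total` is hypothetical, not part of this module); it suggests that \(\gamma = 0.95\) corresponds to an effective memory of about 20 observations.

```python
GAMMA = 0.95  # default discount factor documented above


def discounted_total(gamma, steps):
    """Total discounted count after `steps` observations, one per time step.

    Hypothetical helper for illustration, not part of the module.
    """
    total = 0.0
    for _ in range(steps):
        total = gamma * total + 1.0
    return total


# Closed-form limit of the geometric series 1 + gamma + gamma^2 + ...
effective_memory = 1.0 / (1.0 - GAMMA)

print(discounted_total(GAMMA, 10))    # still far from the limit
print(discounted_total(GAMMA, 1000))  # numerically at the limit
print(effective_memory)
```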
-
class
Policies.Posterior.DiscountedBeta.
DiscountedBeta
(gamma=0.95, a=1, b=1)[source]¶ Bases:
Policies.Posterior.Beta.Beta
Manipulate posteriors of Bernoulli/Beta experiments, for discounted Bayesian policies (Policies.DiscountedBayesianIndexPolicy).
- It keeps \(\tilde{S}(t)\) and \(\tilde{F}(t)\), the discounted counts of successes and failures (S and F).
-
__init__
(gamma=0.95, a=1, b=1)[source]¶ Create a Beta posterior \(\mathrm{Beta}(\alpha, \beta)\) with no observation, i.e., \(\alpha = 1\) and \(\beta = 1\) by default.
-
N
= None¶ List of two parameters [a, b]
-
gamma
= None¶ Discount factor \(\gamma\in(0,1)\).
-
reset
(a=None, b=None)[source]¶ Reset alpha and beta, both to 0 as when creating a new default DiscountedBeta.
-
sample
()[source]¶ Get a random sample from the DiscountedBeta posterior (using numpy.random.beta(), exposed in this module as betavariate()).
- Used only by Thompson Sampling and AdBandits so far.
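As a rough illustration of what sampling from a discounted posterior looks like, the sketch below draws from \(\mathrm{Beta}(a + \tilde{S}, b + \tilde{F})\). That combination of prior parameters and discounted counts is an assumption made for this example, the function `sample_posterior` is hypothetical, and the standard-library `random.betavariate` is used here in place of NumPy to keep the example dependency-free.

```python
import random


def sample_posterior(s, f, a=1.0, b=1.0, rng=random):
    """Draw one sample from Beta(a + s, b + f).

    `s` and `f` are discounted success/failure counts. Adding them to the
    prior parameters `a` and `b` is an assumption of this sketch.
    """
    return rng.betavariate(a + s, b + f)


rng = random.Random(0)  # seeded generator, for reproducibility
draws = [sample_posterior(1.8525, 1.0, rng=rng) for _ in range(2000)]
mean = sum(draws) / len(draws)

# The empirical mean should be close to (a + s) / (a + s + b + f).
print(mean, 2.8525 / 4.8525)
```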
-
quantile
(p)[source]¶ Return the p quantile of the DiscountedBeta posterior (using scipy.special.btdtri()).
- Used only by BayesUCB and AdBandits so far.
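To make the meaning of "the p quantile" concrete, here is a dependency-free sketch that inverts the Beta CDF by bisection. It is not how the class computes quantiles; the helpers `beta_pdf`, `beta_cdf` and `beta_quantile` are hypothetical, and the numerical integration assumes both shape parameters are at least 1 so that the density stays bounded.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in [0, 1], assuming a >= 1 and b >= 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * x ** (a - 1) * (1 - x) ** (b - 1)


def beta_cdf(x, a, b, n=2000):
    """CDF of Beta(a, b) at x, using Simpson's rule on [0, x] (n must be even)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / n
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3


def beta_quantile(p, a, b, tol=1e-10):
    """Smallest x with CDF(x) >= p, found by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Beta(2, 1) has CDF x^2, so its 0.25 quantile is 0.5.
print(beta_quantile(0.25, 2.0, 1.0))
```

In practice a library routine for the inverse regularized incomplete beta function is faster and more accurate than this bisection.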
-
update
(obs)[source]¶ Add an observation, and discount the previous observations.
- If obs is 1, update \(\alpha\) the count of positive observations,
- If it is 0, update \(\beta\) the count of negative observations.
- But instead of using \(\tilde{S}(t) = S(t)\) and \(\tilde{F}(t) = F(t)\), they are updated at each time step using the discount factor \(\gamma\), where \(r(t) \in \{0, 1\}\) is the observation at time \(t\):
\[\begin{aligned} \tilde{S}(t+1) &= \gamma \tilde{S}(t) + r(t), \\ \tilde{F}(t+1) &= \gamma \tilde{F}(t) + (1 - r(t)). \end{aligned}\]
Note
If obs is neither 0 nor 1, a trick with bernoulliBinarization() has to be used.
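The update rule for the discounted counts can be sketched as a pure function. This is only an illustration of the two recurrences for \(\tilde{S}\) and \(\tilde{F}\), not the class's actual implementation; the name `discounted_update` and its signature are hypothetical.

```python
def discounted_update(s, f, obs, gamma=0.95):
    """Return the new discounted counts (s, f) after one binary observation.

    Hypothetical helper: both counts are discounted by `gamma`, then the
    observation is added to the success or failure count.
    """
    r = int(obs)
    if r not in (0, 1):
        raise ValueError("obs must be 0 or 1")
    return gamma * s + r, gamma * f + (1 - r)


s, f = 0.0, 0.0
for r in [1, 1, 0]:
    s, f = discounted_update(s, f, r)

# After two successes and then one failure, the older successes have been
# discounted while the latest failure counts fully.
print(s, f)
```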
-
discount
()[source]¶ Simply discount the old observation, when no observation is given at this time.
\[\begin{aligned} \tilde{S}(t+1) &= \gamma \tilde{S}(t), \\ \tilde{F}(t+1) &= \gamma \tilde{F}(t). \end{aligned}\]
-
undiscount
()[source]¶ Simply cancel the discount on the old observation, when no observation is given at this time.
\[\begin{aligned} \tilde{S}(t+1) &= \frac{1}{\gamma} \tilde{S}(t), \\ \tilde{F}(t+1) &= \frac{1}{\gamma} \tilde{F}(t). \end{aligned}\]
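A small sketch of how the two operations relate: multiplying both counts by \(\gamma\) and then by \(1/\gamma\) returns them to their starting values, up to floating-point rounding. The free functions `discount` and `undiscount` below are hypothetical stand-ins for the methods, written only to show the arithmetic.

```python
def discount(s, f, gamma=0.95):
    """Shrink both discounted counts by a factor gamma (no new observation)."""
    return gamma * s, gamma * f


def undiscount(s, f, gamma=0.95):
    """Cancel one discount step by dividing both counts by gamma."""
    return s / gamma, f / gamma


s, f = 3.0, 2.0
s1, f1 = discount(s, f)      # counts shrink
s2, f2 = undiscount(s1, f1)  # counts are restored

print((s1, f1), (s2, f2))
```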
-
__module__
= 'Policies.Posterior.DiscountedBeta'¶
-
Policies.Posterior.DiscountedBeta.
betavariate
()¶ beta(a, b, size=None)
Draw samples from a Beta distribution.
The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability distribution function
\[f(x; a,b) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]
where the normalization, B, is the beta function,
\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]
It is often seen in Bayesian inference and order statistics.
Note
New code should use the beta method of a default_rng() instance instead; please see the Quick Start.
- a : float or array_like of floats
- Alpha, positive (>0).
- b : float or array_like of floats
- Beta, positive (>0).
- size : int or tuple of ints, optional
- Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a and b are both scalars. Otherwise, np.broadcast(a, b).size samples are drawn.
- out : ndarray or scalar
- Drawn samples from the parameterized beta distribution.
See also Generator.beta, which should be used for new code.
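For reference, a minimal sketch of the recommended replacement, assuming NumPy is installed: `numpy.random.default_rng()` returns a `Generator` whose `beta` method takes the same `a`, `b` and `size` arguments described above. The seed and parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed, for reproducibility

# Scalar parameters and size=None: a single value is returned.
single = rng.beta(2.0, 5.0)

# Explicit output shape (m, n): m * n samples are drawn.
batch = rng.beta(2.0, 5.0, size=(3, 4))

# Array-like a with scalar b: np.broadcast(a, b).size samples are drawn.
broadcast = rng.beta([1.0, 2.0, 3.0], 2.0)

print(single, batch.shape, broadcast.shape)
```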