Policies.RAWUCB module

author: Julien Seznec

Rotting Adaptive Window Upper Confidence Bounds for rotting bandits.

Reference: [Seznec et al., 2019b] A single algorithm for both rested and restless rotting bandits (WIP). Julien Seznec, Pierre Ménard, Alessandro Lazaric, Michal Valko.

class Policies.RAWUCB.EFF_RAWUCB(nbArms, alpha=0.06, subgaussian=1, m=None, delta=None, delay=False)[source]

Bases: Policies.FEWA.EFF_FEWA

Efficient Rotting Adaptive Window Upper Confidence Bound (RAW-UCB) [Seznec et al., 2020]. Efficient trick described in [Seznec et al., 2019a, https://arxiv.org/abs/1811.11043] (m=2) and [Seznec et al., 2020] (m<=2). We use the confidence level :math:`\delta_t = \frac{1}{t^\alpha}`.
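The efficient variants evaluate the index on a reduced set of window sizes rather than on every window. A minimal sketch of that idea, assuming a geometric grid of ratio ``m``; the helper name ``geometric_windows`` is hypothetical and not part of this module, and the actual bookkeeping inherited from ``EFF_FEWA`` may differ:

```python
def geometric_windows(n_pulls, m=2.0):
    """Return increasing window sizes floor(m**j) that do not exceed n_pulls.

    Hypothetical helper for illustration only (assumed geometric grid).
    """
    if m <= 1:
        raise ValueError("m must be > 1 for a geometric grid")
    windows = []
    size = 1.0
    while int(size) <= n_pulls:
        h = int(size)
        # Skip duplicates that appear when m is close to 1.
        if not windows or h > windows[-1]:
            windows.append(h)
        size *= m
    return windows


if __name__ == "__main__":
    print(geometric_windows(20))         # grid with m = 2
    print(geometric_windows(10, m=1.5))  # denser grid with m = 1.5
```

With such a grid, the number of windows grows logarithmically with the number of pulls, and a smaller ``m`` gives a denser grid at a higher computational cost.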

choice()[source]

Not defined.

_compute_ucb()[source]
_append_thresholds(w)[source]
__str__()[source]

-> str

__module__ = 'Policies.RAWUCB'
class Policies.RAWUCB.EFF_RAWklUCB(nbArms, subgaussian=1, alpha=1, klucb=<function klucbBern>, tol=0.0001, m=2)[source]

Bases: Policies.RAWUCB.EFF_RAWUCB

Uses a KL confidence bound instead of the closed-form approximation. Experimental work: much slower (!!), because many UCBs are computed for each arm at each round.
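To illustrate what a KL confidence bound involves, here is a minimal standalone sketch for Bernoulli rewards, solved by bisection. The functions ``kl_bernoulli`` and ``kl_ucb_bernoulli`` are hypothetical names written for this example; they are not the ``klucbBern`` function referenced in the signature, and the choice of ``budget`` (for instance :math:`\log(1/\delta_t)/h` for a window of size ``h``) is an assumption:

```python
import math


def kl_bernoulli(p, q, eps=1e-15):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_bernoulli(mean, budget, tol=1e-4):
    """Approximate the largest q in [mean, 1] with kl(mean, q) <= budget."""
    lower, upper = mean, 1.0
    while upper - lower > tol:
        mid = (lower + upper) / 2
        if kl_bernoulli(mean, mid) > budget:
            upper = mid  # mid violates the constraint: shrink from above
        else:
            lower = mid  # mid is feasible: the bound is at least mid
    return (lower + upper) / 2


if __name__ == "__main__":
    mean, budget = 0.5, 0.1
    print(kl_ucb_bernoulli(mean, budget))    # KL-based bound
    print(mean + math.sqrt(budget / 2))      # closed-form (Pinsker) bound
```

By Pinsker's inequality the KL-based bound is never looser than the closed-form one, but each evaluation requires an iterative search, which is consistent with the slowdown noted above.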

__init__(nbArms, subgaussian=1, alpha=1, klucb=<function klucbBern>, tol=0.0001, m=2)[source]

New policy.

choice()[source]

Not defined.

__str__()[source]

-> str

__module__ = 'Policies.RAWUCB'
class Policies.RAWUCB.RAWUCB(nbArms, subgaussian=1, alpha=1)[source]

Bases: Policies.RAWUCB.EFF_RAWUCB

Rotting Adaptive Window Upper Confidence Bound (RAW-UCB) [Seznec et al., 2020]. We use the confidence level :math:`\delta_t = \frac{1}{t^\alpha}`.
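As a rough illustration of an adaptive-window index with this confidence level, the sketch below takes the minimum, over all windows ``h`` of the most recent rewards of one arm, of the window mean plus a sub-Gaussian confidence width. The function ``raw_ucb_index`` is a hypothetical standalone helper, and the exact width :math:`\sqrt{2\sigma^2 \log(1/\delta_t)/h}` is an assumption; the constants used by the class may differ:

```python
import math


def raw_ucb_index(rewards, t, alpha=1.0, sigma=1.0):
    """Min over windows h of (mean of the last h rewards) + confidence width.

    Assumes delta_t = t**(-alpha) and width sqrt(2 * sigma^2 * log(1/delta_t) / h).
    """
    if not rewards:
        return math.inf  # an arm that was never pulled gets priority
    log_term = alpha * math.log(t)  # equals log(1 / delta_t), requires t >= 1
    best = math.inf
    window_sum = 0.0
    # Walk backwards so that window h covers the h most recent rewards.
    for h, reward in enumerate(reversed(rewards), start=1):
        window_sum += reward
        width = math.sqrt(2 * sigma ** 2 * log_term / h)
        best = min(best, window_sum / h + width)
    return best


if __name__ == "__main__":
    decaying = [0.9, 0.8, 0.6, 0.3, 0.1]  # rewards of a rotting arm, oldest first
    print(raw_ucb_index(decaying, t=10))
```

Small windows track recent (possibly decayed) values but have wide confidence intervals, while large windows are tighter but biased by older rewards; taking the minimum keeps the tightest of these upper bounds.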

__init__(nbArms, subgaussian=1, alpha=1)[source]

New policy.

__str__()[source]

-> str

__module__ = 'Policies.RAWUCB'
class Policies.RAWUCB.EFF_RAWUCB_pp(nbArms, subgaussian=1, alpha=1, beta=0, m=2)[source]

Bases: Policies.RAWUCB.EFF_RAWUCB

Efficient Rotting Adaptive Window Upper Confidence Bound ++ (RAW-UCB++) [Seznec et al., 2020, Thesis]. We use the confidence level :math:`\delta_{t,h} = \frac{Kh}{t(1+\log(t/Kh)^\beta)}`.
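The window-dependent confidence level can be evaluated directly. The sketch below is a hypothetical standalone helper (``delta_pp`` is not part of this module); it assumes that the exponent applies to the logarithm, i.e. :math:`(\log(t/Kh))^\beta`, and it clamps the logarithm at zero so the power stays real-valued when :math:`t < Kh`, which is also an assumption:

```python
import math


def delta_pp(t, h, n_arms, beta=0.0):
    """Confidence level K*h / (t * (1 + log(t / (K*h))**beta)).

    The logarithm is clamped at 0 (assumption) to avoid a negative base.
    """
    kh = n_arms * h
    log_ratio = max(math.log(t / kh), 0.0)
    return kh / (t * (1.0 + log_ratio ** beta))


if __name__ == "__main__":
    # With beta = 0 the level reduces to K*h / (2*t).
    print(delta_pp(t=100, h=1, n_arms=2, beta=0))
    # Larger windows get a larger delta, hence a smaller log(1/delta).
    print(delta_pp(t=100, h=10, n_arms=2, beta=2))
```

Unlike :math:`\delta_t`, this level depends on the window size ``h`` and on the number of arms ``K``.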

__init__(nbArms, subgaussian=1, alpha=1, beta=0, m=2)[source]

New policy.

__str__()[source]

-> str

_compute_ucb()[source]
_inlog(w)[source]
__module__ = 'Policies.RAWUCB'
class Policies.RAWUCB.RAWUCB_pp(nbArms, subgaussian=1, beta=2)[source]

Bases: Policies.RAWUCB.EFF_RAWUCB_pp

Rotting Adaptive Window Upper Confidence Bound (RAW-UCB) [Seznec et al., 2019b, WIP]. We use the confidence level :math:`\delta_t = \frac{Kh}{t^\alpha}`.

__init__(nbArms, subgaussian=1, beta=2)[source]

New policy.

__str__()[source]

-> str

__module__ = 'Policies.RAWUCB'