Policies.RAWUCB module
author: Julien Seznec
Rotting Adaptive Window Upper Confidence Bounds for rotting bandits.
Reference: [Seznec et al., 2019b] A single algorithm for both rested and restless rotting bandits (WIP), Julien Seznec, Pierre Ménard, Alessandro Lazaric, Michal Valko.
-
class
Policies.RAWUCB.EFF_RAWUCB(nbArms, alpha=0.06, subgaussian=1, m=None, delta=None, delay=False)
Bases: Policies.FEWA.EFF_FEWA
Efficient Rotting Adaptive Window Upper Confidence Bound (RAW-UCB) [Seznec et al., 2020]. The efficient trick is described in [Seznec et al., 2019a, https://arxiv.org/abs/1811.11043] (m=2) and [Seznec et al., 2020] (m<=2). We use the confidence level :math:`\delta_t = \frac{1}{t^\alpha}`.
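To give a feel for how this confidence level behaves, the minimal sketch below evaluates :math:`\delta_t = \frac{1}{t^\alpha}` and plugs it into a Hoeffding-style sub-Gaussian width. The helper names `confidence_level` and `confidence_width`, and the width formula :math:`\sqrt{2\sigma^2 \log(1/\delta)/h}`, are assumptions made for illustration. They are not taken from this module, whose constants may differ.

```python
import math


def confidence_level(t, alpha=0.06):
    """delta_t = 1 / t**alpha, as stated in the class description."""
    if t < 1:
        raise ValueError("t must be >= 1")
    return 1.0 / t ** alpha


def confidence_width(h, delta, subgaussian=1.0):
    """Hypothetical Hoeffding-style width for a window of h samples.

    Assumed form: sqrt(2 * sigma^2 * log(1 / delta) / h).
    """
    return math.sqrt(2.0 * subgaussian ** 2 * math.log(1.0 / delta) / h)
```

With a small `alpha` such as the default 0.06, :math:`\delta_t` decreases slowly in `t`, so under this assumed width the exploration bonus grows slowly as well.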
-
__module__ = 'Policies.RAWUCB'
-
-
class
Policies.RAWUCB.EFF_RAWklUCB(nbArms, subgaussian=1, alpha=1, klucb=<function klucbBern>, tol=0.0001, m=2)
Bases: Policies.RAWUCB.EFF_RAWUCB
Uses a KL confidence bound instead of the closed-form approximation. Experimental work: much slower (!!) because many UCBs are computed per arm at each round.
-
__init__(nbArms, subgaussian=1, alpha=1, klucb=<function klucbBern>, tol=0.0001, m=2)
New policy.
-
__module__ = 'Policies.RAWUCB'
-
-
class
Policies.RAWUCB.RAWUCB(nbArms, subgaussian=1, alpha=1)
Bases: Policies.RAWUCB.EFF_RAWUCB
Rotting Adaptive Window Upper Confidence Bound (RAW-UCB) [Seznec et al., 2020]. We use the confidence level :math:`\delta_t = \frac{1}{t^\alpha}`.
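The sketch below illustrates the adaptive-window idea for one arm as we understand it: for every window size `h`, take the mean of the `h` most recent rewards plus a confidence width, and keep the smallest such value as the arm's index. This is an assumption-laden illustration, not a transcription of this module's code. The function name `raw_ucb_index`, the "minimum over windows" rule, and the Hoeffding-style width are all assumptions.

```python
import math


def raw_ucb_index(rewards, t, alpha=1.0, subgaussian=1.0):
    """Hypothetical index: min over windows h of (recent mean + width).

    rewards: past rewards of one arm, oldest first.
    t: current round (t >= 1), used in delta_t = 1 / t**alpha.
    """
    if not rewards:
        return float("inf")  # assumed convention: unpulled arms come first
    log_term = alpha * math.log(t)  # equals log(1 / delta_t)
    best = float("inf")
    window_sum = 0.0
    # Grow the window from the most recent reward backwards.
    for h, reward in enumerate(reversed(rewards), start=1):
        window_sum += reward
        width = math.sqrt(2.0 * subgaussian ** 2 * log_term / h)  # assumed form
        best = min(best, window_sum / h + width)
    return best
```

A policy built on this sketch would pull the arm with the largest index. Small windows react quickly to a decaying (rotting) reward, while large windows have tighter widths, and the minimum trades the two off.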
-
__module__ = 'Policies.RAWUCB'
-
-
class
Policies.RAWUCB.EFF_RAWUCB_pp(nbArms, subgaussian=1, alpha=1, beta=0, m=2)
Bases: Policies.RAWUCB.EFF_RAWUCB
Efficient Rotting Adaptive Window Upper Confidence Bound ++ (RAW-UCB++) [Seznec et al., 2020, Thesis]. We use the confidence level :math:`\delta_{t,h} = \frac{Kh}{t(1+\log(t/Kh)^\beta)}`.
-
__module__ = 'Policies.RAWUCB'
-
-
class
Policies.RAWUCB.RAWUCB_pp(nbArms, subgaussian=1, beta=2)
Bases: Policies.RAWUCB.EFF_RAWUCB_pp
Rotting Adaptive Window Upper Confidence Bound (RAW-UCB) [Seznec et al., 2019b, WIP]. We use the confidence level :math:`\delta_t = \frac{Kh}{t^\alpha}`.
-
__module__ = 'Policies.RAWUCB'
-