Policies.SWA module

Author: Julien Seznec

Sliding Window Average (SWA) policy for rotting bandits.

Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf]. Nir Levine, Koby Crammer, Shie Mannor. Advances in Neural Information Processing Systems 30 (NIPS 2017).

class Policies.SWA.SWA(nbArms, horizon=1, subgaussian=1, maxDecrement=1, alpha=0.2, doublingTrick=False)

The Sliding Window Average policy for rotting bandits. Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf].

__init__(nbArms, horizon=1, subgaussian=1, maxDecrement=1, alpha=0.2, doublingTrick=False)

New generic index policy.

getReward(arm, reward): Records a reward for the given arm: increments t and the arm's pull count, and adds the reward (normalized to [0, 1]) to that arm's cumulative sum of rewards.

startGame(resetHorizon=True): Initialize the policy for a new game.
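The idea behind a sliding window average can be illustrated with a minimal, self-contained sketch. It does not reproduce the Policies.SWA.SWA implementation: the class name SlidingWindowAverageSketch, its methods, and the fixed window argument are hypothetical, and how the real policy derives its window from horizon, subgaussian, maxDecrement and alpha is not shown here. The sketch assumes each arm is first played until its window is full, and that the arm with the highest average over its most recent rewards is chosen afterwards.

```python
from collections import deque


class SlidingWindowAverageSketch:
    """Hypothetical illustration, NOT the Policies.SWA.SWA implementation."""

    def __init__(self, nb_arms, window):
        self.nb_arms = nb_arms
        self.window = window  # assumed: a fixed, user-chosen window size
        self.start_game()

    def start_game(self):
        """Reset the time step, pull counts and per-arm reward windows."""
        self.t = 0
        self.pulls = [0] * self.nb_arms
        # deque(maxlen=window) silently drops the oldest reward when full.
        self.recent = [deque(maxlen=self.window) for _ in range(self.nb_arms)]

    def get_reward(self, arm, reward):
        """Record one reward, assumed to be already normalized to [0, 1]."""
        self.t += 1
        self.pulls[arm] += 1
        self.recent[arm].append(reward)

    def choice(self):
        """Return the index of the arm to play next."""
        # Assumed ramp-up rule: fill every arm's window first.
        for arm in range(self.nb_arms):
            if len(self.recent[arm]) < self.window:
                return arm
        # Then pick the arm with the best average over its last rewards.
        averages = [sum(r) / len(r) for r in self.recent]
        return max(range(self.nb_arms), key=averages.__getitem__)
```

Because only the last window rewards of each arm are averaged, an arm whose rewards decay ("rot") as it is pulled loses its lead as soon as its recent rewards fall below those of the other arms, which a plain cumulative average would detect much more slowly.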

class Policies.SWA.wSWA(nbArms, firstHorizon=1, subgaussian=1, maxDecrement=1, alpha=0.2)

SWA with the doubling trick. Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf].

__init__(nbArms, firstHorizon=1, subgaussian=1, maxDecrement=1, alpha=0.2)

New generic index policy.

getReward(arm, reward): Records a reward for the given arm: increments t and the arm's pull count, and adds the reward (normalized to [0, 1]) to that arm's cumulative sum of rewards.
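The doubling trick lets a horizon-dependent policy run without knowing the true horizon in advance: the policy starts with a guessed horizon and, once that many steps have elapsed, restarts with a horizon twice as large. The sketch below only computes such a schedule of phases. The function name doubling_horizons is hypothetical, and the exact restart rule used by wSWA (for instance, whether past rewards are kept across restarts) is an assumption not confirmed by this page.

```python
def doubling_horizons(first_horizon, total_steps):
    """Return the (start_step, horizon) pair of each phase.

    Hypothetical helper: each phase lasts `horizon` steps, and the
    horizon doubles from one phase to the next.
    """
    phases = []
    start, horizon = 0, first_horizon
    while start < total_steps:
        phases.append((start, horizon))
        start += horizon  # the next phase begins when this one ends
        horizon *= 2      # doubling trick: double the guessed horizon
    return phases


# With firstHorizon=1 and 10 steps, four phases are needed.
print(doubling_horizons(1, 10))  # [(0, 1), (1, 2), (3, 4), (7, 8)]
```

With a first horizon of 1, the number of restarts grows only logarithmically with the total number of steps, which is why the default firstHorizon=1 can still be a workable starting guess.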