Policies.SWA module

author: Julien Seznec

Sliding Window Average policy for rotting bandits.

Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf]. Nir Levine, Koby Crammer, Shie Mannor. Advances in Neural Information Processing Systems 30 (NIPS 2017).

class Policies.SWA.SWA(nbArms, horizon=1, subgaussian=1, maxDecrement=1, alpha=0.2, doublingTrick=False)[source]

Bases: Policies.IndexPolicy.IndexPolicy

The Sliding Window Average policy for rotting bandits. Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf].

__init__(nbArms, horizon=1, subgaussian=1, maxDecrement=1, alpha=0.2, doublingTrick=False)[source]

New generic index policy.

  • nbArms: the number of arms,
  • lower, amplitude: lower value and known amplitude of the rewards.

setWindow()[source]

getReward(arm, reward)[source]

Give a reward: increase t and pulls, and update the cumulative sum of rewards for that arm (normalized in [0, 1]).
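The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the module's code: the class name `RewardTracker` is hypothetical, and normalizing with `(reward - lower) / amplitude` is an assumption based on the `lower` and `amplitude` parameters listed for the constructor.

```python
class RewardTracker:
    """Hypothetical helper, for illustration only (not part of Policies.SWA)."""

    def __init__(self, nbArms, lower=0.0, amplitude=1.0):
        self.lower = lower          # assumed lower bound of the raw rewards
        self.amplitude = amplitude  # assumed known amplitude of the raw rewards
        self.t = 0                  # global time step
        self.pulls = [0] * nbArms   # number of pulls per arm
        self.rewards = [0.0] * nbArms  # cumulative normalized reward per arm

    def getReward(self, arm, reward):
        # Increase t and pulls, then add the reward rescaled to [0, 1].
        self.t += 1
        self.pulls[arm] += 1
        self.rewards[arm] += (reward - self.lower) / self.amplitude


tracker = RewardTracker(nbArms=2, lower=0.0, amplitude=10.0)
tracker.getReward(0, 5.0)
tracker.getReward(0, 10.0)
```

With rewards in [0, 10], the two observations above are stored as 0.5 and 1.0, so arm 0 accumulates 1.5 after two pulls.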

computeIndex(arm)[source]

Compute the mean of the last h values observed for that arm.
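A sliding-window mean of this kind can be sketched as below. The function name `sliding_window_mean` is hypothetical, and two choices are assumptions rather than documented behavior: an arm with no observations gets an infinite index (so that it is tried first), and an arm with fewer than h observations is averaged over what it has.

```python
def sliding_window_mean(history, h):
    """Mean of the last h entries of history (illustrative sketch).

    history: list of rewards observed for one arm, oldest first.
    h: window size, must be at least 1.
    """
    if h < 1:
        raise ValueError("window size h must be at least 1")
    if not history:
        # Assumption: an arm that was never pulled is explored first.
        return float("inf")
    window = history[-h:]  # the last h values, or all of them if fewer
    return sum(window) / len(window)


index_recent = sliding_window_mean([1.0, 0.5, 0.25, 0.75], h=2)
index_short = sliding_window_mean([1.0], h=3)
```

Only the most recent h rewards contribute to the index, which is what lets the estimate follow an arm whose mean reward decays as it is pulled.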

startGame(resetHorizon=True)[source]

Initialize the policy for a new game.

__module__ = 'Policies.SWA'

class Policies.SWA.wSWA(nbArms, firstHorizon=1, subgaussian=1, maxDecrement=1, alpha=0.2)[source]

Bases: Policies.SWA.SWA

SWA with doubling trick. Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf].
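For readers unfamiliar with the doubling trick, the generic idea is to start from a guessed horizon and double it whenever the current time step exceeds the guess, so that the true horizon need not be known in advance. The sketch below shows only that generic schedule. The function name `doubled_horizon` is hypothetical, and whether wSWA also resets its window or statistics at each doubling is not stated here.

```python
def doubled_horizon(t, firstHorizon=1):
    """Smallest horizon of the form firstHorizon * 2**k that is >= t.

    Illustrative sketch of a generic doubling schedule, not the
    module's doublingTrick() method.
    """
    if firstHorizon < 1:
        raise ValueError("firstHorizon must be at least 1")
    horizon = firstHorizon
    while horizon < t:
        horizon *= 2  # double the guess until it covers time step t
    return horizon


schedule = [doubled_horizon(t) for t in range(1, 10)]
```

Here `schedule` lists the horizon guess in force at each of the first nine time steps when starting from `firstHorizon=1`.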

__init__(nbArms, firstHorizon=1, subgaussian=1, maxDecrement=1, alpha=0.2)[source]

New generic index policy.

  • nbArms: the number of arms,
  • lower, amplitude: lower value and known amplitude of the rewards.

__str__()[source]

-> str

doublingTrick()[source]

getReward(arm, reward)[source]

Give a reward: increase t and pulls, and update the cumulative sum of rewards for that arm (normalized in [0, 1]).

__module__ = 'Policies.SWA'