Policies.GreedyOracle module

author: Julien Seznec

Oracle and near-minimax policy for rotting bandits without noise.

Reference: [Heidari et al., 2016, https://www.ijcai.org/Proceedings/16/Papers/224.pdf] Tight Policy Regret Bounds for Improving and Decaying Bandits. Hoda Heidari, Michael Kearns, Aaron Roth. International Joint Conference on Artificial Intelligence (IJCAI) 2016, 1562.

class Policies.GreedyOracle.GreedyPolicy(nbArms)[source]

Bases: Policies.IndexPolicy.IndexPolicy

Greedy policy for rotting bandits (policy A2 in [Heidari et al., 2016, https://www.ijcai.org/Proceedings/16/Papers/224.pdf]): at each step, it selects the arm with the best last observed value.
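
A minimal interaction sketch (not taken from the module itself), assuming the standard SMPyBandits interface where choice() is inherited from Policies.IndexPolicy.IndexPolicy; the reward functions below are hypothetical:

    from Policies.GreedyOracle import GreedyPolicy

    nbArms = 2
    # Hypothetical noiseless rotting rewards: an arm's value depends only on its pull count.
    reward_of = [lambda n: max(0.0, 0.9 - 0.1 * n),  # arm 0 decays with each pull
                 lambda n: 0.5]                      # arm 1 stays constant
    pulls = [0, 0]

    policy = GreedyPolicy(nbArms)
    policy.startGame()
    for t in range(10):
        arm = policy.choice()           # assumed: the arm with the best last observed value
        r = reward_of[arm](pulls[arm])  # noiseless reward
        pulls[arm] += 1
        policy.getReward(arm, r)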

__init__(nbArms)[source]

New generic index policy.

  • nbArms: the number of arms,
  • lower, amplitude: lower value and known amplitude of the rewards.
getReward(arm, reward)[source]

Give a reward: increase t and the pull count, and update the cumulated sum of rewards for that arm (rewards are normalized to [0, 1]).
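
A sketch of the bookkeeping this method describes; the attribute names below (t, pulls, rewards, last_pull, lower, amplitude) are illustrative, not necessarily the actual ones:

    def getReward(self, arm, reward):
        # Illustrative only: the real attribute names may differ.
        self.t += 1                   # global time step
        self.pulls[arm] += 1          # pull count of this arm
        # Normalize the raw reward to [0, 1] with the known lower value and amplitude.
        reward = (reward - self.lower) / self.amplitude
        self.rewards[arm] += reward   # cumulated sum of rewards for this arm
        self.last_pull[arm] = reward  # last observed value, used by computeIndex()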

computeAllIndex()[source]

Compute the current indexes for all arms. Possibly vectorized; by default it is not vectorized, and the indexes are computed arm by arm.

computeIndex(arm)[source]

Compute the mean of the last h observed values of the arm (for this greedy policy, this reduces to the last observed value).
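
For illustration, the mean of the last h observed values can be computed with NumPy as below (a sketch; history and h are hypothetical names, and for this greedy policy h = 1):

    import numpy as np

    history = np.array([0.9, 0.8, 0.7, 0.6])  # hypothetical observed values of one arm
    h = 1                                     # greedy: only the last value matters
    index = np.mean(history[-h:])             # mean of the last h values -> 0.6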

startGame()[source]

Initialize the policy for a new game.

__module__ = 'Policies.GreedyOracle'
class Policies.GreedyOracle.GreedyOracle(nbArms, arms)[source]

Bases: Policies.IndexPolicy.IndexPolicy

Greedy oracle for rotting bandits (policy A0 in [Heidari et al., 2016, https://www.ijcai.org/Proceedings/16/Papers/224.pdf]): it looks one step ahead and selects the arm with the best next value. This is the optimal policy for the noiseless rotting bandits problem.
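
A sketch of the one-step lookahead rule (A0) on a toy noiseless problem; the reward functions and names below are hypothetical, while the real class gets this information from its arms argument:

    # Toy noiseless rotting bandit: the next value of an arm depends only on its pull count.
    next_value = [lambda n: max(0.0, 0.9 - 0.1 * n),  # arm 0 decays with each pull
                  lambda n: 0.5]                      # arm 1 stays constant
    pulls = [0, 0]

    total = 0.0
    for t in range(10):
        # A0: look one step ahead and pull the arm whose next value is the largest.
        arm = max(range(len(pulls)), key=lambda a: next_value[a](pulls[a]))
        total += next_value[arm](pulls[arm])
        pulls[arm] += 1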

__init__(nbArms, arms)[source]

New generic index policy.

  • nbArms: the number of arms,
  • lower, amplitude: lower value and known amplitude of the rewards.
computeIndex(arm)[source]

Compute the current index of arm ‘arm’.

__module__ = 'Policies.GreedyOracle'