Policies.GreedyOracle module

author: Julien Seznec

Oracle and near-minimax policy for rotting bandits without noise.

Reference: [Heidari et al., 2016, https://www.ijcai.org/Proceedings/16/Papers/224.pdf] Tight Policy Regret Bounds for Improving and Decaying Bandits. Hoda Heidari, Michael Kearns, Aaron Roth. International Joint Conference on Artificial Intelligence (IJCAI) 2016, 1562.

class Policies.GreedyOracle.GreedyPolicy(nbArms)[source]

Bases: Policies.IndexPolicy.IndexPolicy

Greedy policy for rotting bandits (policy A2 in [Heidari et al., 2016, https://www.ijcai.org/Proceedings/16/Papers/224.pdf]): at each step, it selects the arm with the best last observed value.
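
A minimal interaction sketch (not taken from the module itself), assuming the standard SMPyBandits interface where choice() is inherited from Policies.IndexPolicy.IndexPolicy; the reward functions below are hypothetical:

    from Policies.GreedyOracle import GreedyPolicy

    nbArms = 2
    # Hypothetical noiseless rotting rewards: an arm's value depends only on its pull count.
    reward_of = [lambda n: max(0.0, 0.9 - 0.1 * n),  # arm 0 decays with each pull
                 lambda n: 0.5]                      # arm 1 stays constant
    pulls = [0, 0]

    policy = GreedyPolicy(nbArms)
    policy.startGame()
    for t in range(10):
        arm = policy.choice()           # assumed: the arm with the best last observed value
        r = reward_of[arm](pulls[arm])  # noiseless reward
        pulls[arm] += 1
        policy.getReward(arm, r)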

__init__(nbArms)[source]

New generic index policy.

  • nbArms: the number of arms,
  • lower, amplitude: lower value and known amplitude of the rewards.
getReward(arm, reward)[source]

Give a reward: increase t and the pull count, and update the cumulated sum of rewards for that arm (rewards are normalized to [0, 1]).
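
A sketch of the bookkeeping this method describes; the attribute names below (t, pulls, rewards, last_pull, lower, amplitude) are illustrative, not necessarily the actual ones:

    def getReward(self, arm, reward):
        # Illustrative only: the real attribute names may differ.
        self.t += 1                   # global time step
        self.pulls[arm] += 1          # pull count of this arm
        # Normalize the raw reward to [0, 1] with the known lower value and amplitude.
        reward = (reward - self.lower) / self.amplitude
        self.rewards[arm] += reward   # cumulated sum of rewards for this arm
        self.last_pull[arm] = reward  # last observed value, used by computeIndex()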

computeAllIndex()[source]

Compute the current indexes for all arms. Possibly vectorized; by default it is not vectorized, and the indexes are computed arm by arm.

computeIndex(arm)[source]

Compute the mean of the last h observed values of the arm (for this greedy policy, this reduces to the last observed value).
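
For illustration, the mean of the last h observed values can be computed with NumPy as below (a sketch; history and h are hypothetical names, and for this greedy policy h = 1):

    import numpy as np

    history = np.array([0.9, 0.8, 0.7, 0.6])  # hypothetical observed values of one arm
    h = 1                                     # greedy: only the last value matters
    index = np.mean(history[-h:])             # mean of the last h values -> 0.6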

startGame()[source]

Initialize the policy for a new game.

__module__ = 'Policies.GreedyOracle'
class Policies.GreedyOracle.GreedyOracle(nbArms, arms)[source]

Bases: Policies.IndexPolicy.IndexPolicy

Greedy oracle for rotting bandits (policy A0 in [Heidari et al., 2016, https://www.ijcai.org/Proceedings/16/Papers/224.pdf]): it looks one step ahead and selects the arm with the best next value. This is the optimal policy for the noiseless rotting bandits problem.
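
A sketch of the one-step lookahead rule (A0) on a toy noiseless problem; the reward functions and names below are hypothetical, while the real class gets this information from its arms argument:

    # Toy noiseless rotting bandit: the next value of an arm depends only on its pull count.
    next_value = [lambda n: max(0.0, 0.9 - 0.1 * n),  # arm 0 decays with each pull
                  lambda n: 0.5]                      # arm 1 stays constant
    pulls = [0, 0]

    total = 0.0
    for t in range(10):
        # A0: look one step ahead and pull the arm whose next value is the largest.
        arm = max(range(len(pulls)), key=lambda a: next_value[a](pulls[a]))
        total += next_value[arm](pulls[arm])
        pulls[arm] += 1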

__init__(nbArms, arms)[source]

New generic index policy.

  • nbArms: the number of arms,
  • lower, amplitude: lower value and known amplitude of the rewards.
computeIndex(arm)[source]

Compute the current index of arm ‘arm’.

__module__ = 'Policies.GreedyOracle'