policy_server module

Server to play a multi-armed bandit problem against.

Usage:
  policy_server.py [--port=<PORT>] [--host=<HOST>] [--means=<MEANS>] <json_configuration>
  policy_server.py (-h|--help)
  policy_server.py --version

Options:
  -h --help          Show this screen.
  --version          Show version.
  --port=<PORT>      Port to use for the TCP connection [default: 10000].
  --host=<HOST>      Address to use for the TCP connection [default: 0.0.0.0].
  --means=<MEANS>    Means of arms used by the environment, to print regret [default: None].
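The `--means` option is documented only as "means of arms used by the environment, to print regret". A common quantity to print from known means is the cumulative pseudo-regret, R_T = T * max_k mu_k - sum_t mu_{A_t}, where A_t is the arm played at step t. The sketch below assumes `--means` is passed as a JSON list such as `[0.1, 0.5, 0.9]`; the helpers `parse_means` and `pseudo_regret` are hypothetical names, not part of `policy_server`.

```python
import json


def parse_means(raw):
    """Turn the --means option into a list of floats, or None (assumed JSON-list format)."""
    # docopt hands the documented default over as the literal string "None".
    if raw is None or raw == "None":
        return None
    return [float(m) for m in json.loads(raw)]


def pseudo_regret(means, arms_played):
    """Cumulative pseudo-regret: sum over steps of (max(means) - means[arm played])."""
    best = max(means)
    return sum(best - means[arm] for arm in arms_played)
```

For example, with means `[0.1, 0.5, 0.9]` and arms `0, 2, 1, 2` played, the pseudo-regret is 0.8 + 0 + 0.4 + 0 = 1.2.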
policy_server.default_configuration = {'archtype': 'UCBalpha', 'nbArms': 10, 'params': {'alpha': 1}}

Example of a configuration to pass on the command line:

  '{"nbArms": 3, "archtype": "UCBalpha", "params": { "alpha": 0.5 }}'

policy_server.read_configuration_policy(a_string)

Return a valid configuration dictionary to initialize a policy, from the input string.
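A minimal sketch of what such a parser might look like, assuming the input is a JSON object and that missing keys fall back to `default_configuration`. The fallback rule and the validation checks are assumptions for illustration, not the module's actual code.

```python
import copy
import json

# Mirrors policy_server.default_configuration as documented above.
DEFAULT_CONFIGURATION = {"archtype": "UCBalpha", "nbArms": 10, "params": {"alpha": 1}}


def read_configuration_policy(a_string):
    """Parse a JSON string into a policy configuration dictionary (sketch)."""
    configuration = json.loads(a_string)
    if not isinstance(configuration, dict):
        raise ValueError("the configuration must be a JSON object")
    # Assumption: keys absent from the input are taken from the default configuration.
    result = copy.deepcopy(DEFAULT_CONFIGURATION)
    result.update(configuration)
    if not isinstance(result["nbArms"], int) or result["nbArms"] < 1:
        raise ValueError("nbArms must be a positive integer")
    if not isinstance(result["params"], dict):
        raise ValueError("params must be a JSON object")
    return result
```

Deep-copying the default keeps callers from mutating the shared `params` dictionary by accident.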

policy_server.server(policy, host, port, means=None)

Launch a server that:

  • uses sockets to listen for input and reply,
  • creates a learning algorithm from a JSON configuration (exactly like main.py when it reads configuration.py),
  • then receives feedback (arm, reward) from the network, passes it to the algorithm, listens to its arm = choice() suggestion, and sends this back to the network.
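The wire format is not specified here, so the following is only a sketch of the loop described above, assuming a line-based protocol: on connection the server sends a first suggested arm, then the client sends lines such as `2,1.0` (arm, reward) and the server answers each with the next arm. `ToyPolicy` and `handle_feedback` are hypothetical names, and the policy methods `getReward(arm, reward)` and `choice()` are assumed; `socket.create_server` requires Python 3.8 or later.

```python
import socket


class ToyPolicy:
    """Hypothetical stand-in for a learning policy (greedy, not UCBalpha)."""

    def __init__(self, nbArms):
        self.pulls = [0] * nbArms
        self.rewards = [0.0] * nbArms

    def getReward(self, arm, reward):
        self.pulls[arm] += 1
        self.rewards[arm] += reward

    def choice(self):
        # Try every arm once, then exploit the best empirical mean.
        for arm, count in enumerate(self.pulls):
            if count == 0:
                return arm
        return max(range(len(self.pulls)), key=lambda a: self.rewards[a] / self.pulls[a])


def handle_feedback(policy, line):
    """Parse one 'arm,reward' line, feed it to the policy, and return (arm, next_arm)."""
    arm_text, reward_text = line.strip().split(",")
    arm = int(arm_text)
    policy.getReward(arm, float(reward_text))
    return arm, policy.choice()


def server(policy, host, port, means=None):
    """Assumed protocol: the client sends 'arm,reward' lines, the server replies 'arm' lines."""
    best = max(means) if means else None
    regret = 0.0
    with socket.create_server((host, port)) as listener:
        while True:  # serve one client after another, forever
            conn, _address = listener.accept()
            with conn, conn.makefile("r", encoding="utf-8") as reader:
                conn.sendall(f"{policy.choice()}\n".encode())  # first suggestion
                for line in reader:
                    if not line.strip():
                        continue  # ignore blank lines
                    arm, next_arm = handle_feedback(policy, line)
                    if best is not None:
                        regret += best - means[arm]
                        print(f"pseudo-regret so far: {regret:.3f}")
                    conn.sendall(f"{next_arm}\n".encode())
```

Reading through `makefile("r")` while writing with `sendall` avoids mixing reads and writes on one buffered text stream.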
policy_server.transform_str(params)

Like a safe exec() on a dictionary that can contain special values:

  • strings are interpreted as variable names (e.g., policy names) from the current globals() scope,
  • lists are transformed to tuples to be constant and hashable,
  • dictionaries are recursively transformed.

Warning

It is still as unsafe as exec(): only use it with trusted inputs!
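The three rules above can be sketched as a recursive function. This variant deliberately swaps in a safer technique: instead of evaluating strings against `globals()`, it looks each string up in an explicit `namespace` dictionary (a hypothetical parameter) and never executes code. Keeping unknown strings unchanged is also an assumption.

```python
def transform_str(params, namespace):
    """Recursively replace names by objects from `namespace` (sketch, no exec/eval)."""
    if isinstance(params, str):
        # Assumption: strings that are not known names stay plain strings.
        return namespace.get(params, params)
    if isinstance(params, list):
        # Lists become tuples; their items are transformed too.
        return tuple(transform_str(item, namespace) for item in params)
    if isinstance(params, dict):
        # Only values are transformed; keys are kept as they are.
        return {key: transform_str(value, namespace) for key, value in params.items()}
    return params
```

For instance, with `namespace = {"UCBalpha": UCBalpha}`, the value `"UCBalpha"` becomes the class itself. A literal string that happens to match a key in `namespace` is also replaced, so the namespace should contain only the names meant to be resolved.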

policy_server.main(args)

Take args, construct the learning policy, and start the server.
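A sketch of how `main` might build the policy, assuming the configuration.py-style convention that a policy is created as `archtype(nbArms, **params)`. `UCBalpha` here is a minimal illustrative index policy (empirical mean plus `sqrt(alpha * log(t) / (2 * pulls))`), not the library's class; `POLICIES`, `build_policy`, and the extra `serve` parameter (a stand-in for `server()` so the sketch runs without opening a socket) are all hypothetical.

```python
import json
import math


class UCBalpha:
    """Minimal UCB-alpha sketch: index = mean + sqrt(alpha * log(t) / (2 * pulls))."""

    def __init__(self, nbArms, alpha=1.0):
        self.nbArms, self.alpha = nbArms, alpha
        self.t = 0
        self.pulls = [0] * nbArms
        self.rewards = [0.0] * nbArms

    def getReward(self, arm, reward):
        self.t += 1
        self.pulls[arm] += 1
        self.rewards[arm] += reward

    def choice(self):
        for arm in range(self.nbArms):
            if self.pulls[arm] == 0:
                return arm  # play every arm once before using the index

        def index(arm):
            bonus = math.sqrt(self.alpha * math.log(self.t) / (2 * self.pulls[arm]))
            return self.rewards[arm] / self.pulls[arm] + bonus

        return max(range(self.nbArms), key=index)


POLICIES = {"UCBalpha": UCBalpha}  # hypothetical registry: archtype name -> class


def build_policy(configuration):
    """Instantiate archtype(nbArms, **params) from a configuration dictionary."""
    archtype = POLICIES[configuration["archtype"]]
    return archtype(configuration["nbArms"], **configuration["params"])


def main(args, serve):
    """Sketch of main(); `serve` stands in for server(policy, host, port, means)."""
    configuration = json.loads(args["<json_configuration>"])
    policy = build_policy(configuration)
    raw_means = args.get("--means", "None")
    means = None if raw_means in (None, "None") else json.loads(raw_means)
    return serve(policy, args["--host"], int(args["--port"]), means)
```

`args` is assumed to be the docopt-style dictionary produced from the usage string, with keys such as `"--port"` and `"<json_configuration>"`.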