policy_server module

Server to play a multi-armed bandit problem against.

Usage::

    policy_server.py [--port=<PORT>] [--host=<HOST>] [--means=<MEANS>] <json_configuration>
    policy_server.py (-h|--help)
    policy_server.py --version

Options::

    -h --help         Show this screen.
    --version         Show version.
    --port=<PORT>     Port to use for the TCP connection [default: 10000].
    --host=<HOST>     Address to use for the TCP connection [default: 0.0.0.0].
    --means=<MEANS>   Means of arms used by the environment, to print regret [default: None].

policy_server.default_configuration = {'archtype': 'UCBalpha', 'nbArms': 10, 'params': {'alpha': 1}}: Example of a configuration to pass from the command line, for instance '{"nbArms": 3, "archtype": "UCBalpha", "params": { "alpha": 0.5 }}'.

policy_server.read_configuration_policy(a_string): Return a valid configuration dictionary to initialize a policy, from the input string.
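As a rough illustration of turning the command-line string into a configuration dictionary, here is a minimal sketch. The helper name `parse_policy_configuration`, the choice of required keys, and the empty default for `params` are assumptions made for this example. They are not taken from the module's source, and this step does not resolve the policy name to a class.

```python
import json


def parse_policy_configuration(a_string):
    """Hypothetical helper: parse a JSON string into a configuration dict."""
    config = json.loads(a_string)
    # Assumption: a usable configuration names a policy and a number of arms.
    missing = {"archtype", "nbArms"} - set(config)
    if missing:
        raise ValueError("missing configuration keys: {}".format(sorted(missing)))
    # Assumption: policy parameters are optional and default to an empty dict.
    config.setdefault("params", {})
    return config


config = parse_policy_configuration(
    '{"nbArms": 3, "archtype": "UCBalpha", "params": { "alpha": 0.5 }}'
)
print(config)
```

The policy name stays a plain string here. Mapping it to an actual class is a separate step.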

policy_server.server(policy, host, port, means=None)

Launch a server that:

- uses sockets to listen to input and reply,
- creates a learning algorithm from a JSON configuration (exactly like main.py when it reads configuration.py),
- then receives feedback (arm, reward) from the network, passes it to the algorithm, reads the algorithm's arm = choice() suggestion, and sends it back over the network.
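The feedback loop described above can be sketched for a single exchange as follows. This is only an illustration under stated assumptions: the wire format (a text message `"arm reward"`), the `RandomPolicy` stand-in, its `getReward` method, and the helper `handle_one_message` are hypothetical and may differ from what the module actually does. `socket.socketpair()` is used so the sketch runs without opening a TCP port.

```python
import random
import socket


class RandomPolicy:
    """Hypothetical stand-in policy exposing getReward() and choice()."""

    def __init__(self, nbArms):
        self.nbArms = nbArms
        self.pulls = [0] * nbArms

    def getReward(self, arm, reward):
        # A real policy would update its statistics; here we only count pulls.
        self.pulls[arm] += 1

    def choice(self):
        return random.randrange(self.nbArms)


def handle_one_message(policy, connection):
    """Read one 'arm reward' message, update the policy, reply with the next arm."""
    message = connection.recv(1024).decode()
    arm, reward = message.split()  # assumed wire format: "<arm> <reward>"
    policy.getReward(int(arm), float(reward))
    next_arm = policy.choice()
    connection.sendall(str(next_arm).encode())
    return next_arm


# Two connected sockets play the roles of the server and the environment.
server_side, client_side = socket.socketpair()
policy = RandomPolicy(nbArms=3)

client_side.sendall(b"1 0.5")  # the environment reports: arm 1 gave reward 0.5
sent_arm = handle_one_message(policy, server_side)
reply = int(client_side.recv(1024).decode())  # the environment reads the next arm

server_side.close()
client_side.close()
```

A real server would instead bind to the configured host and port, accept a connection, and repeat this exchange until the client disconnects.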

policy_server.transform_str(params)

Like a safe exec() on a dictionary that can contain special values:

- strings are interpreted as variable names (e.g., policy names) from the current globals() scope,
- lists are transformed to tuples, to be constant and hashable,
- dictionaries are recursively transformed.

Warning

It is still as unsafe as exec(): only use it with trusted inputs!
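The three transformation rules can be illustrated with the minimal sketch below. It deliberately differs from the described behavior in one respect: names are looked up in an explicit allow-list dictionary rather than in globals(), which narrows what an input string can reach. The function name `transform_names`, the `allowed` mapping, and the placeholder `UCBalpha` class are assumptions made for this example.

```python
def transform_names(params, namespace):
    """Hypothetical variant: resolve names from an explicit namespace."""
    if isinstance(params, str):
        # Strings that name a known object are replaced by that object;
        # unknown strings are left unchanged.
        return namespace.get(params, params)
    if isinstance(params, list):
        # Lists become tuples, so the result is constant and hashable.
        return tuple(transform_names(item, namespace) for item in params)
    if isinstance(params, dict):
        # Dictionaries are transformed recursively (values only, not keys).
        return {key: transform_names(value, namespace) for key, value in params.items()}
    return params


class UCBalpha:
    """Placeholder class standing in for a real policy."""


allowed = {"UCBalpha": UCBalpha}
result = transform_names(
    {"archtype": "UCBalpha", "params": {"alpha": 0.5, "arms": [1, 2]}},
    allowed,
)
print(result)
```

With an allow-list, an untrusted string can only ever resolve to one of the objects listed explicitly.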

policy_server.main(args): Take args, construct the learning policy, and start the server.