Research code for the paper:
Deconstructing actor-critic: a large-scale empirical study of design components for practitioners
Haseeb Shah, Lingwei Zhu, Adam White, Martha White.
Under review at PNAS.
This study analyzes 33,000+ runs on a control task derived from a real drinking-water treatment plant. It varies the lower-level design components of actor-critic algorithms, such as the policy objective, policy parameterization, gradient estimator, and actor/critic update schedule, to understand how each affects performance, run-to-run variability, and hyperparameter sensitivity. The source data from the Drayton Valley water treatment plant is in data/.
Python 3.10+ is required (the codebase uses a few 3.10-only features).
    pip install -r requirements.txt

Runs log to a MySQL server (local, cloud, or HPC). To configure credentials:
- Copy configs/db/credentials.yaml to configs/db/credentials-local.yaml and fill in your username, password, and ip. Files matching credentials-*.yaml are gitignored.
- In configs/config.yaml (or whichever config you launch), set args.db_prefix to your database-user prefix. On shared HPC systems such as the Digital Research Alliance of Canada (formerly Compute Canada), this usually has to match your account name; see their database servers guide. The database need not live on the same machine that runs the experiments: if it is remote, just ensure it accepts connections and provide its IP address in the config.
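To see why the template stays tracked while your filled-in copy does not, the ignore rule can be illustrated with a small sketch. It assumes the pattern is exactly credentials-*.yaml as quoted above, and it uses Python's fnmatch, which only approximates git's own matching; the helper name is_ignored is hypothetical and not part of this repository.

```python
from fnmatch import fnmatch

# Pattern as quoted in the README text. fnmatch approximates, but is not
# identical to, the matching git applies to .gitignore entries.
IGNORE_PATTERN = "credentials-*.yaml"


def is_ignored(filename: str, pattern: str = IGNORE_PATTERN) -> bool:
    """Return True if a bare file name matches the ignore pattern."""
    return fnmatch(filename, pattern)


# The template has no hyphenated suffix, so it does not match the pattern.
print(is_ignored("credentials.yaml"))
# The local copy holding your username, password, and ip does match.
print(is_ignored("credentials-local.yaml"))
```

Any other suffix, such as a per-cluster credentials file, would be ignored in the same way as long as it keeps the credentials- prefix.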
    python run_simulator.py +args.run=0

args.run is the primary key for the experiment in the database and must be supplied on the command line. In a multirun sweep, each child run is assigned the id args.run + sweep_index.
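The id rule for multirun sweeps can be sketched in a few lines. This is an illustration only: the function child_run_ids is hypothetical, and it assumes sweep_index starts at 0 and increases by one per child run, which the text above does not state explicitly.

```python
def child_run_ids(base_run: int, n_children: int) -> list[int]:
    """Ids for the children of a sweep launched with +args.run=base_run.

    Assumes sweep_index counts 0, 1, 2, ... across the child runs.
    """
    return [base_run + sweep_index for sweep_index in range(n_children)]


# A sweep of four child runs launched with +args.run=0.
print(child_run_ids(0, 4))
```

Under that assumption, a sweep of N child runs occupies the contiguous id range from args.run to args.run + N - 1.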
To override the agent or policy from the CLI:
    python run_simulator.py +args.run=0 agent=ppo policy=beta

A sweep can be run using a config under configs/0NN_config_*.yaml. Each one defines a base agent, policy, and a sweeper-params grid. Together with an args.seed range, this defines the Cartesian product of runs:
    python run_simulator.py --config-name=005_config_submitit +args.run=0

When running several sweeps in parallel against the same database, offset args.run by more than the size of any single sweep so the primary keys do not collide:
    python run_simulator.py --config-name=005_config_submitit +args.run=10000 policy=gaussian

Alternatively, point each sweep at its own database by changing args.db_name.
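As a rough way to pick a safe offset, the sweep size can be computed from the grid and seed range, and two id ranges can be checked for overlap. The sketch below is not taken from this codebase: the grid keys agent.actor_lr and agent.critic_lr and both helper functions are hypothetical, and the order in which Hydra actually enumerates the grid may differ from the order shown here.

```python
from itertools import product


def expand_sweep(grid: dict[str, list], seeds: range) -> list[dict]:
    """Enumerate the Cartesian product of a parameter grid and a seed range."""
    keys = list(grid)
    runs = []
    for combo in product(*(grid[key] for key in keys)):
        for seed in seeds:
            run = dict(zip(keys, combo))
            run["args.seed"] = seed
            runs.append(run)
    return runs


def id_ranges_collide(base_a: int, size_a: int, base_b: int, size_b: int) -> bool:
    """True if [base_a, base_a + size_a) and [base_b, base_b + size_b) overlap."""
    return base_a < base_b + size_b and base_b < base_a + size_a


# Hypothetical grid: 3 actor learning rates x 2 critic learning rates x 5 seeds.
grid = {"agent.actor_lr": [1e-4, 1e-3, 1e-2], "agent.critic_lr": [1e-3, 1e-2]}
runs = expand_sweep(grid, range(5))
size = len(runs)
print(size)

# Two sweeps of this size launched at +args.run=0 and +args.run=10000.
print(id_ranges_collide(0, size, 10000, size))
```

An offset only needs to exceed the larger sweep's size, so a round number such as 10000 leaves plenty of headroom for grids of this scale.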
The Hydra launcher is joblib by default (multi-process on one machine). For SLURM clusters, switch to the submitit launcher via the corresponding 005_config_submitit*.yaml.
Before launching a sweep on your own cluster, edit the hydra.launcher section of the submitit config you intend to use (for example configs/005_config_submitit.yaml):
- account is set to ???. You must supply your SLURM allocation, either by editing the config or overriding it on the command line (hydra.launcher.account=...). On the Digital Research Alliance of Canada, this is something like def-<pi> or rrg-<pi>.
- setup is a list of shell commands run before each job. Replace source /path/to/your/venv/bin/activate with the path to your own virtual environment, and adjust the module load lines (e.g. python/3.10, mariadb) to match the modules available on your cluster.
- Job resources such as timeout_min, mem_per_cpu, cpus_per_task, tasks_per_node, gpus_per_node, and array_parallelism may be tuned to fit your allocation and the size of your sweep.
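When tuning array_parallelism and timeout_min against the size of a sweep, a back-of-the-envelope estimate can help. The sketch below is only a rough guide: the function is hypothetical, and it assumes one SLURM array task per run, that every task runs for its full timeout, and that queue wait time is ignored.

```python
import math


def full_timeout_minutes(n_jobs: int, array_parallelism: int, timeout_min: int) -> int:
    """Wall-clock minutes if every job uses its whole timeout.

    Jobs are treated as running in waves of at most array_parallelism
    concurrent tasks; time spent waiting in the queue is not counted.
    """
    if array_parallelism < 1:
        raise ValueError("array_parallelism must be at least 1")
    waves = math.ceil(n_jobs / array_parallelism)
    return waves * timeout_min


# 30 runs, at most 8 running at once, 60-minute timeout each.
print(full_timeout_minutes(30, 8, 60))
```

Real sweeps usually finish sooner, since a new array task can start as soon as any running task ends.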
├── run_simulator.py # Entry point
├── agent_policy_factory.py # Builds an (agent, actor, policy) triple from a Hydra config
├── agents/ # Actor-critic agent implementations
│ ├── aclambda_agent.py # Actor-Critic(λ)
│ ├── ddpg_agent.py # DDPG
│ ├── greedyac_agent.py # Greedy-AC
│ ├── mpo_adaptive_agent.py # MPO
│ ├── ppo_agent.py # PPO
│ ├── reinforce_agent.py # REINFORCE
│ ├── sac_agent.py # SAC
│ └── vmpo_agent.py # V-MPO
├── models/
│ ├── actor_critic_nn.py # Critic network
│ ├── tilecoding_mlp.py # Tile-coded MLP
│ └── base_models/ # Policy parameterizations
│ ├── gaussian.py # Gaussian
│ ├── squashed_gaussian.py # tanh-squashed Gaussian
│ ├── student.py # Student-t
│ ├── squashed_student.py # tanh-squashed Student-t
│ ├── beta.py # Beta
│ ├── softmax.py # Discretized softmax
│ └── replay_buffer.py
├── simulators/
│ └── positional_pid_controller.py # Backwashing-PID environment
├── data/ # Source plant data the environment is derived from
├── experiment/ # MySQL logging
├── kl_computation/ # KL utilities used in the mirror-descent variants
├── configs/ # Hydra config tree
│ ├── config.yaml # Root default config
│ ├── 0NN_config_*.yaml # Examples of configs for running different experiments
│ ├── agent/ # Per-agent hyperparameters + sweeper grids
│ ├── policy/ # Per-policy hyperparameters
│ ├── db/ # MySQL credentials templates
│ └── schema/ # Database logging schema
├── notebooks/ # Example plotting notebook
└── requirements.txt