Adversarial and robust PPO workflows for AdvORAN.
This repository contains code to train an EM-MAX agent for traffic profile 1 and to train an agent robustly with an adversarially learned perturbator in the training loop.
It uses saved_policy/em-max/em-agent-lp as the default victim policy.
| Path | Purpose |
|---|---|
| dataset/ | Input CSV files used by the environment and reward-model training. Download the CSV files and save them under the dataset/ folder. |
| saved_policy/em-max/em-agent-lp/ | Victim PPO policy snapshots (actor.npz, value.npz, optimizer.npz) |
| saved_policy/em-max/em-adversarial-agent/ | Reference adversarial policy used by the perturbator workflow |
| saved_policy/em-max/em-agent-lp-robust/ | Default output folder for robust PPO training |
| requirements/ | Dependency manifests (current.txt and legacy-tf23.txt) |
| utils/ | Small helper scripts such as run_actor_npz.py |
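Before running anything against a policy snapshot, it can help to see what a file such as actor.npz actually contains. The sketch below uses only NumPy's standard .npz reader; the helper name describe_npz is hypothetical, and the array names stored in the repo's snapshots are not documented here, so the code simply lists whatever it finds.

```python
import os
import tempfile

import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


if __name__ == "__main__":
    # Demo on a synthetic archive; point `path` at a real snapshot instead,
    # e.g. saved_policy/em-max/em-agent-lp/actor.npz.
    path = os.path.join(tempfile.mkdtemp(), "demo.npz")
    np.savez(path, kernel=np.zeros((4, 2), dtype=np.float32))
    for name, (shape, dtype) in describe_npz(path).items():
        print(name, shape, dtype)
```

The same call works for value.npz and optimizer.npz, since all three are plain NumPy archives according to the table above.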
| Script | Role |
|---|---|
| train_modular.py | Train or refresh the victim PPO policy |
| reward_model.py | Train reward_model.h5 from a CSV file |
| training_adversarial_policy.py | Train the adversarial/reference PPO policy using reward_model.h5 |
| train_perturbator_policy.py | Train the observation perturbator and save pert.h5 |
| train_robust_policy.py | Train the robust PPO policy starting from em-agent-lp |
| attack_wa.py | Run a PGD-style attack against the victim actor |
| evaluate_action_net.py | Inspect the victim actor/action net behavior |
| evaluate_perturbator_effect.py | Measure how pert.h5 changes the victim policy |
| utils/run_actor_npz.py | Step through a saved actor.npz snapshot in the environment |
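For readers unfamiliar with PGD-style observation attacks, here is a minimal NumPy sketch of the general idea. It is not the code in attack_wa.py: the linear softmax policy, the function names, and the choice of loss (raising the log-probability of a chosen target action) are all assumptions made for illustration. Only the parameter names eps, alpha, and iters mirror the script's flags.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def pgd_toward_action(obs, W, b, target, eps=0.3, alpha=0.015, iters=20):
    """Perturb `obs` within an L-infinity ball of radius `eps` so that a toy
    linear-softmax policy (logits = W @ obs + b) puts more mass on `target`.

    Each iteration takes a signed-gradient ascent step of size `alpha` on
    log p(target | obs) and projects the perturbation back into the ball.
    """
    obs = np.asarray(obs, dtype=float)
    delta = np.zeros_like(obs)
    onehot = np.eye(W.shape[0])[target]
    for _ in range(iters):
        probs = softmax(W @ (obs + delta) + b)
        # Gradient of log-softmax[target] with respect to the observation.
        grad = W.T @ (onehot - probs)
        delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return obs + delta


if __name__ == "__main__":
    W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # toy 3-action policy
    b = np.zeros(3)
    obs = np.array([1.0, 0.2])
    adv = pgd_toward_action(obs, W, b, target=1)
    print("clean probs:", softmax(W @ obs + b))
    print("adv probs:  ", softmax(W @ adv + b))
```

With alpha = 0.015 and iters = 20, the accumulated step can reach exactly eps = 0.3, which is why those three values are often chosen together.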
Two dependency tracks are kept on purpose:
- requirements/current.txt: current loose stack for the newer code path
- requirements/legacy-tf23.txt: reproducible legacy stack for the TensorFlow 2.3 experiments and saved checkpoints
The exact package set below is not installable as written:
cloudpickle==1.4.1
numpy<1.19.0
pandas
scipy==1.4.1
setuptools>=41.0.0
tensorflow==2.3
tensorflow-probability==0.7
tf_agents
Use requirements/legacy-tf23.txt instead. The corrected legacy stack is:
- Python 3.8.x
- tensorflow==2.3.0
- tensorflow-probability==0.11.0
- tf-agents==0.6.0
The repo already includes this corrected stack in requirements/legacy-tf23.txt.
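After installing, a quick check that the active environment matches the corrected pins can save a confusing failure later. The sketch below compares installed versions against the three pins listed above using the standard-library importlib.metadata module (available from Python 3.8); the helper name find_mismatches and its injectable get_version argument are hypothetical conveniences, not part of the repo.

```python
from importlib import metadata

# Pins taken from the corrected legacy stack listed above.
LEGACY_PINS = {
    "tensorflow": "2.3.0",
    "tensorflow-probability": "0.11.0",
    "tf-agents": "0.6.0",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(LEGACY_PINS)
    if problems:
        print("Environment differs from the legacy pins:", problems)
    else:
        print("Legacy TensorFlow 2.3 stack looks consistent.")
```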
Run every command below from the repo root.
No extra PYTHONPATH setup is required.
For the legacy TensorFlow 2.3 workflow:
python3.8 -m venv .venv-tf23
source .venv-tf23/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements/legacy-tf23.txt
If you want the current loose stack instead:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
Make sure these files exist before running training or evaluation:
ls dataset/embb_filtered.csv
ls dataset/colosseum_oran_coloran_ue2_003.csv
ls encoder.h5
ls saved_policy/em-max/em-agent-lp/actor.npz
ls saved_policy/em-max/em-agent-lp/value.npz
ls reward_model.h5
ls pert.h5
Notes:
- dataset/embb_filtered.csv is the environment dataset used by config_em_filtered.py
- dataset/colosseum_oran_coloran_ue2_003.csv is the CSV used to rebuild the reward model
- saved_policy/em-max/em-agent-lp/ is the default victim policy directory
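The individual ls checks above can also be done in one pass. The following sketch lists whichever of the required files are missing relative to the repo root; the file list is copied from the commands above, while the helper name missing_files is hypothetical.

```python
from pathlib import Path

# Same files as the `ls` checks above, relative to the repo root.
REQUIRED_FILES = [
    "dataset/embb_filtered.csv",
    "dataset/colosseum_oran_coloran_ue2_003.csv",
    "encoder.h5",
    "saved_policy/em-max/em-agent-lp/actor.npz",
    "saved_policy/em-max/em-agent-lp/value.npz",
    "reward_model.h5",
    "pert.h5",
]


def missing_files(root=".", required=REQUIRED_FILES):
    """Return the subset of `required` paths that are not regular files under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing prerequisite files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All prerequisite files are present.")
```

Note that reward_model.h5 and pert.h5 are produced by the training scripts, so they are expected to be missing on a fresh checkout if you plan to rebuild everything.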
Run the saved victim actor directly:
python utils/run_actor_npz.py \
saved_policy/em-max/em-agent-lp/actor.npz \
--steps 10 \
--action_mode greedy
Inspect the action net in more detail:
python evaluate_action_net.py \
saved_policy/em-max/em-agent-lp \
--collect_steps 200 \
--eval_episodes 5 \
--eval_max_steps 10
Run the worst-action attack against the victim policy:
python attack_wa.py \
--policy_dir saved_policy/em-max/em-agent-lp \
--eps 0.3 \
--alpha 0.015 \
--iters 20 \
--horizon 10
Evaluate the learned perturbator on the victim policy:
python evaluate_perturbator_effect.py \
--policy_dir saved_policy/em-max/em-agent-lp \
--perturbator_path pert.h5 \
--collect_steps 200 \
--eval_episodes 5 \
--eval_max_steps 10
Train or refresh the victim PPO policy. This writes canonical snapshots into saved_policy/em-max/em-agent-lp/.
python train_modular.py --cpu_only
Train the reward model from the CSV file:
python reward_model.py \
--csv dataset/colosseum_oran_coloran_ue2_003.csv \
--out_model reward_model.h5
Train the adversarial/reference PPO policy. This writes to saved_policy/em-max/em-adversarial-agent/.
python training_adversarial_policy.py \
--reward_model_path reward_model.h5 \
--cpu_only
Train the observation perturbator. Victim policy = em-agent-lp, reference policy = em-adversarial-agent.
python train_perturbator_policy.py \
--policy_dir saved_policy/em-max/em-agent-lp \
--ref_policy_dir saved_policy/em-max/em-adversarial-agent \
--reward_model reward_model.h5 \
--save_perturb_net pert.h5
Train the robust PPO policy. This starts from saved_policy/em-max/em-agent-lp/ and writes to saved_policy/em-max/em-agent-lp-robust/.
python train_robust_policy.py \
--policy_dir saved_policy/em-max/em-agent-lp \
--perturbator_path pert.h5 \
--reward_model_path reward_model.h5 \
--cpu_only
Evaluate the original victim again:
python evaluate_action_net.py saved_policy/em-max/em-agent-lp
Evaluate the robust policy:
python evaluate_action_net.py saved_policy/em-max/em-agent-lp-robust
Compare the perturbator effect on the victim:
python evaluate_perturbator_effect.py \
--policy_dir saved_policy/em-max/em-agent-lp \
--perturbator_path pert.h5
| Output | Default location |
|---|---|
| Victim PPO snapshots | saved_policy/em-max/em-agent-lp/ |
| Adversarial/reference PPO snapshots | saved_policy/em-max/em-adversarial-agent/ |
| Robust PPO snapshots | saved_policy/em-max/em-agent-lp-robust/ |
| Reward model | reward_model.h5 |
| Perturbator | pert.h5 |
If you already trust the bundled victim policy and only want to run the attack pipeline, use this order:
- Create the Python 3.8 environment with requirements/legacy-tf23.txt
- Run utils/run_actor_npz.py or evaluate_action_net.py on saved_policy/em-max/em-agent-lp
- Run attack_wa.py
- Run evaluate_perturbator_effect.py
If you want to rebuild everything from scratch, use this order:
- train_modular.py
- reward_model.py
- training_adversarial_policy.py
- train_perturbator_policy.py
- train_robust_policy.py
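The from-scratch order can be captured in a small driver so that a failing step stops the run before later steps consume stale artifacts. This is only a sketch: the script names and flags are copied from the commands earlier in this README, while the driver itself (run_pipeline, its dry_run default, and the injectable runner) is hypothetical and not shipped with the repo.

```python
import subprocess
import sys

VICTIM = "saved_policy/em-max/em-agent-lp"
REFERENCE = "saved_policy/em-max/em-adversarial-agent"

# One entry per step, in the rebuild order listed above.
REBUILD_STEPS = [
    ["train_modular.py", "--cpu_only"],
    ["reward_model.py",
     "--csv", "dataset/colosseum_oran_coloran_ue2_003.csv",
     "--out_model", "reward_model.h5"],
    ["training_adversarial_policy.py",
     "--reward_model_path", "reward_model.h5", "--cpu_only"],
    ["train_perturbator_policy.py",
     "--policy_dir", VICTIM, "--ref_policy_dir", REFERENCE,
     "--reward_model", "reward_model.h5", "--save_perturb_net", "pert.h5"],
    ["train_robust_policy.py",
     "--policy_dir", VICTIM, "--perturbator_path", "pert.h5",
     "--reward_model_path", "reward_model.h5", "--cpu_only"],
]


def run_pipeline(steps=REBUILD_STEPS, dry_run=True, runner=subprocess.run):
    """Print (dry run) or execute each step in order; return the full command lists."""
    commands = [[sys.executable, *step] for step in steps]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            runner(cmd, check=True)  # raises on the first failing step
    return commands


if __name__ == "__main__":
    # Dry run by default; call run_pipeline(dry_run=False) from the repo root to train.
    run_pipeline()
```

Using sys.executable keeps every step inside the currently activated virtual environment.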