canyuchen/FedAgent

FedAgent: A Library for Decentralized Agent Learning

Train LLM agents collaboratively across decentralized clients, without sharing local data.



Updates


Overview

FedAgent is a library for federated RL training of LLM agents. It implements a federated training server with FedAvg aggregation (plus optional client-side FedProx), a two-level heterogeneity suite (task vs environment partitioning), and federated PPO/GRPO trainers built on verl-agent. You can reproduce the paper's experiments or extend the framework with your own datasets, environments, and algorithms.

FedAgent is also the reference implementation for the paper, which formalizes agent heterogeneity at two structurally distinct levels (task vs environment) and derives an asymmetric robustness result: federated training is robust to task-level heterogeneity but worst-case non-robust to environment-level heterogeneity. See docs/heterogeneity.md for the full construction.
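As a rough illustration of the FedAvg aggregation step mentioned above, the sketch below averages client parameters weighted by local sample count. It works on plain Python lists rather than FSDP-sharded tensors, and `fedavg`, `Params`, and the argument names are hypothetical, chosen for this example; they are not FedAgent's API.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model state: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def fedavg(client_params: Sequence[Params], client_sizes: Sequence[int]) -> Params:
    """Weighted average of client parameters (the standard FedAvg rule).

    Illustrative helper only; FedAgent's own aggregation code may differ.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        acc = [0.0] * len(client_params[0][name])
        for params, size in zip(client_params, client_sizes):
            weight = size / total  # each client's share of the data
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        averaged[name] = acc
    return averaged


if __name__ == "__main__":
    client_a = {"w": [1.0, 2.0]}
    client_b = {"w": [3.0, 6.0]}
    # Client B holds three times as much data, so it gets weight 0.75.
    print(fedavg([client_a, client_b], [1, 3]))  # {'w': [2.5, 5.0]}
```

Weighting by sample count is the usual FedAvg choice; uniform weights are the special case where every client reports the same size.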


Key Features

  • Federated PPO and GRPO trainers: drop-in federated counterparts of the verl-agent trainers; swap one config to go from single-client to federated
  • Two-level heterogeneity suite: task-level (Preference / Coverage / Hardness) and environment-level (5 WebShop transition variants), the first systematic decomposition for agent FL
  • FedAvg aggregation with FSDP-sharded model support, pluggable for custom rules, plus optional client-side FedProx (a proximal term added to local training, not a server rule)
  • Fully configurable federation protocol (clients N, clients/round M, local epochs E, rounds T, tasks/client |Xᵢ|) with ready-made sweeps
  • Any HuggingFace backbone (paper uses Qwen2.5-1.5B/3B/7B-Instruct, Llama-3.2-3B-Instruct); WebShop and ALFWorld benchmarks out of the box
  • FSDP sharding, single-GPU to multi-node, SLURM / torchrun launch paths
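Two items in the list above lend themselves to a small sketch: the client-side FedProx term and the per-round selection of M out of N clients. The code below assumes the textbook FedProx penalty, (mu / 2) * ||w - w_global||^2, and uniform sampling without replacement. The function names, the `mu` parameter, and the sampling scheme are assumptions made for illustration, not FedAgent's actual implementation.

```python
import random
from typing import List, Sequence


def fedprox_penalty(local: Sequence[float], global_: Sequence[float], mu: float) -> float:
    """Proximal term (mu / 2) * ||w_local - w_global||^2 added to the local loss."""
    return 0.5 * mu * sum((w - g) ** 2 for w, g in zip(local, global_))


def fedprox_grad(local: Sequence[float], global_: Sequence[float], mu: float) -> List[float]:
    """Gradient of the proximal term w.r.t. the local weights: mu * (w - w_global)."""
    return [mu * (w - g) for w, g in zip(local, global_)]


def sample_clients(num_clients: int, per_round: int, seed: int) -> List[int]:
    """Pick M = per_round distinct client ids out of N = num_clients (assumed uniform)."""
    if not 0 < per_round <= num_clients:
        raise ValueError("per_round must be in 1..num_clients")
    rng = random.Random(seed)  # seeded so a round's selection is reproducible
    return sorted(rng.sample(range(num_clients), per_round))


if __name__ == "__main__":
    w_local, w_global = [1.0, 2.0], [0.0, 0.0]
    print(fedprox_penalty(w_local, w_global, mu=0.5))  # 1.25
    print(fedprox_grad(w_local, w_global, mu=0.5))     # [0.5, 1.0]
    print(sample_clients(num_clients=10, per_round=3, seed=0))
```

Because the penalty only changes each client's local objective, the server-side rule can stay plain FedAvg, which matches the "not a server rule" note in the list.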

Clients can run serially or in parallel across GPUs; the library is W&B-free (metrics go to JSON / console) and exposes extension points for new datasets, environments, heterogeneity strategies, and aggregation rules (see docs/extending.md). Full details in docs/features.md.


Repository layout

fedagent/
├── README.md                  # this file
├── LICENSE                    # Apache-2.0
├── NOTICE                     # third-party attributions
├── CITATION.cff               # how to cite (TODO: finalize once published)
├── reproduce.sh               # one-command reproduction entry point
├── evaluate.sh                # evaluate a trained checkpoint + collect trajectories
├── download_data.sh           # fetch WebShop / ALFWorld data (not shipped)
├── .env.example               # optional environment variables (W&B removed)
├── .gitignore
├── core/                      # federated server + aggregation + trainers (contribution)
├── utils/                     # model aggregation (FedAvg, incl. FSDP)
├── tools/                     # run_federated.py, resolve_paths.py, checkpoint monitor
│   └── aggregation/           # aggregation verification / diagnostic toolbox
├── scripts/                   # setup_env.sh, runners, verl-agent base launch scripts
├── config/                    # curated experiment configs (W&B stripped)
│   ├── paths.yaml.example     # path template consumed by tools/resolve_paths.py
│   └── example.yaml           # fully annotated example config
├── docs/                      # user-facing documentation (see below)
├── eval/                      # checkpoint evaluation + trajectory collection
└── third_party/
    └── verl-agent/            # vendored upstream (Apache-2.0), no bundled 5.6 GB data

FedAgent code map

FedAgent is a framework extension, so first-party code spans two layers: a top-level control plane and in-framework hooks that live inside the vendored tree (verl-agent imports/runs them). Everything else under third_party/verl-agent/ is unmodified upstream (Apache-2.0). Per-file detail: docs/ARCHITECTURE.md; exhaustive edit list: CHANGES.md.

fedagent/                                       ── first-party (this work) ──
├── core/                 control plane: federated server, round orchestration, aggregation
├── utils/                model aggregation (FedAvg, incl. FSDP)
├── tools/                run_federated.py, resolve_paths.py, aggregation/, env_heterogeneity/, heterogeneity_test/, monitor/
├── eval/                 checkpoint evaluation + trajectory collection
├── scripts/              setup_env.sh, federated runners, verl-agent launchers, plotting/
├── config/, docs/        experiment configs (W&B stripped) + documentation
│
└── third_party/verl-agent/    ── vendored upstream (Apache-2.0); our hooks woven in ──
    ├── agent_system/environments/partition_strategy.py        core heterogeneity constructions
    ├── agent_system/environments/fed_env_manager.py           federated env managers
    ├── verl/trainer/main_ppo_fed.py                           federated PPO/GRPO entry point
    ├── verl/trainer/ppo/ray_trainer_fed.py                    Ray federated trainer
    ├── verl/utils/checkpoint/fsdp_checkpoint_manager_fed.py   federated checkpoint manager
    └── verl/utils/tracking_fed.py                             per-round / per-client tracking

Installation

FedAgent runs on Python 3.10. WebShop and ALFWorld have conflicting dependencies (WebShop needs a Java/Lucene search stack via pyserini/pyjnius; ALFWorld needs the TextWorld + Fast-Downward planning stack). Following verl-agent's own guidance, each benchmark gets its own conda env:

# WebShop  -> conda env `fedagent-webshop` (Python 3.10), incl. vendored verl-agent
bash scripts/setup_env.sh create webshop
conda activate fedagent-webshop

# ALFWorld -> conda env `fedagent-alfworld`
bash scripts/setup_env.sh create alfworld
conda activate fedagent-alfworld
alfworld-download -f          # one-time: PDDL + game files -> ~/.cache/alfworld/

# Path template (both envs)
cp config/paths.yaml.example config/paths.yaml && $EDITOR config/paths.yaml

WebShop additionally needs a JDK on PATH (for pyserini). Each reproduce.sh / evaluate.sh run must happen inside the matching env. Full step-by-step setup (both envs, data, and the upstream env packages) is in docs/installation.md.
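Since each run must happen inside the matching env, a quick preflight check before launching can save a failed job. The sketch below is a hypothetical helper, not part of FedAgent: it checks the Python version, the active conda env name (read from conda's `CONDA_DEFAULT_ENV` variable), and, for WebShop, whether a `java` executable is on PATH. Finding `java` is only a proxy for a working JDK.

```python
import os
import shutil
import sys
from typing import List, Optional, Tuple

# Env names taken from the setup commands above.
EXPECTED_ENVS = {"webshop": "fedagent-webshop", "alfworld": "fedagent-alfworld"}


def check_env(
    benchmark: str,
    python_version: Optional[Tuple[int, int]] = None,
    conda_env: Optional[str] = None,
    java_path: Optional[str] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed.

    The optional arguments exist so the checks can be exercised without
    touching the real interpreter or environment.
    """
    if benchmark not in EXPECTED_ENVS:
        raise ValueError(f"unknown benchmark: {benchmark!r}")

    version = tuple(python_version or sys.version_info[:2])
    active = conda_env if conda_env is not None else os.environ.get("CONDA_DEFAULT_ENV", "")

    problems: List[str] = []
    if version != (3, 10):
        problems.append(f"Python 3.10 expected, found {version[0]}.{version[1]}")
    if active != EXPECTED_ENVS[benchmark]:
        problems.append(f"conda env {EXPECTED_ENVS[benchmark]!r} expected, found {active!r}")
    if benchmark == "webshop":
        java = java_path if java_path is not None else shutil.which("java")
        if not java:
            problems.append("no `java` executable on PATH (WebShop needs a JDK)")
    return problems


if __name__ == "__main__":
    for problem in check_env("webshop"):
        print("preflight:", problem)
```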

W&B logging is removed from this release; no tracking account or key is needed.


Data

The default configs run out of the box. The three small WebShop catalog files (items_shuffle_1000.json, items_ins_v2_1000.json, items_human_ins.json), which back webshop.use_small: true, already ship with the repo. The WebShop env loads them from third_party/verl-agent/agent_system/environments/env_package/webshop/webshop/data/, not from the top-level data/ directory, which ships only a README.

Two things are fetched separately:

bash download_data.sh           # ALFWorld game files (auto) + WebShop full-catalog instructions
  • ALFWorld game files: auto-downloaded by the script (alfworld-download) to ~/.cache/alfworld, where the env reads them.
  • WebShop full catalog (items_shuffle.json, ~5.2 GB, plus items_ins_v2.json): needed only for full-scale webshop.use_small: false runs and fetched manually. The script prints instructions for downloading the files from princeton-nlp/WebShop into the same WebShop data/ directory. The shipped small files already reproduce the paper's WebShop results. See docs/configuration.md for the use_small switch.
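To confirm the catalog files are where the WebShop env expects them, a small check along these lines may help. The file names and directory come from the text above; the helper `missing_catalog_files` is hypothetical, and the assumption that a full-catalog run needs exactly the two large files listed here (and nothing else) is not verified against the code.

```python
from pathlib import Path
from typing import List

# Directory the WebShop env loads from, relative to the repository root.
WEBSHOP_DATA_DIR = Path(
    "third_party/verl-agent/agent_system/environments/env_package/webshop/webshop/data"
)
SMALL_FILES = ("items_shuffle_1000.json", "items_ins_v2_1000.json", "items_human_ins.json")
# Assumption: only these two files are required when webshop.use_small is false.
FULL_FILES = ("items_shuffle.json", "items_ins_v2.json")


def missing_catalog_files(data_dir: Path, use_small: bool) -> List[str]:
    """List the expected catalog files that are absent from data_dir."""
    needed = SMALL_FILES if use_small else FULL_FILES
    return [name for name in needed if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_catalog_files(WEBSHOP_DATA_DIR, use_small=True)
    print("all small catalog files present" if not missing else f"missing: {missing}")
```

Run it from the repository root so the relative path resolves.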

Models

Backbones are HuggingFace model ids (default Qwen/Qwen2.5-1.5B-Instruct) and auto-download on first run to ~/.cache/huggingface (set HF_HOME to relocate; ~3 GB for 1.5B up to ~15 GB for 7B). Two caveats apply. First, the main table's Llama-3.2-3B-Instruct is gated: accept its HuggingFace license and run huggingface-cli login (or set HF_TOKEN) before the first run. Second, on offline / air-gapped clusters, pre-fetch the model on a login node and set HF_HUB_OFFLINE=1. See docs/installation.md for details.
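For the offline case, one way to pre-fetch on a login node is `huggingface_hub.snapshot_download`, which fills the same cache that later runs read from. The sketch below is a minimal example under that assumption; `prefetch` and `offline_env` are hypothetical helper names, and docs/installation.md remains the reference for the supported procedure.

```python
import os
from typing import Dict, Mapping, Optional


def prefetch(model_id: str = "Qwen/Qwen2.5-1.5B-Instruct", token: Optional[str] = None) -> str:
    """Download a model snapshot into the HuggingFace cache and return its local path.

    Run this where network access is available (e.g. a login node).
    Gated models such as Llama-3.2-3B-Instruct need an accepted license and a token.
    """
    # Imported lazily so the rest of this module works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=model_id, token=token or os.environ.get("HF_TOKEN"))


def offline_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with HF_HUB_OFFLINE=1, for launching jobs on offline nodes."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_OFFLINE"] = "1"
    return env


if __name__ == "__main__":
    print(prefetch())  # performs a real download on first use
```

The dictionary returned by `offline_env()` can be passed as `env=` to `subprocess.run` when launching a training job from Python.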


Quick Start

Run a FedAgent experiment directly with the federated runner: give it a config name (its path under config/, without the .yaml) and a round count, from the repository root inside the matching conda env.

# WebShop main run, 70 rounds. The config sets the backbone, GPU count, and protocol:
python tools/run_federated.py --restart-resume \
  uniform/Qwen2.5-1.5B-Instruct/main/grpo/fed_webshop_grpo_total-100_cl-per-rd-2_rd-70_ep-per-cl-3_min-goals-per-cl-100_p-uniform 70

The runner resolves the config, creates the run's ./output/ directory, and launches per-client training; it is re-runnable and resumes where it left off. Hardware (GPU count, FSDP offload, serial vs parallel clients) is read from the config (verl.trainer.*, federated.training.*); to change it, edit those keys or follow the running guide, which also documents the lower-level launcher scripts/start_federated.sh.
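Config names pack the protocol into the filename, and docs/configuration.md documents the authoritative decoder. As a rough aid, the sketch below splits a name into its `key-value` tokens and assembles the runner command shown above. The splitting rule (underscore-separated tokens, value after the last hyphen) is inferred from the single example name and may not hold for every config; the function names are hypothetical, and the sketch does not try to interpret what each key means.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, str]


def decode_config_name(config_name: str) -> Tuple[List[str], Dict[str, Value]]:
    """Split the final path component into bare tokens and key-value fields.

    Assumed convention: tokens are joined by "_" and a field is "<key>-<value>".
    """
    stem = config_name.rsplit("/", 1)[-1]
    bare: List[str] = []
    fields: Dict[str, Value] = {}
    for token in stem.split("_"):
        if "-" in token:
            key, value = token.rsplit("-", 1)
            fields[key] = int(value) if value.isdigit() else value
        else:
            bare.append(token)
    return bare, fields


def runner_command(config_name: str, rounds: int) -> List[str]:
    """Argument list for the federated runner, mirroring the command above."""
    return ["python", "tools/run_federated.py", "--restart-resume", config_name, str(rounds)]


if __name__ == "__main__":
    name = (
        "uniform/Qwen2.5-1.5B-Instruct/main/grpo/"
        "fed_webshop_grpo_total-100_cl-per-rd-2_rd-70_ep-per-cl-3"
        "_min-goals-per-cl-100_p-uniform"
    )
    bare, fields = decode_config_name(name)
    print(bare)    # ['fed', 'webshop', 'grpo']
    print(fields)
    print(runner_command(name, 70))
```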

Evaluate a trained checkpoint and collect trajectories:

bash evaluate.sh webshop /path/to/checkpoint

Trained checkpoints are saved as FSDP shards; evaluate.sh merges them to HuggingFace format on first use (see eval/README.md).


Reproducing the paper

To reproduce the paper, reproduce.sh wraps the runner with named experiments and hardware flags: it resolves the canonical config, applies any overrides, and launches it.

bash reproduce.sh webshop-main                  # WebShop main table, GRPO, 4 GPUs
bash reproduce.sh alfworld-main --single-gpu    # ALFWorld main, 1-GPU debug run
bash reproduce.sh webshop-main --mode serial    # clients run one at a time
bash reproduce.sh webshop-main --slurm          # submit via SLURM (cluster)

The full guide is in docs/reproducing.md: every table and figure mapped to its config directory, with run commands, seeds, and compute estimates (~1,800 H100 GPU-hours total). It covers the main table (Local / Centralized / FedAgent across four backbones × WebShop + ALFWorld), the task- and environment-level heterogeneity studies, and the decentralized ablations.


Documentation

| Doc | Contents |
| --- | --- |
| docs/features.md | Key features in depth: the config keys, flags, and files behind each headline capability. |
| docs/installation.md | Two-conda-env setup (WebShop vs ALFWorld), full step-by-step, JDK / game-file notes. |
| docs/running.md | Running FedAgent: the run-mode matrix (parallel vs serial, FSDP on/off, single-GPU, variable GPU count, multi-node, SLURM), flag-to-knob table, and worked examples. |
| docs/reproducing.md | Per-experiment reproduction recipes, compute estimates, and seeds. |
| docs/heterogeneity.md | The two-level heterogeneity taxonomy and how to construct/select each variant. |
| docs/configuration.md | Config filename decoder and field reference for the federated: and verl: blocks. |
| docs/extending.md | Extension points: new dataset/env, new heterogeneity strategy, new RL algorithm, new aggregation strategy. |

Citation

If you use FedAgent in your research, please cite:

@article{fedagent2026,
  title   = {Is Decentralized LLM Agent RL Robust to Heterogeneity? An Asymmetric Tale},
  author  = {Chen, Canyu and Zhu, Kangyu and Chen, Zhaorun and Zhou, Zhanhui and Diao, Shizhe and Lu, Yiping and Li, Tian and Li, Manling and Song, Dawn},
  journal = {arXiv preprint arXiv:},
  year    = {2026}
}

License

This project is released under the Apache License 2.0: see LICENSE.

Acknowledgements

FedAgent builds on a vendored, modified fork of verl-agent, which itself extends veRL. We gratefully acknowledge:

  • veRL: © ByteDance / the veRL authors (Apache-2.0): the base RL training framework. https://github.com/volcengine/verl
  • verl-agent / GiGPO: Feng et al., Group-in-Group Policy Optimization for LLM Agent Reinforcement Learning (arXiv:2505.10978): the agent-RL fork FedAgent is built on. https://github.com/langfengQ/verl-agent
  • WebShop: Yao et al., Princeton NLP (MIT License): the e-commerce agent benchmark.
  • ALFWorld: Shridhar et al., Microsoft Research (MIT License): the embodied household agent benchmark.

Full per-component attributions and license texts are aggregated in NOTICE.

About

FedAgent: A Library for Decentralized Agent Learning. 🏆 Best Paper Award at the AAAI'26 TrustAgent Workshop and 🏆 Outstanding Paper Award at the AAAI'26 PerFM Workshop.
