The first LLM-agent simulation of the academic peer review process.
Yiqiao Jin1* · Qinlin Zhao2* · Yiyang Wang1 · Hao Chen3 · Kaijie Zhu4 · Yijia Xiao5 · Jindong Wang6
1Georgia Institute of Technology
2University of Science and Technology of China
3Carnegie Mellon University
4UC Santa Barbara
5UC Los Angeles
6William & Mary
* Equal contribution.
| Metric | Value |
|---|---|
| Total generated peer review documents | 53,800+ |
| Reviews & rebuttals | 10,460 |
| Reviewer–AC discussions | 23,535 |
| Meta-reviews / final decisions | 9,414 / 9,414 |
| Conferences covered | ICLR 2020 – 2023 |
| Submissions sampled (oral · spotlight · poster · reject) | 523 (19 · 29 · 125 · 350) |
| Decision variation attributable to reviewer bias | 37.1 % |
Peer review is fundamental to the integrity and advancement of scientific publication. Traditional analyses of peer review typically rely on exploratory statistics over existing review data; such approaches cannot adequately capture the multivariate nature of the process or account for its latent variables, and they are further constrained by privacy concerns given the sensitive nature of the data. We introduce AgentReview, the first large language model (LLM) based peer review simulation framework, which disentangles the effects of multiple latent factors while sidestepping the privacy constraints of real review data. Our study reveals significant insights: notably, reviewers' biases account for a 37.1 % variation in paper decisions, an effect consistent with sociological theories such as social influence, altruism fatigue, and authority bias. We believe this study can offer valuable guidance for improving the design of peer review mechanisms.
AgentReview models peer review as a structured five-phase pipeline with three role types — reviewers, authors, and area chairs — each instantiated as an LLM agent with configurable latent traits. By varying one trait at a time against a fixed baseline setting, the framework disentangles otherwise-confounded factors such as reviewer commitment, intention, knowledgeability, AC style, and author anonymity, while preserving real reviewer privacy.
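The one-trait-at-a-time design can be illustrated with a small sketch. All names below (`BASELINE`, `vary_one_trait`, the trait keys) are illustrative, not the framework's actual API:

```python
from copy import deepcopy

# Illustrative baseline: every agent trait at its default value.
# Keys and values are placeholders mirroring the trait axes described above.
BASELINE = {
    "reviewer_commitment": "responsible",
    "reviewer_intention": "benign",
    "reviewer_knowledgeability": "knowledgeable",
    "author_identity": "anonymous",
    "ac_style": "inclusive",
}

def vary_one_trait(trait, value):
    """Return a config that differs from BASELINE in exactly one trait."""
    config = deepcopy(BASELINE)
    config[trait] = value
    return config

# e.g. isolate the effect of a malicious reviewer against the baseline
malicious = vary_one_trait("reviewer_intention", "malicious")
changed = {k for k in BASELINE if malicious[k] != BASELINE[k]}
```

Because each experimental setting differs from the baseline in a single trait, any shift in outcomes can be attributed to that trait alone.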
Five sociological phenomena emerge from the simulation, each tied to a measurable shift in review outcomes:
| Phenomenon | Sociological theory | Quantitative effect |
|---|---|---|
| Social Influence | Conformity to perceived majority opinion | −27.2 % standard deviation in ratings after rebuttals |
| Altruism Fatigue & Peer Effects | One free-rider triggers collective disengagement | A single under-committed reviewer drives a −18.7 % drop in commitment across all reviewers |
| Groupthink & Echo Chamber | Amplification of negative views among biased peers | −0.17 rating among biased reviewers, plus a −0.25 spillover on unbiased reviewers |
| Authority Bias & Halo Effect | Renowned-author identity inflates perceived quality | Revealing identity for just 10 % of papers shifts 27.7 % of decisions |
| Anchoring Bias | Heavy reliance on initial impressions | The rebuttal phase exerts only a minimal effect on final outcomes |
Three LLM-agent roles are configured along orthogonal trait axes, all set via prompts:
| Role | Trait axis | Variants |
|---|---|---|
| Reviewer | Commitment | responsible · irresponsible |
| Reviewer | Intention | benign · malicious |
| Reviewer | Knowledgeability | knowledgeable · unknowledgeable |
| Author | Identity disclosure | anonymous · known |
| Area Chair | Decision style | authoritarian · conformist · inclusive |
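Since all traits are set via prompts, a reviewer agent's system prompt can be assembled from one variant per trait axis. A minimal sketch — the instruction snippets and function names are invented for illustration; the framework's actual prompt templates differ:

```python
# Hypothetical trait-to-instruction snippets; NOT the framework's real templates.
TRAIT_PROMPTS = {
    ("commitment", "responsible"): "Read the paper carefully and give detailed, constructive feedback.",
    ("commitment", "irresponsible"): "Skim the paper and write a brief, superficial review.",
    ("intention", "benign"): "Judge the paper fairly on its merits.",
    ("intention", "malicious"): "Look for reasons to reject the paper.",
    ("knowledgeability", "knowledgeable"): "You are an expert in the paper's subfield.",
    ("knowledgeability", "unknowledgeable"): "You are unfamiliar with the paper's subfield.",
}

def build_reviewer_prompt(commitment, intention, knowledgeability):
    """Compose a reviewer system prompt from one variant per trait axis."""
    parts = ["You are a reviewer for an ML conference."]
    for axis, variant in [("commitment", commitment),
                          ("intention", intention),
                          ("knowledgeability", knowledgeability)]:
        parts.append(TRAIT_PROMPTS[(axis, variant)])
    return " ".join(parts)

prompt = build_reviewer_prompt("irresponsible", "benign", "knowledgeable")
```

Because the axes are orthogonal, the 2 × 2 × 2 reviewer variants (plus author and AC variants) can all be generated from a single template.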
| Phase | Stage | What happens |
|---|---|---|
| I | Reviewer Assessment | Three reviewers independently evaluate each manuscript |
| II | Author–Reviewer Discussion | Authors submit rebuttals addressing reviewer concerns |
| III | Reviewer–AC Discussion | The AC facilitates discussion; reviewers update their initial ratings |
| IV | Meta-Review Compilation | The AC synthesizes all signals into a single meta-review |
| V | Paper Decision | The AC makes the final accept / reject call (fixed acceptance rate of 32 %) |
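The five phases map naturally onto a sequential loop over each submission. A schematic sketch, assuming a generic `llm(agent, message)` callable — this is not the framework's actual implementation, and the real decision phase enforces the acceptance rate across all papers rather than per paper:

```python
def review_paper(paper, reviewers, author, area_chair, llm):
    """Schematic of the five-phase AgentReview pipeline for one submission."""
    # Phase I: independent reviewer assessments
    reviews = [llm(r, f"Review this paper:\n{paper}") for r in reviewers]
    # Phase II: author rebuttal addressing all reviews
    rebuttal = llm(author, "Write a rebuttal to:\n" + "\n".join(reviews))
    # Phase III: AC-facilitated discussion; reviewers update their ratings
    updated = [llm(r, f"Given the rebuttal:\n{rebuttal}\nUpdate your review:\n{rev}")
               for r, rev in zip(reviewers, reviews)]
    # Phase IV: AC synthesizes all signals into a meta-review
    meta = llm(area_chair, "Write a meta-review from:\n" + "\n".join(updated))
    # Phase V: final call (the real framework fixes the global acceptance rate)
    decision = llm(area_chair, f"Accept or reject based on:\n{meta}")
    return meta, decision

def _stub_llm(agent, message):
    # Stand-in for a real LLM call, so the sketch runs without an API key.
    return f"[{agent}] response"

meta, decision = review_paper("Example paper text",
                              ["R1", "R2", "R3"], "Author", "AC", _stub_llm)
```

Swapping `_stub_llm` for a real model client turns the sketch into a minimal end-to-end pass.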
| Requirement | Version |
|---|---|
| Python | 3.10+ |
| LLM access | OpenAI or Azure OpenAI API key |
| OS | Linux / macOS / WSL |
```shell
git clone https://github.com/Ahren09/AgentReview.git
cd AgentReview
pip install -r requirements.txt
```

🔑 Set environment variables — OpenAI:

```shell
export OPENAI_API_KEY=sk-...
```

🔑 Set environment variables — Azure OpenAI:

```shell
export AZURE_ENDPOINT=https://<your-endpoint>.openai.azure.com/
export AZURE_DEPLOYMENT=<your-deployment-name>
export AZURE_OPENAI_KEY=<your-key>
```

Two zip archives are hosted on Dropbox:
| Archive | Contents | Target | Required? |
|---|---|---|---|
| `AgentReview_Paper_Data.zip` | PDFs of sampled ICLR papers + real ICLR 2020–2023 reviews | `data/` | ✅ |
| `AgentReview_LLM_Reviews.zip` | The full LLM-generated review dataset from the paper | `outputs/` | optional |
```shell
unzip AgentReview_Paper_Data.zip -d data/
unzip AgentReview_LLM_Reviews.zip -d outputs/  # optional
```

Run a full simulated review pass on ICLR 2024 with a `malicious_Rx1` reviewer setting:
```shell
python run_paper_review_cli.py \
    --conference ICLR2024 \
    --openai_client_type azure_openai \
    --data_dir data \
    --experiment_name malicious_Rx1
```

Or explore interactively:
- Notebook — `notebooks/demo.ipynb`
- Live demo — Hugging Face Space
- End-to-end script — `run.sh`

Note: all project files should be run from the `AgentReview` directory.
Define a new setting in `agentreview/experiment_config.py` and register it in `all_settings`:

```python
all_settings = {
    "BASELINE": baseline_setting,
    "benign_Rx1": benign_Rx1_setting,
    # ...
    "your_setting_name": your_setting,
}
```

- We use a fixed acceptance rate of 32 %, matching the actual ICLR 2020–2023 average. See Conference Acceptance Rates for context.
- API providers can apply strict content filtering. You may need to relax filtering on your deployment to obtain complete generations.
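For illustration, a setting registered in `all_settings` might bundle one trait choice per agent role. The dictionary schema below is a guess for the sake of example — mirror the existing entries in `agentreview/experiment_config.py` for the real structure:

```python
# Illustrative only: copy the structure of existing settings in
# agentreview/experiment_config.py rather than this sketch.
your_setting = {
    "reviewer": {
        "commitment": "irresponsible",   # the one trait varied from the baseline
        "intention": "benign",
        "knowledgeability": "knowledgeable",
    },
    "author": {"identity": "anonymous"},
    "area_chair": {"style": "inclusive"},
}
```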
```bibtex
@inproceedings{jin2024agentreview,
  title     = {AgentReview: Exploring Peer Review Dynamics with LLM Agents},
  author    = {Jin, Yiqiao and Zhao, Qinlin and Wang, Yiyang and Chen, Hao
               and Zhu, Kaijie and Xiao, Yijia and Wang, Jindong},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in
               Natural Language Processing (EMNLP)},
  year      = {2024}
}
```

The implementation builds on the chatarena multi-agent framework and uses the OpenReview API to retrieve real ICLR submission data.
Released under the Apache License 2.0.

