The first LLM-agent simulation of the academic peer review process.
Yiqiao Jin1* · Qinlin Zhao2* · Yiyang Wang1 · Hao Chen3 · Kaijie Zhu4 · Yijia Xiao5 · Jindong Wang6
1Georgia Institute of Technology
2University of Science and Technology of China
3Carnegie Mellon University
4UC Santa Barbara
5UC Los Angeles
6William & Mary
* Equal contribution.
| Metric | Value |
|---|---|
| Total generated peer review documents | 53,800+ |
| Reviews & rebuttals | 10,460 |
| Reviewer–AC discussions | 23,535 |
| Meta-reviews / final decisions | 9,414 / 9,414 |
| Conferences covered | ICLR 2020 – 2023 |
| Submissions sampled (oral · spotlight · poster · reject) | 523 (19 · 29 · 125 · 350) |
| Decision variation attributable to reviewer bias | 37.1 % |
Peer review is fundamental to the integrity and advancement of scientific publication. Traditional analyses of peer review typically rely on exploratory statistics over existing review data; such approaches cannot adequately capture the multivariate nature of the process or account for its latent variables, and they are further constrained by privacy concerns given the sensitive nature of the data. We introduce AgentReview, the first large language model (LLM) based peer review simulation framework, which disentangles the effects of multiple latent factors while sidestepping the privacy constraints of real review data. Our study reveals significant insights: notably, reviewers' biases account for a 37.1 % variation in paper decisions, an effect consistent with sociological theories such as social influence, altruism fatigue, and authority bias. We believe this study can offer valuable guidance for improving the design of peer review mechanisms.
AgentReview models peer review as a structured five-phase pipeline with three role types — reviewers, authors, and area chairs — each instantiated as an LLM agent with configurable latent traits. By varying one trait at a time against a fixed baseline setting, the framework disentangles otherwise-confounded factors such as reviewer commitment, intention, knowledgeability, AC style, and author anonymity, while preserving real reviewer privacy.
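The one-trait-at-a-time design can be illustrated with a small sketch. All names below (`BASELINE`, `vary_one_trait`, the trait keys) are illustrative, not the framework's actual API:

```python
from copy import deepcopy

# Illustrative baseline: every agent trait at its default value.
# Keys and values are placeholders mirroring the trait axes described above.
BASELINE = {
    "reviewer_commitment": "responsible",
    "reviewer_intention": "benign",
    "reviewer_knowledgeability": "knowledgeable",
    "author_identity": "anonymous",
    "ac_style": "inclusive",
}

def vary_one_trait(trait, value):
    """Return a config that differs from BASELINE in exactly one trait."""
    config = deepcopy(BASELINE)
    config[trait] = value
    return config

# e.g. isolate the effect of a malicious reviewer against the baseline
malicious = vary_one_trait("reviewer_intention", "malicious")
changed = {k for k in BASELINE if malicious[k] != BASELINE[k]}
```

Because each experimental setting differs from the baseline in a single trait, any shift in outcomes can be attributed to that trait alone.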
Five sociological phenomena emerge from the simulation, each tied to a measurable shift in review outcomes:
| Phenomenon | Sociological theory | Quantitative effect |
|---|---|---|
| Social Influence | Conformity to perceived majority opinion | −27.2 % standard deviation in ratings after rebuttals |
| Altruism Fatigue & Peer Effects | One free-rider triggers collective disengagement | A single under-committed reviewer drives a −18.7 % drop in commitment across all reviewers |
| Groupthink & Echo Chamber | Amplification of negative views among biased peers | −0.17 rating among biased reviewers, plus a −0.25 spillover on unbiased reviewers |
| Authority Bias & Halo Effect | Renowned-author identity inflates perceived quality | Revealing identity for just 10 % of papers shifts 27.7 % of decisions |
| Anchoring Bias | Heavy reliance on initial impressions | The rebuttal phase exerts only a minimal effect on final outcomes |
Three LLM-agent roles are configured along orthogonal trait axes, all set via prompts:
| Role | Trait axis | Variants |
|---|---|---|
| Reviewer | Commitment | responsible · irresponsible |
| Reviewer | Intention | benign · malicious |
| Reviewer | Knowledgeability | knowledgeable · unknowledgeable |
| Author | Identity disclosure | anonymous · known |
| Area Chair | Decision style | authoritarian · conformist · inclusive |
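Since all traits are set via prompts, a reviewer agent's system prompt can be assembled from one variant per trait axis. A minimal sketch — the instruction snippets and function names are invented for illustration; the framework's actual prompt templates differ:

```python
# Hypothetical trait-to-instruction snippets; NOT the framework's real templates.
TRAIT_PROMPTS = {
    ("commitment", "responsible"): "Read the paper carefully and give detailed, constructive feedback.",
    ("commitment", "irresponsible"): "Skim the paper and write a brief, superficial review.",
    ("intention", "benign"): "Judge the paper fairly on its merits.",
    ("intention", "malicious"): "Look for reasons to reject the paper.",
    ("knowledgeability", "knowledgeable"): "You are an expert in the paper's subfield.",
    ("knowledgeability", "unknowledgeable"): "You are unfamiliar with the paper's subfield.",
}

def build_reviewer_prompt(commitment, intention, knowledgeability):
    """Compose a reviewer system prompt from one variant per trait axis."""
    parts = ["You are a reviewer for an ML conference."]
    for axis, variant in [("commitment", commitment),
                          ("intention", intention),
                          ("knowledgeability", knowledgeability)]:
        parts.append(TRAIT_PROMPTS[(axis, variant)])
    return " ".join(parts)

prompt = build_reviewer_prompt("irresponsible", "benign", "knowledgeable")
```

Because the axes are orthogonal, the 2 × 2 × 2 reviewer variants (plus author and AC variants) can all be generated from a single template.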
| Phase | Stage | What happens |
|---|---|---|
| I | Reviewer Assessment | Three reviewers independently evaluate each manuscript |
| II | Author–Reviewer Discussion | Authors submit rebuttals addressing reviewer concerns |
| III | Reviewer–AC Discussion | The AC facilitates discussion; reviewers update their initial ratings |
| IV | Meta-Review Compilation | The AC synthesizes all signals into a single meta-review |
| V | Paper Decision | The AC makes the final accept / reject call (fixed acceptance rate of 32 %) |
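The five phases map naturally onto a sequential loop over each submission. A schematic sketch, assuming a generic `llm(agent, message)` callable — this is not the framework's actual implementation, and the real decision phase enforces the acceptance rate across all papers rather than per paper:

```python
def review_paper(paper, reviewers, author, area_chair, llm):
    """Schematic of the five-phase AgentReview pipeline for one submission."""
    # Phase I: independent reviewer assessments
    reviews = [llm(r, f"Review this paper:\n{paper}") for r in reviewers]
    # Phase II: author rebuttal addressing all reviews
    rebuttal = llm(author, "Write a rebuttal to:\n" + "\n".join(reviews))
    # Phase III: AC-facilitated discussion; reviewers update their ratings
    updated = [llm(r, f"Given the rebuttal:\n{rebuttal}\nUpdate your review:\n{rev}")
               for r, rev in zip(reviewers, reviews)]
    # Phase IV: AC synthesizes all signals into a meta-review
    meta = llm(area_chair, "Write a meta-review from:\n" + "\n".join(updated))
    # Phase V: final call (the real framework fixes the global acceptance rate)
    decision = llm(area_chair, f"Accept or reject based on:\n{meta}")
    return meta, decision

def _stub_llm(agent, message):
    # Stand-in for a real LLM call, so the sketch runs without an API key.
    return f"[{agent}] response"

meta, decision = review_paper("Example paper text",
                              ["R1", "R2", "R3"], "Author", "AC", _stub_llm)
```

Swapping `_stub_llm` for a real model client turns the sketch into a minimal end-to-end pass.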
| Requirement | Version |
|---|---|
| Python | 3.10+ |
| LLM access | OpenAI or Azure OpenAI API key |
| OS | Linux / macOS / WSL |
```shell
git clone https://github.com/Ahren09/AgentReview.git
cd AgentReview
pip install -r requirements.txt
```

🔑 Set environment variables — OpenAI:

```shell
export OPENAI_API_KEY=sk-...
```

🔑 Set environment variables — Azure OpenAI:

```shell
export AZURE_ENDPOINT=https://<your-endpoint>.openai.azure.com/
export AZURE_DEPLOYMENT=<your-deployment-name>
export AZURE_OPENAI_KEY=<your-key>
```

Two zip archives are hosted on Dropbox:
| Archive | Contents | Target | Required? |
|---|---|---|---|
| `AgentReview_Paper_Data.zip` | PDFs of sampled ICLR papers + real ICLR 2020–2023 reviews | `data/` | ✅ |
| `AgentReview_LLM_Reviews.zip` | The full LLM-generated review dataset from the paper | `outputs/` | optional |
```shell
unzip AgentReview_Paper_Data.zip -d data/
unzip AgentReview_LLM_Reviews.zip -d outputs/  # optional
```

Run a full simulated review pass on ICLR 2024 with a `malicious_Rx1` reviewer setting:
```shell
python run_paper_review_cli.py \
    --conference ICLR2024 \
    --openai_client_type azure_openai \
    --data_dir data \
    --experiment_name malicious_Rx1
```

Or explore interactively:
- Notebook — `notebooks/demo.ipynb`
- Live demo — Hugging Face Space
- End-to-end script — `run.sh`

Note: all project files should be run from the `AgentReview` directory.
Define a new setting in `agentreview/experiment_config.py` and register it in `all_settings`:

```python
all_settings = {
    "BASELINE": baseline_setting,
    "benign_Rx1": benign_Rx1_setting,
    # ...
    "your_setting_name": your_setting,
}
```

- We use a fixed acceptance rate of 32 %, matching the actual ICLR 2020–2023 average. See Conference Acceptance Rates for context.
- API providers can apply strict content filtering. You may need to relax filtering on your deployment to obtain complete generations.
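For illustration, a setting registered in `all_settings` might bundle one trait choice per agent role. The dictionary schema below is a guess for the sake of example — mirror the existing entries in `agentreview/experiment_config.py` for the real structure:

```python
# Illustrative only: copy the structure of existing settings in
# agentreview/experiment_config.py rather than this sketch.
your_setting = {
    "reviewer": {
        "commitment": "irresponsible",   # the one trait varied from the baseline
        "intention": "benign",
        "knowledgeability": "knowledgeable",
    },
    "author": {"identity": "anonymous"},
    "area_chair": {"style": "inclusive"},
}
```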
```bibtex
@inproceedings{jin2024agentreview,
  title     = {AgentReview: Exploring Peer Review Dynamics with LLM Agents},
  author    = {Jin, Yiqiao and Zhao, Qinlin and Wang, Yiyang and Chen, Hao
               and Zhu, Kaijie and Xiao, Yijia and Wang, Jindong},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in
               Natural Language Processing (EMNLP)},
  year      = {2024}
}
```

The implementation builds on the chatarena multi-agent framework and uses the OpenReview API to retrieve real ICLR submission data.
Released under the Apache License 2.0.

