
🎓 AgentReview

The first LLM-agent simulation of the academic peer review process.

EMNLP 2024 (Main Track, Oral) · arXiv · ACL Anthology · HF Demo · Website

AgentReview overview

Yiqiao Jin1* · Qinlin Zhao2* · Yiyang Wang1 · Hao Chen3 · Kaijie Zhu4 · Yijia Xiao5 · Jindong Wang6

1Georgia Institute of Technology   2University of Science and Technology of China   3Carnegie Mellon University  
4UC Santa Barbara   5UC Los Angeles   6William & Mary

* Equal contribution.


📊 At a glance

| Metric | Value |
| --- | --- |
| Total generated peer review documents | 53,800+ |
| Reviews & rebuttals | 10,460 |
| Reviewer–AC discussions | 23,535 |
| Meta-reviews / final decisions | 9,414 / 9,414 |
| Conferences covered | ICLR 2020–2023 |
| Submissions sampled (oral · spotlight · poster · reject) | 523 (19 · 29 · 125 · 350) |
| Decision variation attributable to reviewer bias | 37.1 % |

📝 Abstract

Peer review is fundamental to the integrity and advancement of scientific publication. Traditional analyses of peer review often rely on exploratory statistics over existing review data, which neither adequately capture the multivariate nature of the process nor account for its latent variables, and are further constrained by privacy concerns due to the sensitive nature of the data. We introduce AgentReview, the first large language model (LLM) based peer review simulation framework, which effectively disentangles the impacts of multiple latent factors and sidesteps the privacy issue. Our study reveals significant insights, including a notable 37.1 % variation in paper decisions due to reviewers' biases, supported by sociological theories such as social influence theory, altruism fatigue, and authority bias. We believe this study offers valuable insights for improving the design of peer review mechanisms.


🏗️ Architecture

AgentReview 5-phase pipeline

AgentReview models peer review as a structured five-phase pipeline with three role types — reviewers, authors, and area chairs — each instantiated as an LLM agent with configurable latent traits. By varying one trait at a time against a fixed baseline setting, the framework disentangles otherwise-confounded factors such as reviewer commitment, intention, knowledgeability, AC style, and author anonymity, while preserving real reviewer privacy.
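The one-trait-at-a-time design can be pictured as a small enumeration over trait axes. The trait names below follow the paper's reviewer axes, but the dictionaries and function are illustrative sketches, not the framework's actual configuration:

```python
# Illustrative one-trait-at-a-time enumeration; trait names follow the
# paper's reviewer axes, but these dicts are NOT the framework's config.
BASELINE = {
    "commitment": "responsible",
    "intention": "benign",
    "knowledgeability": "knowledgeable",
}

AXES = {
    "commitment": ["responsible", "irresponsible"],
    "intention": ["benign", "malicious"],
    "knowledgeability": ["knowledgeable", "unknowledgeable"],
}

def single_trait_variants(baseline, axes):
    """Yield (axis, setting) pairs differing from the baseline in exactly one trait."""
    for axis, values in axes.items():
        for value in values:
            if value != baseline[axis]:
                setting = dict(baseline)
                setting[axis] = value
                yield axis, setting

variants = list(single_trait_variants(BASELINE, AXES))
```

Because every variant differs from the baseline in a single trait, any change in outcomes can be attributed to that trait rather than to a mix of confounded factors.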


✨ Key findings

Five sociological phenomena emerge from the simulation, each tied to a measurable shift in review outcomes:

| Phenomenon | Sociological theory | Quantitative effect |
| --- | --- | --- |
| Social Influence | Conformity to perceived majority opinion | −27.2 % standard deviation in ratings after rebuttals |
| Altruism Fatigue & Peer Effects | One free-rider triggers collective disengagement | A single under-committed reviewer drives a −18.7 % drop in commitment across all reviewers |
| Groupthink & Echo Chamber | Amplification of negative views among biased peers | −0.17 rating among biased reviewers, plus a −0.25 spillover onto unbiased reviewers |
| Authority Bias & Halo Effect | Renowned-author identity inflates perceived quality | Revealing identity for just 10 % of papers shifts 27.7 % of decisions |
| Anchoring Bias | Heavy reliance on initial impressions | The rebuttal phase exerts only a minimal effect on final outcomes |
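As a toy illustration of how a spread-reduction statistic like the −27.2 % figure is computed, the snippet below compares the standard deviation of ratings before and after a discussion phase. The numbers are made up, not drawn from the paper's data:

```python
from statistics import pstdev

# Toy ratings before/after the rebuttal phase (made-up numbers, not the
# paper's data) to show how a spread-reduction percentage is computed.
def relative_std_change(before, after):
    """Percentage change in rating spread; negative means convergence."""
    return (pstdev(after) - pstdev(before)) / pstdev(before) * 100

before = [3, 5, 8]   # initial reviewer ratings for one paper
after  = [4, 5, 7]   # ratings after reviewers see each other's views
change = relative_std_change(before, after)  # negative: opinions converged
```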

🔬 Framework

Roles

Three LLM-agent roles are configured along orthogonal trait axes, all set via prompts:

| Role | Trait axis | Variants |
| --- | --- | --- |
| Reviewer | Commitment | responsible · irresponsible |
| | Intention | benign · malicious |
| | Knowledgeability | knowledgeable · unknowledgeable |
| Author | Identity disclosure | anonymous · known |
| Area Chair | Decision style | authoritarian · conformist · inclusive |
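Since traits are set purely via prompts, a reviewer's system prompt can be assembled from its trait assignment. The mapping below is a hypothetical sketch; the framework's actual prompt wording lives in the AgentReview codebase and differs:

```python
# Hypothetical trait-to-prompt mapping; the real prompt wording is defined
# inside the AgentReview codebase and differs from this sketch.
TRAIT_PROMPTS = {
    ("commitment", "irresponsible"): "You skim the paper and write a brief, superficial review.",
    ("intention", "malicious"): "You actively look for reasons to reject the paper.",
    ("knowledgeability", "unknowledgeable"): "You are unfamiliar with the paper's subfield.",
}

def build_system_prompt(traits):
    """Compose a reviewer system prompt from its trait assignment."""
    base = "You are a reviewer for a machine learning conference."
    extras = [TRAIT_PROMPTS[pair] for pair in traits.items() if pair in TRAIT_PROMPTS]
    return " ".join([base] + extras)

prompt = build_system_prompt({"commitment": "responsible", "intention": "malicious"})
```

Baseline trait values add no extra instructions, so the baseline reviewer receives only the neutral base prompt.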

Five-phase pipeline

| Phase | Stage | What happens |
| --- | --- | --- |
| I | Reviewer Assessment | Three reviewers independently evaluate each manuscript |
| II | Author–Reviewer Discussion | Authors submit rebuttals addressing reviewer concerns |
| III | Reviewer–AC Discussion | The AC facilitates discussion; reviewers update their initial ratings |
| IV | Meta-Review Compilation | The AC synthesizes all signals into a single meta-review |
| V | Paper Decision | The AC makes the final accept / reject call (fixed acceptance rate of 32 %) |
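The pipeline can be pictured as a sequential loop that threads a shared transcript through the five phases. This is an illustrative skeleton only, with plain callables standing in for the LLM-backed reviewer, author, and AC agents:

```python
# Skeleton of the five-phase loop; handler callables stand in for the
# LLM-backed reviewer/author/AC agents in the actual framework.
PHASES = [
    "reviewer_assessment",
    "author_reviewer_discussion",
    "reviewer_ac_discussion",
    "meta_review_compilation",
    "paper_decision",
]

def run_pipeline(paper, handlers):
    """Run the phases in order, threading a shared transcript through."""
    transcript = {"paper": paper}
    for phase in PHASES:
        transcript[phase] = handlers[phase](transcript)
    return transcript
```

Each handler sees everything produced so far, which mirrors how later phases (meta-review, decision) condition on earlier reviews and discussions.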

📦 Installation

| Requirement | Version |
| --- | --- |
| Python | 3.10+ |
| LLM access | OpenAI or Azure OpenAI API key |
| OS | Linux / macOS / WSL |

```shell
git clone https://github.com/Ahren09/AgentReview.git
cd AgentReview
pip install -r requirements.txt
```
🔑 Set environment variables — OpenAI

```shell
export OPENAI_API_KEY=sk-...
```

🔑 Set environment variables — Azure OpenAI

```shell
export AZURE_ENDPOINT=https://<your-endpoint>.openai.azure.com/
export AZURE_DEPLOYMENT=<your-deployment-name>
export AZURE_OPENAI_KEY=<your-key>
```
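The `--openai_client_type` flag in the quick start selects between the two providers. A minimal sketch of that selection logic, assuming the variables above are exported; the returned strings are assumed to mirror the flag's values, and the chosen branch is where you would construct `openai.OpenAI(...)` or `openai.AzureOpenAI(...)`:

```python
import os

# Sketch of provider selection based on the environment variables above.
# The returned strings are assumed to mirror --openai_client_type values;
# build openai.OpenAI(...) or openai.AzureOpenAI(...) in the chosen branch.
def pick_provider(env=None):
    env = os.environ if env is None else env
    if env.get("AZURE_OPENAI_KEY") and env.get("AZURE_ENDPOINT"):
        return "azure_openai"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise RuntimeError("no OpenAI or Azure OpenAI credentials set")
```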

💾 Data setup

Two zip archives are hosted on Dropbox:

| Archive | Contents | Target | Required? |
| --- | --- | --- | --- |
| AgentReview_Paper_Data.zip | PDFs of sampled ICLR papers + real ICLR 2020–2023 reviews | data/ | yes |
| AgentReview_LLM_Reviews.zip | The full LLM-generated review dataset from the paper | outputs/ | optional |

```shell
unzip AgentReview_Paper_Data.zip  -d data/
unzip AgentReview_LLM_Reviews.zip -d outputs/    # optional
```

🚀 Quick start

Run a full simulated review pass on ICLR 2024 with a malicious_Rx1 reviewer setting:

```shell
python run_paper_review_cli.py \
    --conference ICLR2024 \
    --openai_client_type azure_openai \
    --data_dir data \
    --experiment_name malicious_Rx1
```

Or explore interactively through the Gradio web demo.

Note: all project files should be run from the AgentReview directory.


🛠️ Customizing your own setting

Define a new setting in agentreview/experiment_config.py and register it in all_settings:

```python
all_settings = {
    "BASELINE":   baseline_setting,
    "benign_Rx1": benign_Rx1_setting,
    # ...
    "your_setting_name": your_setting,
}
```
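A setting itself is a plain configuration object. The hypothetical example below mimics a one-malicious-reviewer setup, but every field name is illustrative; consult `agentreview/experiment_config.py` for the schema the framework actually uses:

```python
# Hypothetical setting with one malicious reviewer; every field name here
# is illustrative. Check agentreview/experiment_config.py for the schema
# the framework actually uses.
your_setting = {
    "reviewer": {
        "is_benign": [False, None, None],        # per-reviewer intention
        "is_knowledgeable": [None, None, None],  # None = baseline trait
        "is_responsible": [None, None, None],
    },
    "author": {"knows_authors": False},          # anonymous submission
    "ac": {"area_chair_type": "inclusive"},
}
```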

📌 Notes

  • We use a fixed acceptance rate of 32 %, matching the actual ICLR 2020–2023 average. See Conference Acceptance Rates for context.
  • API providers can apply strict content filtering. You may need to relax filtering on your deployment to obtain complete generations.
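A fixed acceptance rate amounts to a top-fraction cut over papers. The toy rule below accepts the top 32 % by average rating; it is only a sketch, since the framework's AC produces per-paper decisions rather than a simple cutoff:

```python
# Toy top-fraction decision rule matching the fixed 32 % acceptance rate;
# the framework's AC produces per-paper decisions, not a simple cutoff.
def accept_top_fraction(avg_ratings, rate=0.32):
    """Return the indices of the highest-rated papers under a fixed rate."""
    n_accept = round(len(avg_ratings) * rate)
    ranked = sorted(range(len(avg_ratings)),
                    key=lambda i: avg_ratings[i], reverse=True)
    return set(ranked[:n_accept])

accepted = accept_top_fraction([6.3, 4.0, 5.5, 7.1, 3.2, 5.0])
# 6 papers at a 32 % rate -> 2 accepts: the two highest-rated papers
```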

📚 Citation

@inproceedings{jin2024agentreview,
  title     = {AgentReview: Exploring Peer Review Dynamics with LLM Agents},
  author    = {Jin, Yiqiao and Zhao, Qinlin and Wang, Yiyang and Chen, Hao
               and Zhu, Kaijie and Xiao, Yijia and Wang, Jindong},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in
               Natural Language Processing (EMNLP)},
  year      = {2024}
}

🤝 Acknowledgments

The implementation builds on the chatarena multi-agent framework, and uses the OpenReview API to retrieve real ICLR submission data.


⚖️ License

Released under the Apache License 2.0.
