Aayush Dhakal* Subash Khanal Srikumar Sastry Jacob Arndt Philipe Ambrozio Dias Dalton Lunga Nathan Jacobs
This repo contains the code for the CVPR 2026 paper SimLBR. We introduce a new regularization objective, Latent Blending Regularization (LBR), for generalizable AI-generated image detection.
A frozen DINOv3 backbone extracts one CLS embedding per image, and a lightweight MLP head learns the real/fake classifier. When the `--lbr` flag is enabled, real-image tokens are shifted asymmetrically towards fake-image tokens during training, creating pseudo-fake samples near the real distribution. Validation and test always use unmodified images.
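The blending step can be sketched as follows. This is an illustrative sketch, not the repo's implementation: the exact interpolation convention (which endpoint alpha weights) and the uniform sampling of alpha from the `--lbrdist` range are assumptions.

```python
import numpy as np

def latent_blend(real_cls, fake_cls, low=0.5, high=0.8, rng=None):
    """Shift real CLS embeddings toward their paired fake embeddings.

    real_cls, fake_cls: (batch, dim) arrays of DINOv3 CLS tokens.
    Returns pseudo-fake embeddings that stay close to the real
    distribution, since alpha (the weight on the real token) is >= 0.5.
    """
    rng = rng or np.random.default_rng()
    # One interpolation weight per sample, drawn from the --lbrdist range.
    alpha = rng.uniform(low, high, size=(real_cls.shape[0], 1))
    return alpha * real_cls + (1.0 - alpha) * fake_cls
```

During training these blended embeddings would be labeled fake, pushing the decision boundary tight around the real distribution; validation and test never apply this.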
Loading DINOv3 requires `DINO_V3_KEY`, the API key for the DINOv3 model:

```shell
export DINO_V3_KEY=API_KEY_FOR_DINO_V3_L16
```

Run commands from the repository root:

```shell
cd ./SimLBR
```

AIGC training uses ProGAN:
```
AIGCDetectionBenchMark/
  train/ProGAN/<category>/0_real/*.png
  train/ProGAN/<category>/1_fake/*.png
  test/<model>/0_real/*
  test/<model>/1_fake/*
```
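Given this layout, a dataset index can be built by mapping `0_real` to label 0 and `1_fake` to label 1. The helper below is an illustrative sketch, not the repo's loader:

```python
from pathlib import Path

def index_aigc(root, split="train", model="ProGAN"):
    """Collect (path, label) pairs from the AIGCDetectionBenchMark layout.

    Follows the labeling convention: 0_real -> 0, 1_fake -> 1.
    """
    samples = []
    for subdir, label in (("0_real", 0), ("1_fake", 1)):
        # <root>/<split>/<model>/<category>/<subdir>/*
        for path in sorted(Path(root, split, model).glob(f"*/{subdir}/*")):
            samples.append((str(path), label))
    return samples
```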
GenImage training uses `stable_diffusion_v_1_4`:
```
GenImage/
  <model>/<category>/train/nature/*.JPEG
  <model>/<category>/train/ai/*.png
  <model>/<category>/val/nature/*.JPEG
  <model>/<category>/val/ai/*.png
```
Fake samples are labeled 1; real samples are labeled 0. During training, each fake anchor is paired with a random real image from the same dataset.
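The anchor pairing can be sketched like this (a simplification, assuming reals are sampled uniformly with replacement; the function name is illustrative):

```python
import random

def pair_fake_anchors(fake_paths, real_paths, seed=0):
    """Pair each fake anchor (label 1) with a random real image
    (label 0) drawn from the same dataset."""
    rng = random.Random(seed)
    return [((fake, 1), (rng.choice(real_paths), 0)) for fake in fake_paths]
```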
Baseline detector without latent blending:
```shell
python -m simlbr.train \
  --dataset_name aigc \
  --data_dir ../data/fake_data/AIGC/AIGCDetectionBenchMark \
  --train_model ProGAN \
  --val_model combined \
  --ds_fraction 0.05 \
  --batch_size 200 \
  --num_workers 20 \
  --max_epochs 5 \
  --devices 4 \
  --run_name aigc_cls_baseline
```

SimLBR training with CLS-token latent blending:
```shell
python -m simlbr.train \
  --dataset_name aigc \
  --data_dir ../data/fake_data/AIGC/AIGCDetectionBenchMark \
  --train_model ProGAN \
  --val_model combined \
  --ds_fraction 0.2 \
  --batch_size 200 \
  --num_workers 20 \
  --max_epochs 5 \
  --devices 4 \
  --lbr \
  --lbrdist 0.5 0.8 \
  --run_name aigc_simlbr
```

Fast development check:
```shell
python -m simlbr.train --fast_dev_run --wandb_mode disabled --accelerator cpu --devices 1
```

Evaluate all subsets for a dataset:
```shell
python -m simlbr.evaluate \
  --dataset_name aigc \
  --data_dir ../data/fake_data/AIGC/AIGCDetectionBenchMark \
  --ckpt_path /path/to/checkpoint.ckpt \
  --devices 1
```

Evaluate selected subsets:
```shell
python -m simlbr.evaluate \
  --dataset_name aigc \
  --data_dir ../data/fake_data/AIGC/AIGCDetectionBenchMark \
  --ckpt_path /path/to/checkpoint.ckpt \
  --eval_datasets DALLE2 Midjourney
```

Evaluation writes `evaluation_results.csv` next to the checkpoint run directory. If `--eval_datasets` is not passed, the script launches evaluation across all generative models in the given dataset.
- `--lbr`: enable Latent Blending Regularization (LBR) during training.
- `--lbrdist LOW HIGH`: alpha range for latent blending; default is `0.5 0.8`.
- `--hidden_layers`: number of MLP hidden layers after the DINOv3 CLS token.
- `--activation`: `relu` or `gelu`.
- `--dropout`: dropout inside the classifier MLP.
- `--wandb_mode`: use `online`, `offline`, or `disabled`.
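The classifier head controlled by `--hidden_layers`, `--activation`, and `--dropout` can be sketched as below; the embedding dimension and hidden width are illustrative assumptions, not values from the repo:

```python
import torch.nn as nn

def build_head(embed_dim=1024, hidden_layers=2, hidden_dim=512,
               activation="gelu", dropout=0.1):
    """Lightweight MLP on top of the frozen DINOv3 CLS token.

    hidden_layers / activation / dropout mirror the CLI flags;
    embed_dim and hidden_dim are placeholder sizes.
    """
    act = {"relu": nn.ReLU, "gelu": nn.GELU}[activation]
    layers, in_dim = [], embed_dim
    for _ in range(hidden_layers):
        layers += [nn.Linear(in_dim, hidden_dim), act(), nn.Dropout(dropout)]
        in_dim = hidden_dim
    layers.append(nn.Linear(in_dim, 1))  # one logit: real (0) vs fake (1)
    return nn.Sequential(*layers)
```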
```bibtex
@inproceedings{dhakal2026simlbr,
  title={SimLBR: Learning to Detect Fake Images by Learning to Detect Real Images},
  author={Dhakal, Aayush and Khanal, Subash and Sastry, Srikumar and Arndt, Jacob and Ambrozio Dias, Philipe and Lunga, Dalton and Jacobs, Nathan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026},
  organization={IEEE/CVF}
}
```