mcnamacl/Vacuity_Analysis

Vacuity Analysis

Code and data for reproducing the post-hoc analysis of vacuity-based out-of-distribution (OOD) detection in Evidential Deep Learning (EDL), as described in the accompanying paper: Rethinking Vacuity for OOD Detection in Evidential Deep Learning.

Overview

This repository investigates how sensitive vacuity, used as the Uncertainty Mass (UM), is to variation in the number of classes K on multiple-choice QA benchmarks.

The key question: does vacuity-based OOD detection succeed because the model is genuinely more uncertain on OOD data, or because vacuity is an artefact of class cardinality K?

Vacuity is defined as:

vacuity = K / S,   where S = sum(alpha_i) = sum(e_i + 1)

Because K appears directly in the numerator, datasets with different numbers of answer choices will produce systematically different vacuity distributions, independent of model uncertainty. The synthetic experiment in synthetic_auroc_analysis.py isolates this effect.
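To make the dependence on K concrete, here is a minimal sketch of the vacuity formula above. It is not taken from the repository's scripts: the function name and the evidence values are hypothetical, chosen only to show that appending an answer choice with zero evidence raises vacuity even though the evidence for the original choices is unchanged.

```python
def vacuity(evidence):
    """Vacuity K / S for one sample, with alpha_i = e_i + 1 and S = sum(alpha_i)."""
    k = len(evidence)
    strength = sum(e + 1.0 for e in evidence)
    return k / strength


# Hypothetical evidence vector for a 4-choice question (illustrative values).
four_way = [6.0, 1.0, 0.5, 0.5]
# Same evidence, plus one extra answer choice that receives no evidence.
five_way = four_way + [0.0]

print("K=4:", vacuity(four_way))  # 4 / 12
print("K=5:", vacuity(five_way))  # 5 / 13
```

Since S >= K always holds, adding a zero-evidence class moves K / S to (K + 1) / (S + 1), which is never smaller than the original value.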

Repository structure

.
├── train_ib_edl.py                       # Fine-tune Llama-3-8B with IB-EDL loss
├── train_standard_edl.py                 # Fine-tune Llama-3-8B with Standard EDL loss
├── inference_ib_edl.py                   # Generate predictions with IB-EDL checkpoint
├── inference_standard_edl.py             # Generate predictions with Standard EDL checkpoint
│
├── auroc_npz_uncertainty.py              # AUROC via uncertainty mass & max probability (NPZ inputs)
├── auroc_entropy_analysis.py             # AUROC via Shannon entropy, uncertainty, and max probability (JSON inputs)
├── synthetic_auroc_analysis.py           # Synthetic K-expansion experiment (main analysis)
├── synthetic_vacuity_results.csv         # Output results table from the synthetic experiment
├── synthetic_auroc.png                   # AUROC vs K plot (generated output)
├── synthetic_aupr.png                    # AUPR vs K plot (generated output)
│
├── Implementation_A_ib-edl_sigma_mult_0/ # IB-EDL outputs from the authors' original code (sigma_mult=0)
├── Implementation_A_standard_edl/        # Standard EDL outputs from the authors' original code
├── Implementation_B_ib-edl_sigma_mult_0/ # IB-EDL outputs from our reimplementation (sigma_mult=0)
└── Implementation_B_standard_edl/        # Standard EDL outputs from our reimplementation

Datasets. All benchmarks are publicly available. Prediction files use the following abbreviations:

| Abbreviation | Dataset | Role |
| --- | --- | --- |
| obqa | OpenBookQA (4 classes) | ID |
| arc_c | ARC-Challenge (4 classes) | OOD |
| arc_e | ARC-Easy (4 classes) | OOD |
| csqa | CommonsenseQA (4 or 5 classes) | OOD |
| mmlu_math | MMLU Abstract Algebra (4 classes) | OOD |

Requirements

Analysis scripts:

pip install numpy scipy scikit-learn pandas matplotlib

Training and inference scripts:

pip install torch transformers peft datasets bitsandbytes accelerate wandb

A HuggingFace account with access to meta-llama/Meta-Llama-3-8B is required. Set your token before running:

export HF_TOKEN=<your_token>

Fine-tuning

Both training scripts fine-tune meta-llama/Meta-Llama-3-8B on OpenBookQA using LoRA (target modules: q_proj, v_proj, lm_head; r=8). Training runs for 10,080 steps with batch size 4 and learning rate 5e-5.

Set the output directory via the OUTPUT_DIR environment variable (default: ./out-ib-edl or ./out-standard-edl).

# IB-EDL
python train_ib_edl.py

# Standard EDL
python train_standard_edl.py
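The OUTPUT_DIR override described above follows the usual environment-variable-with-default pattern. The sketch below shows that pattern in isolation; `resolve_output_dir` is a hypothetical helper, not a function from the training scripts, and the scripts may read the variable differently.

```python
import os


def resolve_output_dir(default):
    """Return OUTPUT_DIR from the environment, or the given default if unset."""
    return os.environ.get("OUTPUT_DIR", default)


# With OUTPUT_DIR unset this prints the default; with it set, the override.
print(resolve_output_dir("./out-ib-edl"))
```

Under this pattern, running `OUTPUT_DIR=/some/path python train_ib_edl.py` would redirect checkpoints without editing the script.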

WandB logging is enabled by default. Set WANDB_API_KEY in your environment or run wandb login beforehand.

Inference

Both inference scripts load a trained checkpoint and generate prediction JSON files for all five evaluation datasets (OBQA, ARC-C, ARC-E, MMLU-Math, CSQA).

Set paths via environment variables:

| Variable | Description | Default |
| --- | --- | --- |
| PEFT_PATH | Path to the trained LoRA checkpoint | ./out-*/checkpoint-10080 |
| OUTPUT_DIR | Directory to write JSON result files | . |

# IB-EDL
PEFT_PATH=./out-ib-edl/checkpoint-10080 python inference_ib_edl.py

# Standard EDL
PEFT_PATH=./out-standard-edl/checkpoint-10080 python inference_standard_edl.py

Each script writes one JSON file per dataset (e.g. ib_edl_obqa_results.json). CSQA produces two files: one with K=4 and one with K=5.

Reproducing the analysis

Pre-generated prediction files are included in the Implementation_* directories, so the analysis scripts can be run without re-training.

1. Synthetic K-expansion experiment (primary result)

python synthetic_auroc_analysis.py

Outputs: synthetic_vacuity_results.csv, synthetic_auroc.png, synthetic_aupr.png, and a printed interpretation summary.

2. AUROC via Shannon entropy / uncertainty / max probability (JSON)

python auroc_entropy_analysis.py

Prints per-dataset AUROC and AUPR to stdout. Uncomment the plt.savefig lines to save ROC plots.
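For orientation, the three scores named in this step can all be derived from a sample's Dirichlet parameters. The sketch below is an illustration under stated assumptions, not the code in auroc_entropy_analysis.py: it assumes the entropy is taken over the expected class probabilities alpha_i / S, and the function name and dictionary keys are hypothetical (the JSON schema used by the repository is not described here).

```python
import math


def dirichlet_scores(alpha):
    """Compute three uncertainty scores from Dirichlet parameters alpha."""
    k = len(alpha)
    strength = sum(alpha)
    # Expected class probabilities under the Dirichlet.
    probs = [a / strength for a in alpha]
    # Shannon entropy (natural log) of the expected probabilities.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return {
        "entropy": entropy,
        "uncertainty_mass": k / strength,
        "max_prob": max(probs),
    }


# Hypothetical 4-class example with most evidence on the first class.
print(dirichlet_scores([7.0, 2.0, 1.5, 1.5]))
```

Entropy and uncertainty mass grow with uncertainty, while max probability shrinks, so a score such as 1 - max_prob would be needed if all three are to rank OOD samples above ID samples.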

3. AUROC via uncertainty mass from NPZ files (authors' original outputs)

python auroc_npz_uncertainty.py

Saves a three-panel AUROC figure to Implementation_A_ib-edl_sigma_mult_0/.

Key findings

  • Baseline (K=4 for both ID and OOD): AUROC ≈ 0.57, close to random discrimination (0.5).
  • OOD-only K expansion (ID stays at K=4; OOD inflated to K=5–8): AUROC rises to ≈ 0.92 at K=8, driven entirely by the K term in the vacuity formula.
  • Matched expansion (both distributions expanded equally): AUROC remains ≈ 0.57 at all K values, confirming the effect is purely a function of relative class counts rather than genuine uncertainty.
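The qualitative pattern in these findings can be reproduced with a toy simulation. The sketch below is not synthetic_auroc_analysis.py and will not reproduce the reported numbers: the Dirichlet strengths are made-up random values, and the expansion mechanism (padding with zero-evidence classes, each adding alpha = 1 to S) is an assumption about how K is inflated. All function names are hypothetical.

```python
import random

K_BASE = 4


def expanded_vacuity(strength, k_new):
    """Vacuity after padding a K_BASE-class sample to k_new classes.

    Assumption: each padded class has zero evidence, so it adds alpha = 1 to S.
    """
    return k_new / (strength + (k_new - K_BASE))


def auroc(id_scores, ood_scores):
    """Probability that a random OOD score exceeds a random ID score (ties = 0.5)."""
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


rng = random.Random(0)
# Made-up Dirichlet strengths S >= K_BASE; OOD gets slightly less evidence.
id_strength = [K_BASE + rng.expovariate(1 / 20) for _ in range(300)]
ood_strength = [K_BASE + rng.expovariate(1 / 16) for _ in range(300)]


def run(k_id, k_ood):
    """AUROC of vacuity when ID is expanded to k_id and OOD to k_ood classes."""
    id_v = [expanded_vacuity(s, k_id) for s in id_strength]
    ood_v = [expanded_vacuity(s, k_ood) for s in ood_strength]
    return auroc(id_v, ood_v)


print("baseline K=4:", round(run(4, 4), 3))
for k in range(5, 9):
    print(f"K={k}  OOD-only: {run(4, k):.3f}  matched: {run(k, k):.3f}")
```

Under this assumed mechanism, matched expansion applies the same strictly decreasing map of S to both sets, so the ranking of samples, and therefore the AUROC, is unchanged, whereas OOD-only expansion shifts every OOD score upward relative to the ID scores.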

Citation

If you use this code or data, please cite the accompanying paper:

@misc{mcnamara2026rethinkingvacuityooddetection,
      title={Rethinking Vacuity for OOD Detection in Evidential Deep Learning}, 
      author={Claire McNamara},
      year={2026},
      eprint={2605.06382},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.06382}, 
}

License

The benchmark questions included in the JSON files are drawn from publicly available datasets released under their respective licenses (see the dataset links above). The analysis code in this repository is released under the MIT License.
