SACRO-ML: Disclosure Control Tools for ML Models

An increasing body of work has shown that machine learning (ML) models may expose confidential properties of the data on which they are trained. This has resulted in a wide range of proposed attack methods with varying assumptions that exploit the model structure and/or behaviour to infer sensitive information.

The sacroml package is a collection of tools and resources for managing the statistical disclosure control (SDC) of trained ML models. In particular, it provides:

  • A safemodel package that extends commonly used ML models to provide ante-hoc SDC by assessing the theoretical risk posed by the training regime (such as hyperparameter, dataset, and architecture combinations) before potentially costly model fitting is performed. It also ensures that best practice is followed with respect to privacy, e.g., using differential privacy optimisers where available. For large models and datasets, ante-hoc analysis has the potential to save significant time and cost by avoiding the training of models that intensive post-hoc analysis would likely find disclosive.
  • An attacks package that provides post-hoc SDC by assessing the empirical disclosure risk of a classification model through a variety of simulated attacks after training. It provides an integrated suite of attacks with a common application programming interface (API) and is designed to support the inclusion of additional state-of-the-art attacks as they become available. In addition to membership inference attacks (MIAs), such as the likelihood ratio attack (LiRA) and quantile regression MIA (QMIA), and attribute inference attacks, the package provides novel structural attacks. These report cheap-to-compute metrics that can serve as indicators of model disclosiveness after model fitting but before running more computationally expensive MIAs.
  • Simple human-readable reports that summarise the results.
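
To make the idea of an ante-hoc check concrete, the following is a minimal sketch of testing hyperparameters against a rule set before any fitting takes place. It does not use or reproduce the safemodel API: the function ante_hoc_check, the rule names, and the thresholds are all hypothetical and chosen only for illustration.

```python
# Hypothetical rule set: the names and thresholds below are illustrative
# assumptions, not the rules shipped with sacroml's safemodel package.
HYPOTHETICAL_RULES = {
    "min_samples_leaf": ("min", 5),  # each leaf should cover several records
    "max_depth": ("max", 20),        # very deep trees can memorise records
}


def ante_hoc_check(params, rules=HYPOTHETICAL_RULES):
    """Return human-readable warnings for hyperparameters that break a rule."""
    warnings = []
    for name, (kind, bound) in rules.items():
        value = params.get(name)
        if value is None:
            # Unset or unbounded values (e.g. max_depth=None) are flagged too.
            warnings.append(f"{name} is unset; expected a {kind} bound of {bound}")
        elif kind == "min" and value < bound:
            warnings.append(f"{name}={value} is below the minimum {bound}")
        elif kind == "max" and value > bound:
            warnings.append(f"{name}={value} is above the maximum {bound}")
    return warnings


# No model is trained here: only the proposed settings are inspected.
risky = ante_hoc_check({"min_samples_leaf": 1, "max_depth": None})
safer = ante_hoc_check({"min_samples_leaf": 10, "max_depth": 8})
print(risky)
print(safer)
```

Because the check only inspects settings, it costs almost nothing compared with training a model and then attacking it.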

Classification models from scikit-learn (including those implementing sklearn.base.BaseEstimator) and PyTorch are broadly supported within the package. Some attacks can still be run if only CSV files of the model predicted probabilities are supplied, e.g., if the model was produced in another language. See the examples for further information.

Installation

Python Package Index

$ pip install sacroml

Note: macOS users may need to install libomp due to a dependency on XGBoost:

$ brew install libomp

Conda

$ conda install sacroml

Usage

Quick-start example:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from sacroml.attacks.likelihood_attack import LIRAAttack
from sacroml.attacks.target import Target

# Load dataset
X, y = load_breast_cancer(return_X_y=True, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Fit model
model = RandomForestClassifier(min_samples_split=2, min_samples_leaf=1)
model.fit(X_train, y_train)

# Wrap model and data
target = Target(
    model=model,
    dataset_name="breast cancer",
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
)

# Create an attack object and run the attack
attack = LIRAAttack(n_shadow_models=100, output_dir="output_example")
attack.attack(target)

For more information, see the examples.

QMIA: Quantile Regression Membership Inference Attack

QMIA implements the attack from Bertran et al. (NeurIPS 2023). It trains a histogram-based quantile regressor (HistGradientBoostingRegressor) on non-member data to learn per-sample membership thresholds — no shadow models required.

from sacroml.attacks.qmia_attack import QMIAAttack
from sacroml.attacks.target import Target

# Reuses model, X_train, y_train, X_test, and y_test from the quick-start example
target = Target(model=model, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
attack = QMIAAttack(alpha=0.01, output_dir="output_qmia")
attack.attack(target)

Key features:

  • Multiclass support via the full hinge score: logit(p_y) - max_{y' != y} logit(p_{y'})
  • Q conditioned on (x, y): the quantile regressor learns a threshold per sample and label
  • FPR control: the regressor targets the (1 - alpha) quantile of non-member scores, so alpha sets the target false-positive rate on non-members
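
The hinge score and the role of alpha can be illustrated with a small standard-library sketch. This is not the sacroml implementation: the function names are hypothetical, and a single empirical quantile over all non-member scores stands in for the learned regressor, which would instead predict a separate threshold for each (x, y).

```python
import math


def logit(p, eps=1e-12):
    """Log-odds of a probability, clipped away from 0 and 1 for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def hinge_score(probs, label):
    """logit(p_y) minus the largest logit among the other classes."""
    others = [logit(p) for k, p in enumerate(probs) if k != label]
    return logit(probs[label]) - max(others)


def quantile_threshold(scores, alpha):
    """Empirical (1 - alpha) quantile: at most a fraction alpha of scores exceed it."""
    ordered = sorted(scores)
    idx = math.ceil((1.0 - alpha) * len(ordered)) - 1
    idx = min(max(idx, 0), len(ordered) - 1)
    return ordered[idx]


# Simplification (assumption): one global threshold from made-up non-member
# scores, rather than a per-sample threshold from a quantile regressor.
non_member_scores = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 3.0]
alpha = 0.25
threshold = quantile_threshold(non_member_scores, alpha)

# Fraction of non-members wrongly flagged as members at this threshold.
fpr = sum(s > threshold for s in non_member_scores) / len(non_member_scores)

# A record is predicted to be a member when its score exceeds the threshold.
score = hinge_score([0.7, 0.2, 0.1], label=0)
is_member = score > threshold
print(threshold, fpr, round(score, 4), is_member)
```

Lowering alpha raises the threshold, which trades fewer false positives on non-members for fewer detected members.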

Benchmarking

Run the full benchmark comparing QMIA against WorstCase and LiRA:

python examples/sklearn/benchmark_qmia_full.py

MetaAttack: Unified Per-Record Vulnerability Aggregation

MetaAttack runs multiple privacy attacks (LiRA, QMIA, Structural) on the same target and aggregates their per-record results into a single vulnerability DataFrame. Three operating modes are supported via the behaviour parameter:

  • 'run_all' (default): run every specified attack from scratch.
  • 'use_existing_only': read per-record scores from pre-existing report.json files without re-running anything. Useful when expensive attacks such as LiRA have already been run.
  • 'fill_missing': load existing results and run only the attacks not yet present.

from sacroml.attacks.meta_attack import MetaAttack
from sacroml.attacks.target import Target

# Reuses model, X_train, y_train, X_test, and y_test from the quick-start example
target = Target(model=model, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
meta = MetaAttack(
    attacks=[("lira", {}), ("qmia", {}), ("structural", {})],
    behaviour="run_all",  # alternatives: "use_existing_only", "fill_missing"
    output_dir="output_meta",
)
meta.attack(target)

The vulnerability matrix is saved as vulnerability_matrix.csv in output_dir.
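
The difference between the three behaviour modes can be sketched as a small planning function that decides which attacks to run and which results to load. This is an illustrative assumption about the logic rather than MetaAttack's actual code: plan_attacks is a hypothetical helper, and in particular the choice to silently skip attacks with no stored results under 'use_existing_only' is a guess.

```python
def plan_attacks(requested, existing, behaviour="run_all"):
    """Split requested attack names into (to_run, to_load).

    requested: attack names in the order given by the user.
    existing:  names of attacks that already have stored results.
    """
    if behaviour == "run_all":
        # Ignore stored results and run everything from scratch.
        return list(requested), []
    if behaviour == "use_existing_only":
        # Never run anything; attacks without stored results are skipped
        # (assumption: the real behaviour may differ, e.g. warn or raise).
        return [], [name for name in requested if name in existing]
    if behaviour == "fill_missing":
        # Load what exists and run only what is missing.
        to_load = [name for name in requested if name in existing]
        to_run = [name for name in requested if name not in existing]
        return to_run, to_load
    raise ValueError(f"unknown behaviour: {behaviour!r}")


requested = ["lira", "qmia", "structural"]
existing = {"lira"}  # e.g. an expensive LiRA run finished earlier

for mode in ("run_all", "use_existing_only", "fill_missing"):
    print(mode, plan_attacks(requested, existing, mode))
```

With a LiRA result already on disk, 'fill_missing' would run only QMIA and the structural attack, which is where most of the time saving comes from.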

Documentation

See API documentation.

Contributing

See our contributing guide.

Acknowledgement

This work was supported by UK Research and Innovation as part of the Data and Analytics Research Environments UK (DARE UK) programme, delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK). The specific projects were Semi-Automated Checking of Research Outputs (SACRO; MC_PC_23006), Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMATTER; MC_PC_21033), and TREvolution (MC_PC_24038). This project has also been supported by MRC and EPSRC (PICTURES; MR/S010351/1).
