mlquantify is a Python library for quantification, also known as supervised prevalence estimation, designed to estimate the distribution of classes within datasets. It offers a range of tools for various quantification methods, model selection tailored for quantification tasks, evaluation metrics, and protocols to assess quantification performance. Additionally, mlquantify includes calibration tools, confidence region estimation, pluggable solvers and representations, and visualization utilities to help analyze and interpret results.
Website: https://luizfernandolj.github.io/mlquantify/
To install mlquantify, run the following command:

```bash
pip install mlquantify
```

To upgrade an existing installation, run:

```bash
pip install --upgrade mlquantify
```

| Section | Description |
|---|---|
| 33 Quantification Methods | Counting (CC, PCC, ACC, TAC, TX, TMAX, T50, MS, MS2, FM, GACC, GPACC), Matching (DyS, HDy, HDx, SORD, SMM, MMD_RKHS, KDEyML, KDEyHD, KDEyCS, GHDy, GHDx, GKDEyML, EDy, EDx), Likelihood (EMQ, CDE, MLPE), Neighbors (PWK), Meta (EnsembleQ, AggregativeBootstrap, QuaDapt). |
| Dynamic class management | All methods handle both binary and multiclass problems; methods that are binary by nature are extended to multiclass data automatically through One-Vs-All (OVA). |
| Solvers | Modular optimization backends: BinarySolver, LeastSquaresSolver, SimplexSolver. |
| Representations | Pluggable feature representations: HistogramRepresentation, KDERepresentation, DistanceRepresentation, KernelMeanRepresentation, PredictionRepresentation. |
| Losses | Composable loss functions (distance-based and likelihood-based) shared across quantifier families. |
| Calibration | ClassifierCalibrator and QuantifierCalibrator for post-hoc calibration of classifiers and quantifiers. |
| Confidence Regions | ConfidenceInterval, ConfidenceEllipseSimplex, ConfidenceEllipseCLR for uncertainty estimation on prevalence predictions. |
| Model Selection | GridSearchQ and evaluation protocols (APP, NPP, UPP, PPP) tailored for quantification tasks. |
| Evaluation Metrics | Metrics for quantification performance: AE, MAE, NAE, SE, MSE, KLD, RAE, NRAE, NKLD, NMD, RNOD, VSE, CvM_L1. |
| Visualization | scikit-learn-style Display classes for both single- and multiple-sample results: DiagonalDisplay, BiasDisplay, ErrorByShiftDisplay, PrevalenceDisplay, ConfidenceRegionDisplay. |
| Comprehensive Documentation | Full API reference and user guide covering all modules and methods. |
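
To give a rough idea of what the counting family listed in the table does, the sketch below implements binary Classify and Count (CC) and Adjusted Classify and Count (ACC) in plain NumPy. It does not use mlquantify's API: the function names, the simulated classifier, and the fixed true/false positive rates are all assumptions made for illustration.

```python
import numpy as np


def classify_and_count(y_pred):
    """CC: the fraction of instances predicted as positive."""
    return float(np.mean(y_pred == 1))


def adjusted_classify_and_count(y_pred, tpr, fpr):
    """ACC: correct the CC estimate using the classifier's TPR and FPR.

    Solves cc = tpr * p + fpr * (1 - p) for p and clips the result to [0, 1].
    """
    cc = classify_and_count(y_pred)
    if np.isclose(tpr, fpr):  # no usable signal; fall back to CC
        return cc
    return float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))


def simulate_predictions(y_true, tpr, fpr, rng):
    """Hypothetical classifier: positives are detected with probability tpr,
    negatives are misclassified as positive with probability fpr."""
    u = rng.random(y_true.shape[0])
    return np.where(y_true == 1, u < tpr, u < fpr).astype(int)


rng = np.random.default_rng(0)
tpr, fpr = 0.8, 0.1       # assumed known, e.g. estimated on validation data
true_prevalence = 0.7     # assumed positive-class prevalence of the test sample

y_test = (rng.random(20_000) < true_prevalence).astype(int)
y_pred = simulate_predictions(y_test, tpr, fpr, rng)

cc_estimate = classify_and_count(y_pred)
acc_estimate = adjusted_classify_and_count(y_pred, tpr, fpr)

print(f"true prevalence: {y_test.mean():.3f}")
print(f"CC estimate:     {cc_estimate:.3f}")
print(f"ACC estimate:    {acc_estimate:.3f}")
```

CC is biased whenever the classifier makes errors, because misclassifications do not cancel out once the test prevalence differs from the training prevalence. ACC removes that bias when the rates are accurate, but it divides by `tpr - fpr`, so the correction becomes unstable when the two rates are close; this sketch simply falls back to CC in that case.
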
This snippet builds a synthetic 3-class dataset that combines prior and covariate shift with make_quantification, then evaluates a quantifier under the Artificial Prevalence Protocol (APP) with apply_protocol, which fits the model on a train split and scores it on many held-out test samples drawn at controlled prevalences.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

from mlquantify.datasets import make_quantification
from mlquantify.model_selection import apply_protocol
from mlquantify.counting import ACC

# Synthetic 3-class data with a complex (stacked prior + covariate) shift
Xs, ys, _ = make_quantification(
    n_classes=3,  # three classes, as described above
    shift_type=["prior", "covariate"],  # compose two kinds of dataset shift
    covariate_scale=1.0,
    random_state=0,
)
X, y = np.concatenate(Xs), np.concatenate(ys)  # pool the bags into one dataset

# Evaluate ACC under the Artificial Prevalence Protocol. apply_protocol fits the
# quantifier on a train split and scores it on many held-out test samples drawn
# at controlled prevalences (the test instances never overlap with training).
results = apply_protocol(
    ACC(RandomForestClassifier(random_state=0)),
    X,
    y,
    protocol="app",
    batch_size=[100, 500],
    scoring=["mae", "rae"],
    random_state=0,
    verbose=1,
)

print(f"APP over {results['n_batches']} test samples")
print(f"  mean MAE -> {results['MAE'].mean():.4f}")
print(f"  mean RAE -> {results['RAE'].mean():.4f}")
```

- If you need any help, refer to the User Guide.
- Explore the API documentation for detailed developer information.
- The library is also available on PyPI as mlquantify.
- Check the CHANGELOG to see what's currently being developed!
Core dependencies (installed automatically with pip install mlquantify):
- scikit-learn
- numpy
- scipy
- pandas
- joblib
- tqdm
Optional extras:
- `pip install mlquantify[viz]`: plotting via `mlquantify.visualization` (adds matplotlib)
- `pip install mlquantify[neural]`: neural quantifiers such as QuaNet (adds PyTorch)
- `pip install mlquantify[all]`: everything above