A comprehensive, modular, and configurable framework for evaluating Machine Learning-based Intrusion Detection Systems (IDS).
- Modular Plugin Architecture: Easily extend the framework with custom IDS models, metrics, and adversarial attacks
- Flexible Data Pipeline: Load, preprocess, and split datasets with configurable preprocessing steps and feature selection
- Multiple Evaluation Modes: Support for intra-dataset, cross-dataset, and k-fold cross-validation evaluation
- Comprehensive Metrics: Built-in static metrics (accuracy, F1, precision, recall, ROC-AUC, etc.) and runtime metrics (CPU, RAM, training time)
- Adversarial Robustness Testing: Evaluate model robustness against adversarial attacks (FGSM, noise perturbation, junk data injection)
- Reproducible Results: Hash-based output organization ensures consistent experiment tracking
- Flexible Deployment: Run natively with Python or via Docker
- Python 3.13+
- uv (recommended) or pip
pip3 install ids-evaluation-framework
# Install dependencies (uv should be in your $PATH)
uv sync
# Verify installation
uv run ids-eval version
- Official Docker images (stable releases): https://hub.docker.com/r/niklassandhu/ids-eval-framework
- Supported Docker image architectures: arm64 (Raspberry Pi, Apple Silicon, ...) and amd64 (AMD, Intel)
# Configure environment variables
cp .env.example .env
# Edit .env to set your data paths
# Run via Docker Compose
docker compose run --rm ids-eval version
A pre-built Docker image is available on Docker Hub: niklassandhu/ids-eval-framework:latest
Copy the example configuration and adjust it to your needs:
cp examples/run_config/example.config.yml examples/run_config/my_config.yml
Run the data preparation pipeline:
uv run ids-eval dataset <run_config>
Execute the evaluation pipeline:
uv run ids-eval evaluate <run_config>
The framework provides two main commands:
| Command | Description |
|---|---|
| `ids-eval dataset <config.yml>` | Run dataset pipeline |
| `ids-eval evaluate <config.yml>` | Run evaluation pipeline |
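
The two commands can also be chained from a script. The following is a minimal sketch, assuming `uv` is on the `PATH` and the project is installed as described above; the helper names `build_commands` and `run_pipeline` are illustrative and not part of the framework.

```python
import subprocess


def build_commands(config_path: str) -> list[list[str]]:
    """Return the argv lists for the dataset and evaluation steps, in order."""
    return [
        ["uv", "run", "ids-eval", "dataset", config_path],
        ["uv", "run", "ids-eval", "evaluate", config_path],
    ]


def run_pipeline(config_path: str) -> None:
    """Run both steps in order and stop at the first failing command."""
    for argv in build_commands(config_path):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the evaluation step never runs on a failed dataset step.
        subprocess.run(argv, check=True)
```

Calling `run_pipeline("examples/run_config/my_config.yml")` would prepare the dataset and then evaluate it with the same configuration file.
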
| Flag | Description |
|---|---|
| `--train-only` | Only train models, skip testing phase |
| `--force-train` | Force retraining, ignore saved models |
| `--force-model` | Load saved models without config hash validation |
| `--clear-checkpoints` | Clear evaluation checkpoints before running |
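
One way to read how `--force-train` and `--force-model` interact with saved models is sketched below. This is an assumption based only on the flag descriptions, not the framework's actual logic: the function `decide_action`, its parameters, and the precedence of `--force-train` over `--force-model` are all hypothetical.

```python
def decide_action(
    saved_model_exists: bool,
    hash_matches: bool,
    force_train: bool = False,
    force_model: bool = False,
) -> str:
    """Return "train" or "load" for one model (hypothetical decision rule)."""
    # --force-train: ignore any saved model and retrain.
    if force_train or not saved_model_exists:
        return "train"
    # A saved model is reused when its config hash matches, or when
    # --force-model skips the hash validation.
    if hash_matches or force_model:
        return "load"
    # Assumed fallback: a saved model with a stale hash is retrained.
    return "train"
```
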
make dataset CONFIG=<config.yml> # Run dataset pipeline
make evaluate CONFIG=<config.yml> # Run evaluation pipeline
make docker-dataset CONFIG=<config.yml> # Run dataset pipeline via Docker
make docker-evaluate CONFIG=<config.yml> # Run evaluation via Docker
make help # Show all available targets
The framework uses YAML configuration files. See run_config/example.config.yml for a fully documented example.
- general: Run name, paths, random seed
- data_manager: Dataset loading, preprocessing, feature selection, train/test split
- evaluation: IDS models, metrics, adversarial attacks
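
A quick sanity check of a loaded configuration might look like the sketch below. It assumes the configuration has already been parsed into a dictionary and that all three top-level sections are required; the function `missing_sections` and the nested keys in the example (`run_name`, `seed`) are hypothetical, so refer to the documented example file for the real schema.

```python
REQUIRED_SECTIONS = ("general", "data_manager", "evaluation")


def missing_sections(config: dict) -> list[str]:
    """Return the required top-level sections that are absent from config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


# Hypothetical, incomplete configuration: only "general" is present.
example = {"general": {"run_name": "demo", "seed": 42}}
print(missing_sections(example))
```
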
All outputs are organized in hash-based directories for reproducibility:
out/
├── processed_datasets/<hash>/      # Preprocessed datasets
├── saved_models/<hash>/            # Trained models
└── reports/<hash>/                 # Evaluation reports
    ├── config.yaml                 # Configuration used
    ├── dataset_report.yaml         # Dataset statistics
    ├── ids_report.yaml             # Detailed evaluation results
    └── evaluation_summary.yaml     # Aggregated summary
The configuration hash is displayed at startup:
Your config hash is: a1b2c3d4
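
To illustrate why a configuration hash gives reproducible output directories, here is a minimal sketch of one possible scheme. The framework's actual hashing method is not documented here, so the choice of SHA-256 over canonical JSON, the 8-character truncation, and the function name `config_hash` are all assumptions.

```python
import hashlib
import json


def config_hash(config: dict, length: int = 8) -> str:
    """Return a short, deterministic hex digest of a configuration dict."""
    # sort_keys makes the serialization independent of key order, so two
    # logically identical configs map to the same output directory.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
```

With a scheme like this, changing any configuration value yields a different hash, so results from different settings never overwrite each other.
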
The framework supports four types of plugins:
| Plugin Type | Directory | Base Class |
|---|---|---|
| IDS Models | `plugin_ids/` | `AbstractIDSConnector` |
| Static Metrics | `plugin_static_metric/` | `AbstractStaticMetric` |
| Runtime Metrics | `plugin_runtime_metric/` | `AbstractRuntimeMetric` |
| Adversarial Attacks | `plugin_adversarial/` | `AbstractAdversarialAttack` |
See the existing plugins in each directory for implementation examples.
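
The general shape of such a plugin can be sketched as follows. The real `AbstractStaticMetric` interface is not reproduced here, so `StaticMetricSketch` is a hypothetical stand-in, and its `name` attribute and `compute` signature are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod
from collections.abc import Sequence


class StaticMetricSketch(ABC):
    """Hypothetical stand-in for a static-metric base class."""

    name: str

    @abstractmethod
    def compute(self, y_true: Sequence[int], y_pred: Sequence[int]) -> float:
        """Return the metric value for true and predicted labels."""


class AccuracyMetric(StaticMetricSketch):
    """Example plugin: fraction of predictions that match the true labels."""

    name = "accuracy"

    def compute(self, y_true: Sequence[int], y_pred: Sequence[int]) -> float:
        if len(y_true) != len(y_pred):
            raise ValueError("y_true and y_pred must have the same length")
        if len(y_true) == 0:
            return 0.0  # convention chosen here for empty input
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        return correct / len(y_true)
```

A subclass that does not implement `compute` cannot be instantiated, which is the main benefit of deriving plugins from an abstract base class.
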
make setup # Install dependencies
make test # Run tests
make lint # Check code style
make format # Format code
Please cite this project using the following BibTeX entry:
@inproceedings{}
If you find any bugs, bad patterns, performance issues, etc., do not hesitate to open an issue.
Any new feature that should become part of the evaluation has to be backed by peer-reviewed publications. This applies to new examples as well. All examples reproduce published work, except for the baseline models.
See LICENSE for details.
