i-mAshura/ExploitOracle

🛡️ ExploitOracle V5

Explainable Vulnerability Intelligence Platform

Open-source vulnerability intelligence platform for exploit prediction, explainability, semantic search, and cybersecurity analytics.

AI-Powered Vulnerability Intelligence • Explainable AI • Semantic Search • Threat Prioritization


⚡ Project Highlights

  • 🔍 Analyzed 357,059 CVEs
  • 🤖 XGBoost-based Exploit Prediction Model
  • 🧠 Explainable AI using SHAP
  • 🔗 FAISS Semantic Similarity Search
  • 📊 Interactive Streamlit Dashboard
  • 🎯 ROC-AUC: 0.9901
  • 📈 PR-AUC: 0.8708
  • 🛡️ Integrated NVD, CISA KEV, and EPSS Intelligence

📌 Overview

ExploitOracle is an AI-powered Vulnerability Intelligence Platform designed to predict the likelihood of real-world exploitation of software vulnerabilities (CVEs).

The platform combines cybersecurity intelligence feeds, semantic embeddings, machine learning, explainable AI, and similarity search to provide actionable vulnerability prioritization.

Unlike traditional vulnerability management solutions that rely solely on CVSS severity scores, ExploitOracle predicts the probability that a vulnerability will actually be exploited in the wild.


📂 Repository Structure

ExploitOracle/
├── app.py
├── requirements.txt
├── README.md

├── DataCollection.ipynb
├── Dataset and Model training.ipynb
├── ExploitOracle_Final.ipynb

├── model_v2.pkl
├── shap_explainer.pkl

├── faiss.index

├── X_part1.npy
├── X_part2.npy

├── full_embeddings.npy
├── full_embeddings.dat

├── full_df_v5.parquet

├── reports/
│   ├── DASHBOARD.png
│   ├── SHAP_FORCE_PLOT.png
│   ├── SHAP_VALUE.png
│   ├── CVE_TOP_20_REPORT.png
│   └── architecture.png

└── LICENSE

📸 Platform Preview

[Screenshots: Dashboard, SHAP Explainability, SHAP Force Plot, Intelligence Report]

🎯 Problem Statement

Organizations face thousands of vulnerabilities every year.

A major challenge is determining:

Which vulnerabilities should be patched first?

Traditional severity-based prioritization often fails because:

  • High CVSS vulnerabilities may never be exploited.
  • Low CVSS vulnerabilities may become actively exploited.
  • Security teams have limited remediation resources.
  • Analysts require explainable and evidence-backed prioritization.

ExploitOracle addresses these challenges using AI-driven exploit prediction and vulnerability intelligence.


🛠️ Technologies Used

Category             Technology
Language             Python
Machine Learning     XGBoost
Embeddings           BGE Base
Explainability       SHAP
Similarity Search    FAISS
Dashboard            Streamlit
Visualization        Plotly
Network Graphs       NetworkX
Reporting            ReportLab
Data Sources         NVD, KEV, EPSS

🚀 Key Features

Vulnerability Intelligence

  • CVE Intelligence Dashboard
  • Exploit Probability Prediction
  • Threat Score Calculation
  • Executive Security Summary
  • Patch Priority Recommendations
  • Known Exploited Vulnerability Detection

Explainable AI

  • SHAP Explainability
  • SHAP Force Plots
  • Feature Attribution Analysis
  • Prediction Transparency

Threat Intelligence

  • IOC Recommendations
  • MITRE ATT&CK Mapping
  • Threat Hunting Interface
  • Vulnerability Prioritization

Semantic Search

  • BGE Embeddings
  • FAISS Similarity Search
  • Similarity Network Graph
  • Related Vulnerability Discovery

Reporting

  • PDF Intelligence Reports
  • Risk Rankings
  • Executive Summaries
  • Security Analytics

📊 Dataset Sources

National Vulnerability Database (NVD)

Contains:

  • CVE IDs
  • Descriptions
  • CVSS Metrics
  • Attack Vector
  • Attack Complexity
  • Privileges Required
  • CWE Information

Source: https://nvd.nist.gov/


CISA Known Exploited Vulnerabilities (KEV)

Contains vulnerabilities confirmed to be exploited in the wild.

Source: https://www.cisa.gov/known-exploited-vulnerabilities-catalog


EPSS

Exploit Prediction Scoring System. Supplies the EPSS score and EPSS percentile used as numerical features.

Source: https://www.first.org/epss/


📈 Dataset Statistics

Metric                     Value
Total CVEs                 357,059
Known Exploited CVEs       1,617
Non-Exploited CVEs         355,442
Embedding Model            BAAI/bge-base-en-v1.5
Embedding Size             768
Final Feature Dimension    775

Class Distribution:

Label 0 (Non Exploited): 355,442
Label 1 (Exploited):      1,617
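
The counts above imply a heavy class imbalance. The following is a minimal sketch of that arithmetic using only the figures reported here. Whether the authors fed the resulting ratio into XGBoost's `scale_pos_weight` parameter is not stated in this README, so that use is an assumption.

```python
# Class counts as reported in the Dataset Statistics section.
n_negative = 355_442  # Label 0 (non-exploited)
n_positive = 1_617    # Label 1 (exploited)

total = n_negative + n_positive
prevalence = n_positive / total            # share of exploited CVEs
imbalance_ratio = n_negative / n_positive  # negatives per positive

print(f"total CVEs:      {total:,}")
print(f"prevalence:      {prevalence:.4%}")
print(f"imbalance ratio: {imbalance_ratio:.1f} : 1")

# Assumption: a common heuristic sets XGBoost's scale_pos_weight to
# negatives / positives. The README does not say this was done.
suggested_scale_pos_weight = imbalance_ratio
```

With fewer than one exploited CVE per 200, threshold-free metrics such as PR-AUC tend to be more informative than raw accuracy.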

⚙️ Feature Engineering

Numerical Features

  • CVSS Score
  • EPSS Score
  • EPSS Percentile

Categorical Features

  • Attack Vector
  • Attack Complexity
  • Privileges Required
  • CWE

Semantic Features

Vulnerability descriptions were converted into semantic vectors using:

BAAI/bge-base-en-v1.5

Embedding Dimension:

768
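
The seven tabular fields listed above plus the 768-dimensional embedding account for the final feature dimension of 775. A minimal sketch of that concatenation follows. The ordinal encodings, the `cwe_to_number` helper, and the `build_feature_vector` function are hypothetical, since the README does not describe how categorical fields were encoded.

```python
from typing import Dict, List, Sequence

EMBEDDING_DIM = 768  # bge-base-en-v1.5 output size, per this README
TABULAR_DIM = 7      # 3 numerical + 4 categorical fields listed above

# Hypothetical ordinal encodings: the README does not say how the
# categorical fields were encoded.
ATTACK_VECTOR: Dict[str, int] = {
    "PHYSICAL": 0, "LOCAL": 1, "ADJACENT_NETWORK": 2, "NETWORK": 3,
}
ATTACK_COMPLEXITY: Dict[str, int] = {"HIGH": 0, "LOW": 1}
PRIVILEGES_REQUIRED: Dict[str, int] = {"HIGH": 0, "LOW": 1, "NONE": 2}


def cwe_to_number(cwe: str) -> int:
    """Map 'CWE-79' to 79; unparsable values become -1 (assumed convention)."""
    try:
        return int(cwe.split("-", 1)[1])
    except (IndexError, ValueError):
        return -1


def build_feature_vector(embedding: Sequence[float], cvss: float, epss: float,
                         epss_percentile: float, attack_vector: str,
                         attack_complexity: str, privileges_required: str,
                         cwe: str) -> List[float]:
    """Concatenate the description embedding with the seven tabular features."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dims, got {len(embedding)}")
    tabular = [
        float(cvss), float(epss), float(epss_percentile),
        float(ATTACK_VECTOR[attack_vector]),
        float(ATTACK_COMPLEXITY[attack_complexity]),
        float(PRIVILEGES_REQUIRED[privileges_required]),
        float(cwe_to_number(cwe)),
    ]
    return list(embedding) + tabular


# Placeholder embedding and made-up field values, for shape checking only.
features = build_feature_vector([0.0] * EMBEDDING_DIM, 9.8, 0.97, 0.99,
                                "NETWORK", "LOW", "NONE", "CWE-79")
print(len(features))  # 775 = 768 + 7
```

In the real pipeline the placeholder list would be replaced by the embedding of the CVE description.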

🏗️ System Architecture

ExploitOracle combines multiple vulnerability intelligence feeds, semantic embeddings, machine learning prediction models, explainable AI, and similarity search into a unified vulnerability intelligence platform.


🤖 Machine Learning Pipeline

CVE Description
       │
       ▼
BGE Embedding Model
       │
       ▼
768-Dimensional Vector
       │
       ▼
Feature Concatenation
       │
       ▼
XGBoost Classifier
       │
       ▼
Exploit Probability

📊 Model Evaluation

Version 1 Results

Precision (Exploited): 0.77
Recall (Exploited):    0.79
F1-Score:              0.78

ROC-AUC: 0.9888
PR-AUC : 0.8610

Version 2 Results (Final Model)

Precision (Exploited): 0.77
Recall (Exploited):    0.78
F1-Score:              0.78

ROC-AUC : 0.9901
PR-AUC  : 0.8708

🏆 Final Performance

Metric                   Value
Accuracy                 98%
ROC-AUC                  0.9901
PR-AUC                   0.8708
Precision (Exploited)    0.77
Recall (Exploited)       0.78
F1 Score (Exploited)     0.78

🔬 Proof of Work

Full Dataset Inference

After model training and validation, exploit probabilities were generated for all vulnerabilities.

Example Predictions:

CVE-1999-0095 -> 0.000619
CVE-1999-0082 -> 0.000082
CVE-1999-1471 -> 0.000039
CVE-1999-1122 -> 0.000058
CVE-1999-1467 -> 0.000058

Total Predictions Generated:

357,059 CVEs

Stored as:

full_df_v5.parquet
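
As an illustration of how such scores can drive prioritization, the sketch below ranks the five example predictions listed above from most to least likely to be exploited. It uses only the values shown in this section. The dictionary layout is an assumption and does not reflect the schema of `full_df_v5.parquet`.

```python
# Example predictions copied from this section (CVE ID -> exploit probability).
predictions = {
    "CVE-1999-0095": 0.000619,
    "CVE-1999-0082": 0.000082,
    "CVE-1999-1471": 0.000039,
    "CVE-1999-1122": 0.000058,
    "CVE-1999-1467": 0.000058,
}

# Sort by probability, highest first. Python's sort is stable, so tied
# entries keep their original relative order.
ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)

for rank, (cve_id, probability) in enumerate(ranked, start=1):
    print(f"{rank}. {cve_id}  {probability:.6f}")
```

All five sample scores are far below 1%, which is consistent with the low share of exploited CVEs in the dataset.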

🧠 Explainable AI

ExploitOracle integrates SHAP for model transparency.

SHAP Force Plot

SHAP Feature Importance
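
A force plot visualizes SHAP's additive property: the model output for one CVE equals a base value plus the sum of per-feature contributions. The sketch below shows that arithmetic, assuming the explainer works in log-odds space (the usual default for SHAP's tree explainer on XGBoost classifiers). The base value, feature names, and contribution values are all hypothetical and are not taken from the project's model.

```python
import math


def sigmoid(x: float) -> float:
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical numbers for a single CVE, for illustration only.
base_value = -5.0  # average model output in log-odds
contributions = {
    "epss_score": 3.1,
    "cvss_score": 0.6,
    "attack_vector": 0.4,
    "embedding_dims_combined": -0.3,
}

# Additivity: model output = base value + sum of SHAP contributions.
margin = base_value + sum(contributions.values())
probability = sigmoid(margin)

# List features by the size of their push, as a force plot would show them.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    direction = "raises" if value > 0 else "lowers"
    print(f"{name:26s} {value:+.2f}  ({direction} exploit likelihood)")
print(f"log-odds = {margin:.2f}, probability = {probability:.4f}")
```

Positive contributions push the prediction toward "exploited" and negative ones push it away.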


🌐 Similarity Search

FAISS was used to build a semantic similarity engine.

Indexed Vulnerabilities:

357,059 CVEs

Capabilities:

  • Similar Vulnerability Discovery
  • Threat Correlation
  • Vulnerability Clustering
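
To make the similarity search concrete, here is a dependency-free sketch of exhaustive cosine-similarity search, which is what an exact inner-product index computes over L2-normalized vectors. The README does not state which FAISS index type the project uses. The toy 3-dimensional vectors and the `top_k_similar` helper are hypothetical stand-ins for the 768-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def top_k_similar(query: Sequence[float],
                  corpus: Sequence[Sequence[float]],
                  k: int) -> List[Tuple[int, float]]:
    """Return (index, cosine similarity) for the k closest corpus vectors."""
    q = l2_normalize(query)
    scored = []
    for index, vector in enumerate(corpus):
        v = l2_normalize(vector)
        scored.append((index, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for CVE description vectors.
corpus = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.2, 0.0]

for index, score in top_k_similar(query, corpus, k=2):
    print(f"corpus[{index}] similarity={score:.4f}")
```

A brute-force scan like this is linear in the corpus size. A dedicated index library exists to make the same lookup fast at the scale of 357,059 vectors.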

📋 Dashboard

Main Dashboard

Features:

  • Dashboard Home
  • CVE Intelligence
  • Threat Score
  • Patch Priority
  • Executive Summary
  • Threat Hunting
  • Statistics Dashboard
  • Similarity Graph
  • SHAP Explainability
  • PDF Reporting

📄 Intelligence Report


📊 Project Scale

Component                  Value
Total CVEs Processed       357,059
Known Exploited CVEs       1,617
Embedding Dimension        768
Final Feature Dimension    775
Model                      XGBoost
Similarity Engine          FAISS
Explainability             SHAP
Dashboard Framework        Streamlit

📊 Model Evaluation

Version 1 Results

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      6401
           1       0.77      0.79      0.78       323

    accuracy                           0.98      6724
   macro avg       0.88      0.89      0.88      6724
weighted avg       0.98      0.98      0.98      6724

ROC-AUC: 0.9888
PR-AUC : 0.8610

Version 2 Results (Final Model)

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      6401
           1       0.77      0.78      0.78       323

    accuracy                           0.98      6724
   macro avg       0.88      0.88      0.88      6724
weighted avg       0.98      0.98      0.98      6724

ROC-AUC : 0.9901
PR-AUC  : 0.8708
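
To put the 0.98 accuracy in context, the sketch below computes what a trivial classifier that always predicts "not exploited" would score on the same test split, using the support counts from the reports above. It is a back-of-the-envelope check, not part of the project's evaluation code.

```python
# Support counts from the classification reports above.
support_negative = 6401  # class 0 (non-exploited)
support_positive = 323   # class 1 (exploited)

total = support_negative + support_positive

# A classifier that always predicts class 0 gets every negative right.
baseline_accuracy = support_negative / total

# Share of positives in the test split. A random ranker's PR-AUC is
# expected to sit near this value.
test_prevalence = support_positive / total

print(f"test samples:            {total}")
print(f"always-negative accuracy: {baseline_accuracy:.3f}")
print(f"positive share:           {test_prevalence:.3f}")
```

Because the trivial baseline already reaches about 0.95 accuracy, the exploited-class precision and recall and the PR-AUC say more about model quality than accuracy does.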

⭐ Project Outcome

ExploitOracle demonstrates how Machine Learning, Explainable AI, Semantic Search, and Cybersecurity Intelligence can be combined to prioritize vulnerabilities based on real-world exploitation likelihood.

The platform processes over 357,000 CVEs and achieves:

  • ROC-AUC: 0.9901
  • PR-AUC: 0.8708
  • Accuracy: 98%

while providing explainable, actionable, and analyst-friendly vulnerability intelligence through an interactive security operations dashboard.


👥 Authors

K. Sai Abhiram

Vellore Institute of Technology, VIT-AP University

K. Madhu Hasini

National Institute of Technology Calicut


🌍 Open Source & Collaboration

As a developer, cybersecurity enthusiast, and security engineer, I actively support open-source initiatives and collaborative research.

I am open to contributing to cybersecurity, AI/ML, threat intelligence, software engineering, and security research projects. If you are building something interesting and would like collaboration, contributions, technical discussions, or research support, feel free to reach out.

📧 [email protected]


📜 License

This project is released under the MIT License. See the LICENSE file for the full text.
