🛡️ ExploitOracle V5

Explainable Vulnerability Intelligence Platform

Open-source vulnerability intelligence platform for exploit prediction, explainability, semantic search, and cybersecurity analytics.

AI-Powered Vulnerability Intelligence • Explainable AI • Semantic Search • Threat Prioritization

⚡ Project Highlights

🔍 Analyzed 357,059 CVEs
🤖 XGBoost-based Exploit Prediction Model
🧠 Explainable AI using SHAP
🔗 FAISS Semantic Similarity Search
📊 Interactive Streamlit Dashboard
🎯 ROC-AUC: 0.9901
📈 PR-AUC: 0.8708
🛡️ Integrated NVD, CISA KEV, and EPSS Intelligence

📌 Overview

ExploitOracle is an AI-powered Vulnerability Intelligence Platform designed to predict the likelihood of real-world exploitation of software vulnerabilities (CVEs).

The platform combines cybersecurity intelligence feeds, semantic embeddings, machine learning, explainable AI, and similarity search to provide actionable vulnerability prioritization.

Unlike traditional vulnerability management solutions that rely solely on CVSS severity scores, ExploitOracle predicts the probability that a vulnerability will actually be exploited in the wild.

📂 Repository Structure

ExploitOracle/
├── app.py
├── requirements.txt
├── README.md

├── DataCollection.ipynb
├── Dataset and Model training.ipynb
├── ExploitOracle_Final.ipynb

├── model_v2.pkl
├── shap_explainer.pkl

├── faiss.index

├── X_part1.npy
├── X_part2.npy

├── full_embeddings.npy
├── full_embeddings.dat

├── full_df_v5.parquet

├── reports/
│   ├── DASHBOARD.png
│   ├── SHAP_FORCE_PLOT.png
│   ├── SHAP_VALUE.png
│   ├── CVE_TOP_20_REPORT.png
│   └── architecture.png

└── LICENSE

📸 Platform Preview

Screenshots (image files under reports/): Dashboard, SHAP Explainability, SHAP Force Plot, Intelligence Report

🎯 Problem Statement

Organizations face thousands of vulnerabilities every year.

A major challenge is determining:

Which vulnerabilities should be patched first?

Traditional severity-based prioritization often fails because:

High CVSS vulnerabilities may never be exploited.
Low CVSS vulnerabilities may become actively exploited.
Security teams have limited remediation resources.
Analysts require explainable and evidence-backed prioritization.

ExploitOracle addresses these challenges using AI-driven exploit prediction and vulnerability intelligence.

🛠️ Technologies Used

Category	Technology
Language	Python
Machine Learning	XGBoost
Embeddings	BGE Base
Explainability	SHAP
Similarity Search	FAISS
Dashboard	Streamlit
Visualization	Plotly
Network Graphs	NetworkX
Reporting	ReportLab
Data Sources	NVD, KEV, EPSS

🚀 Key Features

Vulnerability Intelligence

CVE Intelligence Dashboard
Exploit Probability Prediction
Threat Score Calculation
Executive Security Summary
Patch Priority Recommendations
Known Exploited Vulnerability Detection

Explainable AI

SHAP Explainability
SHAP Force Plots
Feature Attribution Analysis
Prediction Transparency

Threat Intelligence

IOC Recommendations
MITRE ATT&CK Mapping
Threat Hunting Interface
Vulnerability Prioritization

Semantic Search

BGE Embeddings
FAISS Similarity Search
Similarity Network Graph
Related Vulnerability Discovery

Reporting

PDF Intelligence Reports
Risk Rankings
Executive Summaries
Security Analytics

📊 Dataset Sources

National Vulnerability Database (NVD)

Contains:

CVE IDs
Descriptions
CVSS Metrics
Attack Vector
Attack Complexity
Privileges Required
CWE Information

Source: https://nvd.nist.gov/

CISA Known Exploited Vulnerabilities (KEV)

Contains vulnerabilities confirmed to be exploited in the wild.

Source: https://www.cisa.gov/known-exploited-vulnerabilities-catalog
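
Since the number of exploited CVEs reported in the dataset statistics matches the number of positive labels, KEV membership appears to act as the ground-truth label. A minimal sketch of that labeling step, assuming KEV membership is the only criterion; the function name `label_from_kev` and the sample IDs are illustrative and not taken from the project notebooks:

```python
def label_from_kev(cve_ids, kev_ids):
    """Return {cve_id: label}; label is 1 if the CVE is in the KEV list, else 0."""
    # Normalize case and whitespace so "cve-2021-44228 " still matches.
    kev = {cve.strip().upper() for cve in kev_ids}
    return {cve: int(cve.strip().upper() in kev) for cve in cve_ids}


# Stand-in inputs for illustration; a real run would use the full NVD and KEV exports.
nvd_ids = ["CVE-2021-44228", "CVE-1999-0095"]
kev_ids = ["cve-2021-44228"]

labels = label_from_kev(nvd_ids, kev_ids)
print(labels)
```

Using a set for the KEV IDs keeps each lookup constant-time, which matters when labeling several hundred thousand CVEs.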

EPSS

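The Exploit Prediction Scoring System (EPSS) provides, for each CVE, an estimated probability of exploitation in the wild within the next 30 days, along with a percentile rank.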

Source: https://www.first.org/epss/

📈 Dataset Statistics

Metric	Value
Total CVEs	357,059
Known Exploited CVEs	1,617
Non-Exploited CVEs	355,442
Embedding Model	BAAI/bge-base-en-v1.5
Embedding Size	768
Final Feature Dimension	775

Class Distribution:

Label 0 (Non Exploited): 355,442
Label 1 (Exploited):      1,617
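
The class counts above imply a heavily imbalanced problem. The arithmetic below makes that concrete: roughly 0.45% of CVEs are positive, or about 220 negatives per positive. A ratio like this is the kind of value commonly passed to XGBoost's `scale_pos_weight` parameter; whether the project did so is not stated here, so treat that as an assumption.

```python
n_negative = 355_442  # Label 0 (non-exploited), from the class distribution above
n_positive = 1_617    # Label 1 (exploited)

total = n_negative + n_positive
prevalence = n_positive / total          # fraction of CVEs that are exploited
neg_pos_ratio = n_negative / n_positive  # negatives per positive example

print(f"total CVEs: {total:,}")
print(f"positive prevalence: {prevalence:.4%}")
print(f"negatives per positive: {neg_pos_ratio:.1f}")
```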

⚙️ Feature Engineering

Numerical Features

CVSS Score
EPSS Score
EPSS Percentile

Categorical Features

Attack Vector
Attack Complexity
Privileges Required
CWE

Semantic Features

Vulnerability descriptions were converted into semantic vectors using:

BAAI/bge-base-en-v1.5

Embedding Dimension: 768

🏗️ System Architecture

ExploitOracle combines multiple vulnerability intelligence feeds, semantic embeddings, machine learning prediction models, explainable AI, and similarity search into a unified vulnerability intelligence platform.

🤖 Machine Learning Pipeline

CVE Description
       │
       ▼
BGE Embedding Model
       │
       ▼
768-Dimensional Vector
       │
       ▼
Feature Concatenation
       │
       ▼
XGBoost Classifier
       │
       ▼
Exploit Probability

📊 Model Evaluation

Version 1 Results

Precision (Exploited): 0.77
Recall (Exploited):    0.79
F1-Score:              0.78

ROC-AUC: 0.9888
PR-AUC : 0.8610

Version 2 Results (Final Model)

Precision (Exploited): 0.77
Recall (Exploited):    0.78
F1-Score:              0.78

ROC-AUC : 0.9901
PR-AUC  : 0.8708

🏆 Final Performance

Metric	Value
Accuracy	98%
ROC-AUC	0.9901
PR-AUC	0.8708
Precision (Exploited)	0.77
Recall (Exploited)	0.78
F1 Score (Exploited)	0.78

🔬 Proof of Work

Full Dataset Inference

After model training and validation, exploit probabilities were generated for all vulnerabilities.

Example Predictions:

CVE-1999-0095 -> 0.000619
CVE-1999-0082 -> 0.000082
CVE-1999-1471 -> 0.000039
CVE-1999-1122 -> 0.000058
CVE-1999-1467 -> 0.000058

Total Predictions Generated:

357,059 CVEs

Stored as:

full_df_v5.parquet
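
Once each CVE has a predicted probability, sorting by that value gives a priority ordering. A small sketch using the five example predictions listed above; the helper name `top_n` is illustrative, and because the column layout of full_df_v5.parquet is not documented here, a plain dictionary stands in for it:

```python
predictions = {
    "CVE-1999-0095": 0.000619,
    "CVE-1999-0082": 0.000082,
    "CVE-1999-1471": 0.000039,
    "CVE-1999-1122": 0.000058,
    "CVE-1999-1467": 0.000058,
}


def top_n(scores, n):
    """Return the n (cve_id, probability) pairs with the highest probability."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


for cve_id, probability in top_n(predictions, 2):
    print(f"{cve_id}: {probability:.6f}")
```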

🧠 Explainable AI

ExploitOracle integrates SHAP for model transparency.

SHAP Force Plot

SHAP Feature Importance
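
SHAP attributions are additive: a base value plus the per-feature contributions reproduces the model's raw output for one prediction. The sketch below shows that relationship, assuming the explanations are expressed in log-odds space so that a sigmoid turns the sum into a probability. All numbers and feature names are hypothetical and are not taken from the project's explainer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def probability_from_shap(base_value, shap_values):
    """Rebuild a probability from a base value and per-feature contributions (log-odds)."""
    margin = base_value + sum(shap_values.values())
    return sigmoid(margin)


# Hypothetical contributions for a single CVE, for illustration only.
base_value = -5.0
contributions = {
    "epss_score": 3.0,
    "cvss_score": 1.0,
    "attack_vector": 0.5,
    "embedding_dims": -0.5,
}

p = probability_from_shap(base_value, contributions)
print(f"predicted exploit probability: {p:.4f}")
```

A force plot visualizes exactly this sum: positive contributions push the prediction above the base value and negative ones pull it back.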

🌐 Similarity Search

FAISS was used to build a semantic similarity engine over the BGE description embeddings.

Indexed Vulnerabilities:

357,059 CVEs

Capabilities:

Similar Vulnerability Discovery
Threat Correlation
Vulnerability Clustering
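
To illustrate what the similarity engine computes, here is a brute-force cosine search in plain Python. This is a stand-in for FAISS, not the project's code: the index type used in faiss.index is not stated here, and the toy 3-dimensional vectors replace the real 768-dimensional embeddings. The names `normalize` and `top_k_similar` are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def top_k_similar(query, vectors, k=3):
    """Return the k (id, cosine similarity) pairs closest to the query."""
    q = normalize(query)
    scored = []
    for cve_id, vec in vectors.items():
        v = normalize(vec)
        scored.append((cve_id, sum(a * b for a, b in zip(q, v))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors with placeholder IDs.
index = {
    "CVE-A": [1.0, 0.0, 0.0],
    "CVE-B": [0.9, 0.1, 0.0],
    "CVE-C": [0.0, 1.0, 0.0],
}

neighbours = top_k_similar([1.0, 0.0, 0.1], index, k=2)
for cve_id, score in neighbours:
    print(f"{cve_id}: {score:.4f}")
```

An approximate or exact vector index performs the same ranking without scanning every vector in Python, which is what makes search over 357,059 embeddings practical.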

📋 Dashboard

Main Dashboard

Features:

Dashboard Home
CVE Intelligence
Threat Score
Patch Priority
Executive Summary
Threat Hunting
Statistics Dashboard
Similarity Graph
SHAP Explainability
PDF Reporting

📄 Intelligence Report

📊 Project Scale

Component	Value
Total CVEs Processed	357,059
Known Exploited CVEs	1,617
Embedding Dimension	768
Final Feature Dimension	775
Model	XGBoost
Similarity Engine	FAISS
Explainability	SHAP
Dashboard Framework	Streamlit

📊 Model Evaluation (Detailed Classification Reports)

Version 1 Results

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      6401
           1       0.77      0.79      0.78       323

    accuracy                           0.98      6724
   macro avg       0.88      0.89      0.88      6724
weighted avg       0.98      0.98      0.98      6724

ROC-AUC: 0.9888
PR-AUC : 0.8610

Version 2 Results (Final Model)

              precision    recall  f1-score   support

           0       0.99      0.99      0.99      6401
           1       0.77      0.78      0.78       323

    accuracy                           0.98      6724
   macro avg       0.88      0.88      0.88      6724
weighted avg       0.98      0.98      0.98      6724

ROC-AUC : 0.9901
PR-AUC  : 0.8708
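
The support counts in the reports above give useful reference points for reading the metrics. On this evaluation split, always predicting "not exploited" would already reach about 95.2% accuracy, and a random ranker would have an expected PR-AUC of about 0.048 (the positive rate). The short calculation below derives both, plus the macro averages for Version 2, directly from the reported figures.

```python
support_negative = 6401  # class 0 rows in the evaluation split
support_positive = 323   # class 1 rows in the evaluation split
total = support_negative + support_positive

# Accuracy of a trivial classifier that always predicts class 0.
majority_baseline_accuracy = support_negative / total

# Expected PR-AUC of a random ranker equals the positive prevalence.
random_pr_auc_baseline = support_positive / total

# Macro average = unweighted mean of the per-class scores (Version 2).
macro_precision = (0.99 + 0.77) / 2
macro_recall = (0.99 + 0.78) / 2

print(f"evaluation rows: {total}")
print(f"majority-class accuracy: {majority_baseline_accuracy:.3f}")
print(f"random PR-AUC baseline: {random_pr_auc_baseline:.3f}")
print(f"macro precision: {macro_precision:.3f}, macro recall: {macro_recall:.3f}")
```

Against those baselines, 98% accuracy is a modest gain while a PR-AUC of 0.8708 is a large one, which is why PR-AUC is the more informative number here. Note also that the evaluation split has a higher positive rate (about 4.8%) than the full dataset (about 0.45%), so these metrics describe that split rather than the full 357,059-CVE population.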

⭐ Project Outcome

ExploitOracle demonstrates how Machine Learning, Explainable AI, Semantic Search, and Cybersecurity Intelligence can be combined to prioritize vulnerabilities based on real-world exploitation likelihood.

The platform processes over 357,000 CVEs and achieves:

ROC-AUC: 0.9901
PR-AUC: 0.8708
Accuracy: 98%

while providing explainable, actionable, and analyst-friendly vulnerability intelligence through an interactive security operations dashboard.

👥 Authors

K. Sai Abhiram

Vellore Institute of Technology, VIT-AP University

K. Madhu Hasini

National Institute of Technology Calicut

🌍 Open Source & Collaboration

As a developer, cybersecurity enthusiast, and security engineer, I actively support open-source initiatives and collaborative research.

I am open to contributing to cybersecurity, AI/ML, threat intelligence, software engineering, and security research projects. If you are building something interesting and would like collaboration, contributions, technical discussions, or research support, feel free to reach out.

📧 [email protected]

📜 License

This project is released under the MIT License. See the LICENSE file for the complete MIT License text.
