Open-source vulnerability intelligence platform for exploit prediction, explainability, semantic search, and cybersecurity analytics.
AI-Powered Vulnerability Intelligence • Explainable AI • Semantic Search • Threat Prioritization
- Analyzed 357,059 CVEs
- XGBoost-based Exploit Prediction Model
- Explainable AI using SHAP
- FAISS Semantic Similarity Search
- Interactive Streamlit Dashboard
- ROC-AUC: 0.9901
- PR-AUC: 0.8708
- Integrated NVD, CISA KEV, and EPSS Intelligence
ExploitOracle is an AI-powered Vulnerability Intelligence Platform designed to predict the likelihood of real-world exploitation of software vulnerabilities (CVEs).
The platform combines cybersecurity intelligence feeds, semantic embeddings, machine learning, explainable AI, and similarity search to provide actionable vulnerability prioritization.
Unlike traditional vulnerability management solutions that rely solely on CVSS severity scores, ExploitOracle predicts the probability that a vulnerability will actually be exploited in the wild.
ExploitOracle/
├── app.py
├── requirements.txt
├── README.md
├── DataCollection.ipynb
├── Dataset and Model training.ipynb
├── ExploitOracle_Final.ipynb
├── model_v2.pkl
├── shap_explainer.pkl
├── faiss.index
├── X_part1.npy
├── X_part2.npy
├── full_embeddings.npy
├── full_embeddings.dat
├── full_df_v5.parquet
├── reports/
│   ├── DASHBOARD.png
│   ├── SHAP_FORCE_PLOT.png
│   ├── SHAP_VALUE.png
│   ├── CVE_TOP_20_REPORT.png
│   └── architecture.png
└── LICENSE

Screenshots (image files are under `reports/`): Dashboard, SHAP Explainability, SHAP Force Plot, Intelligence Report.

Organizations face thousands of vulnerabilities every year.
A major challenge is determining:
Which vulnerabilities should be patched first?
Traditional severity-based prioritization often fails because:
- High CVSS vulnerabilities may never be exploited.
- Low CVSS vulnerabilities may become actively exploited.
- Security teams have limited remediation resources.
- Analysts require explainable and evidence-backed prioritization.
ExploitOracle addresses these challenges using AI-driven exploit prediction and vulnerability intelligence.
| Category | Technology |
|---|---|
| Language | Python |
| Machine Learning | XGBoost |
| Embeddings | BGE Base |
| Explainability | SHAP |
| Similarity Search | FAISS |
| Dashboard | Streamlit |
| Visualization | Plotly |
| Network Graphs | NetworkX |
| Reporting | ReportLab |
| Data Sources | NVD, KEV, EPSS |
- CVE Intelligence Dashboard
- Exploit Probability Prediction
- Threat Score Calculation
- Executive Security Summary
- Patch Priority Recommendations
- Known Exploited Vulnerability Detection
- SHAP Explainability
- SHAP Force Plots
- Feature Attribution Analysis
- Prediction Transparency
- IOC Recommendations
- MITRE ATT&CK Mapping
- Threat Hunting Interface
- Vulnerability Prioritization
- BGE Embeddings
- FAISS Similarity Search
- Similarity Network Graph
- Related Vulnerability Discovery
- PDF Intelligence Reports
- Risk Rankings
- Executive Summaries
- Security Analytics
Contains:
- CVE IDs
- Descriptions
- CVSS Metrics
- Attack Vector
- Attack Complexity
- Privileges Required
- CWE Information
Source: https://nvd.nist.gov/
Contains vulnerabilities confirmed to be exploited in the wild.
Source: https://www.cisa.gov/known-exploited-vulnerabilities-catalog
Exploit Prediction Scoring System
Source: https://www.first.org/epss/
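
A minimal sketch of how the three feeds could be joined into one labeled table, using toy data. It assumes a CVE is labeled 1 when its ID appears in the KEV catalog and that the EPSS score is attached as a feature. The CVE IDs, scores, field names, and the `build_rows` helper are all hypothetical and do not reflect the project's actual schema.

```python
from typing import Dict, List, Set


def build_rows(nvd: List[dict], kev_ids: Set[str], epss: Dict[str, float]) -> List[dict]:
    """Join NVD-style records with KEV membership (label) and EPSS score."""
    rows = []
    for rec in nvd:
        cve_id = rec["cve_id"]
        rows.append({
            "cve_id": cve_id,
            "cvss": rec["cvss"],
            # Assumption: a CVE without an EPSS entry gets a score of 0.0.
            "epss": epss.get(cve_id, 0.0),
            # Label 1 = listed in KEV (exploited), 0 = not listed.
            "label": int(cve_id in kev_ids),
        })
    return rows


# Toy inputs: made-up IDs and values, for illustration only.
nvd = [
    {"cve_id": "CVE-2099-0001", "cvss": 9.8},
    {"cve_id": "CVE-2099-0002", "cvss": 5.3},
]
kev_ids = {"CVE-2099-0001"}
epss = {"CVE-2099-0001": 0.94}

rows = build_rows(nvd, kev_ids, epss)
for row in rows:
    print(row)
```
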
| Metric | Value |
|---|---|
| Total CVEs | 357,059 |
| Known Exploited CVEs | 1,617 |
| Non-Exploited CVEs | 355,442 |
| Embedding Model | BAAI/bge-base-en-v1.5 |
| Embedding Size | 768 |
| Final Feature Dimension | 775 |
Class Distribution:
Label 0 (Non Exploited): 355,442
Label 1 (Exploited): 1,617
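
The class distribution above is heavily imbalanced: roughly one CVE in 220 is labeled exploited. The arithmetic below makes this concrete. The negative-to-positive ratio is the value commonly passed to XGBoost's `scale_pos_weight` parameter. Whether this project sets that parameter is not stated, so treat it as an assumption.

```python
# Class counts taken from the dataset statistics above.
n_neg = 355_442   # label 0, non-exploited
n_pos = 1_617     # label 1, exploited

total = n_neg + n_pos
positive_rate = n_pos / total      # share of exploited CVEs
imbalance_ratio = n_neg / n_pos    # candidate value for scale_pos_weight

print(f"total CVEs      : {total:,}")
print(f"positive rate   : {positive_rate:.4%}")
print(f"neg:pos ratio   : {imbalance_ratio:.1f}")
```
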
- CVSS Score
- EPSS Score
- EPSS Percentile
- Attack Vector
- Attack Complexity
- Privileges Required
- CWE
Vulnerability descriptions were converted into semantic vectors using:
BAAI/bge-base-en-v1.5
Embedding Dimension:
768
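
The 775-dimensional feature vector is consistent with concatenating the 768-dimensional embedding with the seven structured features listed above, each encoded as a single number. The sketch below illustrates that bookkeeping under stated assumptions: the ordinal encodings, the CWE-to-integer mapping, the field names, and the feature order are hypothetical, and a random vector stands in for a real BGE embedding.

```python
import random

EMBED_DIM = 768  # output size of BAAI/bge-base-en-v1.5

# Hypothetical ordinal encodings for the categorical CVSS metrics.
ATTACK_VECTOR = {"PHYSICAL": 0, "LOCAL": 1, "ADJACENT_NETWORK": 2, "NETWORK": 3}
ATTACK_COMPLEXITY = {"HIGH": 0, "LOW": 1}
PRIVILEGES_REQUIRED = {"HIGH": 0, "LOW": 1, "NONE": 2}


def structured_features(cve: dict) -> list:
    """Encode the seven structured fields as seven floats."""
    return [
        float(cve["cvss"]),
        float(cve["epss"]),
        float(cve["epss_percentile"]),
        float(ATTACK_VECTOR[cve["attack_vector"]]),
        float(ATTACK_COMPLEXITY[cve["attack_complexity"]]),
        float(PRIVILEGES_REQUIRED[cve["privileges_required"]]),
        # Assumption: the CWE is encoded by its numeric ID, "CWE-79" -> 79.0.
        float(cve["cwe"].split("-")[1]),
    ]


def build_feature_vector(embedding: list, cve: dict) -> list:
    """Concatenate the text embedding with the structured features."""
    if len(embedding) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} dims, got {len(embedding)}")
    return list(embedding) + structured_features(cve)


rng = random.Random(0)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]  # stand-in vector
cve = {
    "cvss": 9.8, "epss": 0.42, "epss_percentile": 0.97,
    "attack_vector": "NETWORK", "attack_complexity": "LOW",
    "privileges_required": "NONE", "cwe": "CWE-79",
}

x = build_feature_vector(embedding, cve)
print(len(x))  # 768 + 7 = 775
```

With NumPy arrays, the same step would typically be a single `numpy.hstack` over the embedding matrix and the structured-feature matrix.
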
ExploitOracle combines multiple vulnerability intelligence feeds, semantic embeddings, machine learning prediction models, explainable AI, and similarity search into a unified vulnerability intelligence platform.
CVE Description
│
▼
BGE Embedding Model
│
▼
768-Dimensional Vector
│
▼
Feature Concatenation
│
▼
XGBoost Classifier
│
▼
Exploit Probability
Precision (Exploited): 0.77
Recall (Exploited): 0.79
F1-Score: 0.78
ROC-AUC: 0.9888
PR-AUC : 0.8610
Precision (Exploited): 0.77
Recall (Exploited): 0.78
F1-Score: 0.78
ROC-AUC : 0.9901
PR-AUC : 0.8708
| Metric | Value |
|---|---|
| Accuracy | 98% |
| ROC-AUC | 0.9901 |
| PR-AUC | 0.8708 |
| Precision (Exploited) | 0.77 |
| Recall (Exploited) | 0.78 |
| F1 Score (Exploited) | 0.78 |
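
To show how the figures in this table relate to one another, the sketch below derives precision, recall, F1, and accuracy from a confusion matrix. The counts are hypothetical: they were chosen to be consistent with the rounded metrics and with the class supports in the classification report (6,401 non-exploited, 323 exploited), and they are not the project's actual confusion matrix.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute positive-class precision, recall, F1, and overall accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts: tp + fn = 323 exploited, fp + tn = 6,401 non-exploited.
m = binary_metrics(tp=252, fp=75, fn=71, tn=6326)
print({name: round(value, 2) for name, value in m.items()})
```

On a split with these supports, a classifier that always predicts "not exploited" would already reach 6,401 / 6,724 ≈ 95% accuracy, which is why precision, recall, and PR-AUC on the exploited class are the more informative numbers.
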
After model training and validation, exploit probabilities were generated for all vulnerabilities.
Example Predictions:
CVE-1999-0095 -> 0.000619
CVE-1999-0082 -> 0.000082
CVE-1999-1471 -> 0.000039
CVE-1999-1122 -> 0.000058
CVE-1999-1467 -> 0.000058
Total Predictions Generated:
357,059 CVEs
Stored as:
full_df_v5.parquet
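
Once probabilities exist for every CVE, a patch-priority ranking is a sort by predicted probability. The sketch below uses the five example predictions above. The `rank_by_probability` helper and the tie-breaking rule (by CVE ID) are illustrative assumptions. The stored file could be loaded with `pandas.read_parquet`, but its column names are not documented here, so a plain dictionary is used instead.

```python
# Example predictions copied from the list above.
predictions = {
    "CVE-1999-0095": 0.000619,
    "CVE-1999-0082": 0.000082,
    "CVE-1999-1471": 0.000039,
    "CVE-1999-1122": 0.000058,
    "CVE-1999-1467": 0.000058,
}


def rank_by_probability(preds: dict, top_n=None) -> list:
    """Sort CVEs by descending probability; ties are broken by CVE ID."""
    ranked = sorted(preds.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]  # top_n=None returns the full ranking


top3 = rank_by_probability(predictions, top_n=3)
for cve_id, probability in top3:
    print(f"{cve_id}  {probability:.6f}")
```
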
ExploitOracle integrates SHAP for model transparency.
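
SHAP attributions are additive: the explainer's base value plus the per-feature contributions equals the model's raw output for that prediction. The sketch below illustrates this for a single CVE, assuming the explainer works in log-odds space (the usual default for tree explainers on logistic objectives), so a sigmoid converts the sum into a probability. The feature names and all numbers are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


base_value = -5.0  # hypothetical average model output in log-odds

# Hypothetical SHAP values for one CVE (positive pushes toward "exploited").
contributions = {
    "epss_score": 2.1,
    "cvss_score": 0.6,
    "attack_vector": 0.4,
    "embedding_dims_total": 0.9,   # the 768 embedding dimensions, summed
    "privileges_required": -0.3,
}

raw_output = base_value + sum(contributions.values())
probability = sigmoid(raw_output)

# List features from the largest to the smallest absolute contribution.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:<22} {value:+.2f}")
print(f"raw output (log-odds): {raw_output:.2f}")
print(f"exploit probability  : {probability:.3f}")
```
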
FAISS was used to build a semantic similarity engine.
Indexed Vulnerabilities:
357,059 CVEs
Capabilities:
- Similar Vulnerability Discovery
- Threat Correlation
- Vulnerability Clustering
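
The idea behind the similarity engine can be sketched without FAISS: normalize each embedding to unit length and rank by inner product, which equals cosine similarity. This brute-force version is a stand-in, not the project's implementation. An exact FAISS index such as `faiss.IndexFlatIP` over L2-normalized vectors would produce the same ranking, but the index type used in `faiss.index` is not stated. The CVE IDs and 3-dimensional vectors are toy values.

```python
import heapq
import math


def normalize(vector: list) -> list:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_k_similar(query: list, vectors: dict, k: int = 3) -> list:
    """Return the k (cve_id, cosine similarity) pairs closest to the query."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), cve_id)
        for cve_id, vec in vectors.items()
    )
    return [(cve_id, score) for score, cve_id in heapq.nlargest(k, scored)]


# Toy embeddings (real ones would have 768 dimensions).
vectors = {
    "CVE-2099-0001": [1.0, 0.0, 0.0],
    "CVE-2099-0002": [0.9, 0.1, 0.0],
    "CVE-2099-0003": [0.0, 1.0, 0.0],
    "CVE-2099-0004": [0.0, 0.0, 1.0],
}

# Querying with a CVE's own vector returns that CVE first, then its neighbors.
hits = top_k_similar(vectors["CVE-2099-0001"], vectors, k=2)
for cve_id, score in hits:
    print(f"{cve_id}  {score:.4f}")
```
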
Features:
- Dashboard Home
- CVE Intelligence
- Threat Score
- Patch Priority
- Executive Summary
- Threat Hunting
- Statistics Dashboard
- Similarity Graph
- SHAP Explainability
- PDF Reporting
| Component | Value |
|---|---|
| Total CVEs Processed | 357,059 |
| Known Exploited CVEs | 1,617 |
| Embedding Dimension | 768 |
| Final Feature Dimension | 775 |
| Model | XGBoost |
| Similarity Engine | FAISS |
| Explainability | SHAP |
| Dashboard Framework | Streamlit |

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.99 | 0.99 | 0.99 | 6401 |
| 1 | 0.77 | 0.79 | 0.78 | 323 |
| Accuracy | | | 0.98 | 6724 |
| Macro avg | 0.88 | 0.89 | 0.88 | 6724 |
| Weighted avg | 0.98 | 0.98 | 0.98 | 6724 |

ROC-AUC: 0.9888
PR-AUC: 0.8610


| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.99 | 0.99 | 0.99 | 6401 |
| 1 | 0.77 | 0.78 | 0.78 | 323 |
| Accuracy | | | 0.98 | 6724 |
| Macro avg | 0.88 | 0.88 | 0.88 | 6724 |
| Weighted avg | 0.98 | 0.98 | 0.98 | 6724 |

ROC-AUC: 0.9901
PR-AUC: 0.8708

ExploitOracle demonstrates how Machine Learning, Explainable AI, Semantic Search, and Cybersecurity Intelligence can be combined to prioritize vulnerabilities based on real-world exploitation likelihood.
The platform processes 357,059 CVEs and achieves:
- ROC-AUC: 0.9901
- PR-AUC: 0.8708
- Accuracy: 98%

It delivers these predictions as explainable, actionable, and analyst-friendly vulnerability intelligence through an interactive security operations dashboard.
Vellore Institute of Technology, VIT-AP University
National Institute of Technology Calicut
As a developer, cybersecurity enthusiast, and security engineer, I actively support open-source initiatives and collaborative research.
I am open to contributing to cybersecurity, AI/ML, threat intelligence, software engineering, and security research projects. If you are building something interesting and would like collaboration, contributions, technical discussions, or research support, feel free to reach out.
Email: [email protected]
This project is released under the MIT License. See the LICENSE file for the complete MIT License text.




