edgeAI-document-detection

Real-Time Edge AI computer vision system for document detection and privacy risk mitigation in healthcare-oriented environments.

Executive Summary

This repository contains research, engineering artifacts, deployment workflows, and evaluation results for a real-time Edge AI computer vision system designed to support privacy-preserving document detection workflows.

The project combines operational AI engineering, embedded computer vision, edge inference optimization, and healthcare AI governance principles to evaluate the feasibility of upstream privacy-risk mitigation using inference that runs entirely on local Edge AI infrastructure.

Operational Workflow Overview

Camera Ingestion
        ↓
YOLOv5 Document Detection
        ↓
Detection Stability Logic
        ↓
OCR Trigger Activation
        ↓
Tesseract OCR Extraction
        ↓
Microsoft Presidio Analysis
        ↓
PHI/PII Risk Classification
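This README names the stability stage between detection and OCR but does not specify it. The sketch below assumes a consecutive-frame rule. OCR fires once a document has been detected above a confidence floor in each of the last `window` frames. It does not fire again until the document leaves the field of view. The class name `StabilityGate` and the `window` and `min_conf` defaults are hypothetical and are not the project's actual parameters.

```python
from collections import deque


class StabilityGate:
    """Hypothetical stability rule: trigger OCR once per stable detection run."""

    def __init__(self, window: int = 5, min_conf: float = 0.5) -> None:
        self.window = window            # hypothetical: consecutive frames required
        self.min_conf = min_conf        # hypothetical: document confidence floor
        self.history = deque(maxlen=window)
        self.armed = True               # blocks repeat triggers on the same document

    def update(self, confidences: list[float]) -> bool:
        """Feed one frame's document-class confidences.

        Returns True only on the frame where a stable run is first reached.
        """
        detected = any(c >= self.min_conf for c in confidences)
        self.history.append(detected)
        if not detected:
            self.armed = True           # document left the frame: re-arm the trigger
            return False
        stable = len(self.history) == self.window and all(self.history)
        if stable and self.armed:
            self.armed = False          # fire once, then wait for the document to leave
            return True
        return False
```

Requiring a full run of positive frames filters out single-frame false positives and motion blur while a document is being placed. The cost is a trigger delay of `window` frames.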

End-to-End System Architecture

Dissertation Figure 26 — Full End-to-End System (Detection + Triage Combined)

The complete operational pipeline runs locally on an NVIDIA Jetson AGX Orin without dependency on cloud inference. Camera frames flow through YOLOv5 document detection, stability logic, OCR extraction, and Microsoft Presidio entity recognition into a four-class PHI/PII triage workflow. The architecture is privacy-preserving by design: visual data is processed at the edge and never transmitted externally.

Edge AI Workflow Visualization

Conceptual operational workflow visualization demonstrating the Edge AI document-detection pipeline operating on NVIDIA Jetson AGX Orin infrastructure using YOLOv5 inference, OCR extraction, and PHI/PII triage orchestration.

The visualization illustrates the end-to-end operational workflow, event-driven inference pipeline, and privacy-preserving architecture evaluated throughout the research project.

Performance Highlights

Metric	Result
mAP@0.5	~0.995
mAP@0.5:0.95	~0.994
Precision	~0.999
Recall	~0.998
End-to-End Latency	~42 ms
Throughput	~23.8 FPS
YOLOv5 Inference	~4.1 ms
NMS Processing	~1.5 ms
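The throughput figure equals the reciprocal of the end-to-end latency (1000 / 42 ≈ 23.8 FPS). That would be expected if each frame completes before the next one is processed, though the README does not say so explicitly. The same arithmetic shows that detector inference plus NMS takes about 5.6 ms of the 42 ms budget. The rest goes to stages the table does not itemize. A quick check:

```python
def serial_throughput_fps(latency_ms: float) -> float:
    """Frames per second if each frame finishes before the next one starts."""
    return 1000.0 / latency_ms


end_to_end_ms = 42.0   # reported end-to-end latency
inference_ms = 4.1     # reported YOLOv5 inference time
nms_ms = 1.5           # reported NMS time

fps = serial_throughput_fps(end_to_end_ms)
detector_ms = inference_ms + nms_ms
other_ms = end_to_end_ms - detector_ms   # stages not broken out in the table

print(f"implied throughput: {fps:.1f} FPS")             # implied throughput: 23.8 FPS
print(f"detector share: {detector_ms:.1f} ms")          # detector share: 5.6 ms
print(f"remaining pipeline stages: {other_ms:.1f} ms")  # remaining pipeline stages: 36.4 ms
```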

Five-fold cross-validation showed very little spread across folds (standard deviation σ ≈ 0.001 on mAP@0.5:0.95). The frozen model was then evaluated independently on a held-out test set of 6,652 images.
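Here σ is a standard deviation across the five per-fold scores, not a variance. The sketch below computes that summary with Python's standard `statistics` module. The README does not list the per-fold values. It also does not say whether the sample (n − 1) or population (n) estimator was used, so the sketch exposes both.

```python
import statistics


def summarize_folds(scores: list[float], population: bool = False) -> tuple[float, float]:
    """Return (mean, standard deviation) of per-fold metric values.

    population=False uses the sample estimator (divides by n - 1);
    population=True divides by n.
    """
    if len(scores) < 2:
        raise ValueError("need at least two folds to estimate spread")
    spread = statistics.pstdev(scores) if population else statistics.stdev(scores)
    return statistics.mean(scores), spread


# Usage: pass the five per-fold mAP@0.5:0.95 values from the evaluation logs.
# mean_map, sigma = summarize_folds(per_fold_map_50_95)
```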

Interpretation of Detection Performance

An additional consideration when interpreting the exceptionally high detection metrics is the relative simplicity of the underlying task. The model was trained to detect a single object class (documents) rather than to discriminate among multiple competing object categories. Consequently, the problem was primarily one of localizing documents against the background rather than simultaneous multi-class semantic differentiation.

This reduced classification complexity likely contributed to the elevated precision, recall, and mAP values observed throughout the evaluation. The controlled environmental conditions, relatively consistent document geometry, and constrained background variability further simplified the operational detection space.

Accordingly, the reported performance metrics should be interpreted primarily as evidence supporting the feasibility and operational reliability of edge-based document localization under the evaluated conditions rather than as evidence of generalized multi-object scene understanding performance.

Model Training and Convergence

Dissertation Figure 28 — Training and Validation Convergence Curves

Box loss, object loss, precision, and recall converge rapidly within early epochs and stabilize cleanly, with no signs of overfitting or instability across 149 epochs of extended training. Classification loss remains at zero throughout, consistent with the single-class document detection setup.

Dissertation Figure 29 — Cross-Fold Performance Variability

Five-fold cross-validation produced tightly clustered performance across precision, recall, mAP@0.5, and mAP@0.5:0.95, demonstrating that detection performance is not dependent on a specific training partition. This stability is a prerequisite for trustworthy deployment in operational environments.

Detection Results

Dissertation Figure 31 — Detection Outputs vs Ground Truth

Predicted bounding boxes overlaid against ground truth annotations on held-out test imagery. Localization accuracy remains strong across varied document orientations, lighting conditions, and background complexity.

Dissertation Figure 30 — Precision–Recall Curve (Test Set)

The precision–recall curve maintains high precision across the full recall range, with mAP@0.5 of 0.995 and AP@[0.50:0.95] of 0.994 computed under the COCO evaluation protocol.
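Under the COCO protocol, AP@[0.50:0.95] averages AP over ten IoU thresholds (0.50, 0.55, ..., 0.95). A predicted box counts as a true positive at threshold t only if its IoU with a ground-truth box is at least t. An AP@[0.50:0.95] nearly equal to AP at 0.50 therefore suggests that most matched boxes stay matched even at the strictest thresholds.

The sketch below shows only the IoU computation and the per-threshold test. It omits confidence ranking, one-to-one matching, and precision interpolation. A full evaluator such as pycocotools' `COCOeval` handles those steps. The function names are illustrative.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds 0.50, 0.55, ..., 0.95 (ten values)
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred: Box, gt: Box) -> list[float]:
    """IoU thresholds at which this prediction would match the ground-truth box."""
    overlap = iou(pred, gt)
    return [t for t in COCO_IOU_THRESHOLDS if overlap >= t]
```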

PHI/PII Classification Performance

Representative synthetic evaluation document (Dissertation Appendix D)

The downstream triage workflow was evaluated against 100 synthetic documents distributed across four privacy categories: PII + PHI, PII only, PHI only, and no sensitive information. Synthetic data was used to maintain ethical and regulatory compliance during evaluation.
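The README names the four triage categories but not the rule that assigns them. The sketch below assumes a document's label comes from the entity types Presidio reports in its OCR text. In practice that list could come from `[r.entity_type for r in AnalyzerEngine().analyze(text=ocr_text, language="en")]`.

The `PII_ENTITIES` and `PHI_ENTITIES` groupings are hypothetical. `PERSON`, `EMAIL_ADDRESS`, `PHONE_NUMBER`, `US_SSN`, and `LOCATION` are Presidio entity types. The PHI names are hypothetical placeholders for custom recognizers.

```python
# Hypothetical groupings; the project's actual entity lists are not given here.
PII_ENTITIES = {"PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "US_SSN", "LOCATION"}
PHI_ENTITIES = {"MEDICAL_RECORD_NUMBER", "DIAGNOSIS", "MEDICATION"}  # hypothetical custom types


def triage(entity_types: list[str]) -> str:
    """Map the entity types detected in one document to one of four labels."""
    found = set(entity_types)
    has_pii = bool(found & PII_ENTITIES)
    has_phi = bool(found & PHI_ENTITIES)
    if has_pii and has_phi:
        return "PII + PHI"
    if has_pii:
        return "PII only"
    if has_phi:
        return "PHI only"
    return "No sensitive information"
```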

Dissertation Figure 33 — Confusion Matrix for PHI/PII Classification

The OCR-and-Presidio classification pipeline achieved a strictly diagonal confusion matrix on the synthetic evaluation set, yielding 100% accuracy, precision, recall, and F1 across all four classes. This result represents an upper bound under structured input conditions; real-world handwritten, occluded, or low-quality documents would require additional robustness work, and that limitation is discussed openly in the dissertation.
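A strictly diagonal confusion matrix means every class has zero false positives and zero false negatives. Precision, recall, and F1 are therefore all 1.0 by construction. The helper below sketches those standard per-class formulas, with rows as true classes and columns as predicted classes. The README does not list the per-class document counts behind the reported matrix.

```python
def per_class_metrics(cm: list[list[int]]) -> list[dict[str, float]]:
    """Precision, recall, and F1 for each class of a square confusion matrix."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp   # predicted k, truly another class
        fn = sum(cm[k]) - tp                        # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def accuracy(cm: list[list[int]]) -> float:
    """Fraction of all items on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total if total else 0.0
```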

Deployment Architecture

Dissertation Figure 22 — Simulated Deployment Architecture

The physical deployment uses an industrial GigE vision camera (Sony IMX236, 1920×1200, up to 41 FPS) mounted overhead and connected through a Gigabit switch to the Jetson AGX Orin. The configuration is fully closed-loop with no external network dependency, making the pipeline suitable for HIPAA-aligned operational environments.
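Simple arithmetic can check the camera's data rate against the Gigabit link. At full 1920×1200 resolution and 41 FPS, an 8-bit pixel format (for example Mono8 or an 8-bit Bayer format) needs about 756 Mbit/s. That fits within the nominal 1000 Mbit/s link, whereas a 12-bit format would not.

The pixel format used in this deployment is an assumption. The calculation also ignores GigE Vision packet overhead, which reduces usable bandwidth somewhat.

```python
def stream_mbps(width: int, height: int, fps: float, bits_per_pixel: int) -> float:
    """Uncompressed sensor data rate in Mbit/s (ignores protocol overhead)."""
    return width * height * bits_per_pixel * fps / 1e6


GIGE_LINK_MBPS = 1000.0  # nominal Gigabit Ethernet line rate

for bpp in (8, 12):  # assumption: 8-bit (Mono8/Bayer8) vs. 12-bit pixel formats
    rate = stream_mbps(1920, 1200, 41, bpp)
    print(f"{bpp}-bit: {rate:.1f} Mbit/s, fits: {rate < GIGE_LINK_MBPS}")
# 8-bit: 755.7 Mbit/s, fits: True
# 12-bit: 1133.6 Mbit/s, fits: False
```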

Methodology

Dissertation Figure 15 — Experimental Workflow Pipeline

The study follows a structured pipeline of dataset preparation, annotation transformation, five-fold cross-validation, frozen model selection, held-out test evaluation, and edge inference benchmarking. Each stage is designed for reproducibility and aligned with current best practices in applied machine learning research.
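The sketch below shows the split structure described above. A test set is held out first and never used during training or model selection. The remaining items are then partitioned into five disjoint validation folds. The function name, the 20% test fraction, and the round-robin fold assignment are assumptions for illustration. The README reports the held-out test set size (6,652 images) but not the total dataset size or the splitting code.

```python
import random


def make_splits(items: list[str], k: int = 5, test_fraction: float = 0.2,
                seed: int = 0) -> tuple[list[str], list[tuple[list[str], list[str]]]]:
    """Hold out a test set, then partition the rest into k train/validation folds.

    Items must be unique and hashable (e.g. image file paths).
    Returns (test, folds) where folds[i] = (train_i, val_i).
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, pool = shuffled[:n_test], shuffled[n_test:]
    folds = []
    for i in range(k):
        val = pool[i::k]               # every k-th item starting at offset i
        val_set = set(val)
        train = [x for x in pool if x not in val_set]
        folds.append((train, val))
    return test, folds
```

If several images show the same physical document, splitting by document identifier rather than by image would keep near-duplicates from landing on both sides of a split. The README does not say which approach was used.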

Key Capabilities

Real-time document detection on embedded hardware
Edge-based computer vision inference with no cloud dependency
NVIDIA Jetson AGX Orin deployment
YOLOv5 object detection with TensorRT optimization
OCR and PHI/PII workflow integration
Microsoft Presidio entity recognition
Privacy-preserving Edge AI architecture
Five-fold cross-validation with held-out test evaluation
Operational AI workflow orchestration
Event-driven capture and downstream triage

Technology Stack

YOLOv5
PyTorch
NVIDIA Jetson AGX Orin
TensorRT
DeepStream
OpenCV
Tesseract OCR
Microsoft Presidio
Docker
Python

Research Focus Areas

Edge AI
Computer Vision
Operational AI Systems
AI Governance
Responsible AI
Healthcare AI
Privacy Engineering
Real-Time Inference
Embedded AI Infrastructure

Repository Structure

Directory	Purpose
dissertation	Dissertation abstract and supporting research summary
code	Training, inference, evaluation, and workflow orchestration scripts
data	Dataset references, annotation examples, and synthetic evaluation artifacts
models	YOLOv5 configuration, model-selection, and deployment runtime artifacts
results	Experimental outputs, visualizations, and evaluation artifacts
docs	Architecture, methodology, and deployment documentation
media	Screenshots, diagrams, animations, and demonstrations

Repository Navigation

Dissertation Research

Dissertation Abstract

Edge AI Engineering

Data & Model Resources

Experimental Results

Architecture & Deployment

Repository Highlights

Edge AI Infrastructure

NVIDIA Jetson AGX Orin deployment
Real-time localized inference
Edge-based privacy-preserving workflows
GPU-accelerated computer vision pipeline

Operational AI Engineering

Event-driven workflow orchestration
Stability logic and trigger mechanisms
OCR and PHI/PII extraction integration
Real-time processing optimization

AI Governance & Responsible AI

Privacy-by-design architecture
Upstream PHI risk mitigation
Localized inference strategies
Healthcare-oriented operational safeguards

Dissertation Research Areas

This research integrates:

Real-time computer vision
Edge AI deployment
Embedded AI infrastructure
Operational AI systems
Privacy engineering
Healthcare AI governance
PHI/PII workflow orchestration
AI-enabled risk mitigation

Operational Focus

This repository demonstrates the integration of Edge AI computer vision, operational AI orchestration, OCR extraction, and PHI/PII entity analysis into a unified real-time workflow designed for privacy-preserving healthcare-oriented environments.

The project emphasizes:

localized inference
operational AI engineering
privacy-preserving architectures
AI governance alignment
real-time deployment feasibility
embedded GPU acceleration
workflow orchestration

License

This repository is provided for research, educational, and portfolio demonstration purposes.

Author

Forrest Pascal, PhD
Artificial Intelligence Researcher | Edge AI | Computer Vision | AI Governance
