Real-Time Edge AI computer vision system for document detection and privacy risk mitigation in healthcare-oriented environments.
This repository contains research, engineering artifacts, deployment workflows, and evaluation results for a real-time Edge AI computer vision system designed to support privacy-preserving document detection workflows.
The project combines operational AI engineering, embedded computer vision, edge inference optimization, and healthcare AI governance principles to evaluate the feasibility of upstream privacy-risk mitigation when inference runs entirely on local Edge AI infrastructure.
Camera Ingestion
↓
YOLOv5 Document Detection
↓
Detection Stability Logic
↓
OCR Trigger Activation
↓
Tesseract OCR Extraction
↓
Microsoft Presidio Analysis
↓
PHI/PII Risk Classification
Dissertation Figure 26 — Full End-to-End System (Detection + Triage Combined)
The complete operational pipeline runs locally on an NVIDIA Jetson AGX Orin without dependency on cloud inference. Camera frames flow through YOLOv5 document detection, stability logic, OCR extraction, and Microsoft Presidio entity recognition into a four-class PHI/PII triage workflow. The architecture is privacy-preserving by design: visual data is processed at the edge and never transmitted externally.
Conceptual operational workflow visualization demonstrating the Edge AI document-detection pipeline operating on NVIDIA Jetson AGX Orin infrastructure using YOLOv5 inference, OCR extraction, and PHI/PII triage orchestration.
The visualization illustrates the end-to-end operational workflow, event-driven inference pipeline, and privacy-preserving architecture evaluated throughout the research project.
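The stability logic and OCR trigger stages in the pipeline can be illustrated with a small debounce sketch. The actual stability criteria are not specified in this README, so the class name `StabilityGate`, the parameters `min_frames` and `max_shift_px`, and the re-arm behavior are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class StabilityGate:
    """Hypothetical debounce: fire once when a detection holds still long enough."""

    min_frames: int = 5         # assumed number of consecutive stable frames
    max_shift_px: float = 10.0  # assumed tolerance on box-center movement
    _last: Optional[Box] = field(default=None, init=False)
    _streak: int = field(default=0, init=False)
    _fired: bool = field(default=False, init=False)

    @staticmethod
    def _center(box: Box) -> Tuple[float, float]:
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    def update(self, box: Optional[Box]) -> bool:
        """Feed one frame's detection (or None); return True when OCR should run."""
        if box is None:
            # No document in view: clear state and re-arm the trigger.
            self._last, self._streak, self._fired = None, 0, False
            return False

        moved = False
        if self._last is not None:
            cx, cy = self._center(box)
            px, py = self._center(self._last)
            moved = max(abs(cx - px), abs(cy - py)) > self.max_shift_px

        # A moved box restarts the streak; a steady box extends it.
        self._streak = 1 if moved else self._streak + 1
        self._last = box

        if self._streak >= self.min_frames and not self._fired:
            self._fired = True  # fire only once per document presentation
            return True
        return False
```

In this sketch the gate would be called once per frame with the top detection (or `None`), and a `True` return would start the OCR stage. Firing once per presentation keeps OCR and entity analysis off the per-frame path, which is one plausible reading of "event-driven" here.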
| Metric | Result |
|---|---|
| mAP@0.5 | ~0.995 |
| mAP@0.5:0.95 | ~0.994 |
| Precision | ~0.999 |
| Recall | ~0.998 |
| End-to-End Latency | ~42 ms |
| Throughput | ~23.8 FPS |
| YOLOv5 Inference | ~4.1 ms |
| NMS Processing | ~1.5 ms |
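
The latency and throughput rows in the table are consistent with each other, which the short calculation below checks. It assumes frames are processed one at a time, so that throughput is the reciprocal of end-to-end latency. The table does not state which stages the ~42 ms covers beyond the detector, so the "remaining" figure is simply whatever else that measurement includes.

```python
# Values copied from the metrics table (all approximate).
end_to_end_ms = 42.0
yolo_inference_ms = 4.1
nms_ms = 1.5

# Assumption: no pipelining, so throughput = 1 / end-to-end latency.
fps = 1000.0 / end_to_end_ms

# Detector cost (inference + NMS) and the time left for all other stages.
detector_ms = yolo_inference_ms + nms_ms
remaining_ms = end_to_end_ms - detector_ms
detector_share = detector_ms / end_to_end_ms

print(f"{fps:.1f} FPS")                      # 23.8 FPS
print(f"{detector_ms:.1f} ms detector")      # 5.6 ms detector
print(f"{remaining_ms:.1f} ms other stages") # 36.4 ms other stages
print(f"{detector_share:.1%} of latency")    # 13.3% of latency
```

Under these assumptions, the detector itself accounts for only a small share of the per-frame budget.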
Cross-validation was performed across five folds with very low variability (standard deviation σ ≈ 0.001 on mAP@0.5:0.95), and the frozen model was independently evaluated on a held-out test set of 6,652 images.
An additional consideration when interpreting the exceptionally high detection metrics is the relative simplicity of the underlying classification task. The model was trained exclusively to detect the presence of documents as a single object class rather than discriminate among multiple competing object categories. Consequently, the detection problem primarily involved object localization against background conditions rather than simultaneous multi-class semantic differentiation.
This reduced classification complexity likely contributed to the elevated precision, recall, and mAP values observed throughout the evaluation process. The controlled environmental conditions, relatively consistent document geometry, and constrained background variability simplified the operational detection space further.
Accordingly, the reported performance metrics should be interpreted primarily as evidence supporting the feasibility and operational reliability of edge-based document localization under the evaluated conditions rather than as evidence of generalized multi-object scene understanding performance.
Dissertation Figure 28 — Training and Validation Convergence Curves
Box loss, objectness loss, precision, and recall converge rapidly within the early epochs and then stabilize, with no signs of overfitting or instability across 149 epochs of extended training. Classification loss remains at zero throughout, consistent with the single-class document detection setup.
Dissertation Figure 29 — Cross-Fold Performance Variability
Five-fold cross-validation produced tightly clustered performance across precision, recall, mAP@0.5, and mAP@0.5:0.95, demonstrating that detection performance is not dependent on a specific training partition. This stability is a prerequisite for trustworthy deployment in operational environments.
Dissertation Figure 31 — Detection Outputs vs Ground Truth
Predicted bounding boxes overlaid against ground truth annotations on held-out test imagery. Localization accuracy remains strong across varied document orientations, lighting conditions, and background complexity.
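Comparing predicted boxes with ground truth rests on intersection over union (IoU), which also underlies the mAP thresholds reported above. The following is a minimal sketch of that computation, not the evaluation code used in the study. The helper names `iou` and `thresholds_passed` are illustrative, and boxes are assumed to be `(x1, y1, x2, y2)` in pixels.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel corners

# The ten IoU thresholds 0.50, 0.55, ..., 0.95 used for COCO-style averaging.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred: Box, truth: Box) -> int:
    """How many of the ten thresholds this prediction would count as a match at."""
    overlap = iou(pred, truth)
    return sum(overlap >= t for t in COCO_IOU_THRESHOLDS)
```

A prediction that matches at 0.5 but not at 0.95 contributes to the looser metric only, which is why the averaged figure is the stricter test of localization quality.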
Dissertation Figure 30 — Precision–Recall Curve (Test Set)
The precision–recall curve maintains high precision across the full recall range, with AP@0.50 of 0.995 and AP@[0.50:0.95] of 0.994 computed under the COCO evaluation protocol.
Representative synthetic evaluation document (Dissertation Appendix D)
The downstream triage workflow was evaluated against 100 synthetic documents distributed across four privacy categories: PII + PHI, PII only, PHI only, and no sensitive information. Synthetic data was used to maintain ethical and regulatory compliance during evaluation.
Dissertation Figure 33 — Confusion Matrix for PHI/PII Classification
The OCR-and-Presidio classification pipeline achieved a strictly diagonal confusion matrix on the synthetic evaluation set, yielding 100% accuracy, precision, recall, and F1 across all four classes. This result represents an upper bound under structured input conditions; real-world handwritten, occluded, or low-quality documents would require additional robustness work, and that limitation is discussed openly in the dissertation.
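The four-class triage decision can be sketched as a mapping from detected entity labels to a category. The grouping below is an assumption, not the dissertation's rule set. `PERSON`, `PHONE_NUMBER`, `EMAIL_ADDRESS`, `US_SSN`, `LOCATION`, and `MEDICAL_LICENSE` are predefined Presidio entity names, while `MEDICAL_RECORD_NUMBER` and `DIAGNOSIS` are hypothetical labels that would need custom recognizers. The function name `triage` is likewise illustrative.

```python
from typing import Iterable

# Assumed grouping of entity labels into PII and PHI (illustrative only).
PII_ENTITIES = {"PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS", "US_SSN", "LOCATION"}
PHI_ENTITIES = {
    "MEDICAL_LICENSE",
    "MEDICAL_RECORD_NUMBER",  # hypothetical custom recognizer
    "DIAGNOSIS",              # hypothetical custom recognizer
}


def triage(entity_types: Iterable[str]) -> str:
    """Map the entity labels found in one document to a privacy category."""
    found = set(entity_types)
    has_pii = bool(found & PII_ENTITIES)
    has_phi = bool(found & PHI_ENTITIES)
    if has_pii and has_phi:
        return "PII + PHI"
    if has_pii:
        return "PII only"
    if has_phi:
        return "PHI only"
    return "no sensitive information"
```

In a full pipeline the input would come from the analyzer output, for example the `entity_type` values of the results returned by Presidio's `AnalyzerEngine.analyze` on the OCR text. Labels outside both sets are ignored in this sketch, so a document containing only unrecognized labels falls into the last category.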
Dissertation Figure 22 — Simulated Deployment Architecture
The physical deployment uses an industrial GigE vision camera (Sony IMX236, 1920×1200, up to 41 FPS) mounted overhead and connected through a Gigabit switch to the Jetson AGX Orin. The configuration is fully closed-loop with no external network dependency, making the pipeline suitable for HIPAA-aligned operational environments.
Dissertation Figure 15 — Experimental Workflow Pipeline
The study follows a structured pipeline of dataset preparation, annotation transformation, five-fold cross-validation, frozen model selection, held-out test evaluation, and edge inference benchmarking. Each stage is designed for reproducibility and aligned with current best practices in applied machine learning research.
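The partitioning step in this workflow (a held-out test set plus five cross-validation folds) might look like the sketch below. The split ratio, seed, and function names `split_dataset` and `fold_rounds` are assumptions. The README states only that the test set held 6,652 images, not how the partition was drawn.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def split_dataset(
    ids: Sequence[str],
    test_fraction: float = 0.2,  # assumed ratio, not taken from the study
    k: int = 5,
    seed: int = 0,               # assumed seed, fixed for reproducibility
) -> Dict[str, list]:
    """Set aside a held-out test set, then deal the rest into k folds."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, pool = shuffled[:n_test], shuffled[n_test:]
    folds = [pool[i::k] for i in range(k)]  # round-robin keeps fold sizes even
    return {"test": test, "folds": folds}


def fold_rounds(folds: List[List[str]]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, validation) pairs, using each fold once for validation."""
    for i, val in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, val
```

The test images are removed before any folds are formed, so they never influence training or model selection in any round.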
- Real-time document detection on embedded hardware
- Edge-based computer vision inference with no cloud dependency
- NVIDIA Jetson AGX Orin deployment
- YOLOv5 object detection with TensorRT optimization
- OCR and PHI/PII workflow integration
- Microsoft Presidio entity recognition
- Privacy-preserving Edge AI architecture
- Five-fold cross-validation with held-out test evaluation
- Operational AI workflow orchestration
- Event-driven capture and downstream triage
- YOLOv5
- PyTorch
- NVIDIA Jetson AGX Orin
- TensorRT
- DeepStream
- OpenCV
- Tesseract OCR
- Microsoft Presidio
- Docker
- Python
- Edge AI
- Computer Vision
- Operational AI Systems
- AI Governance
- Responsible AI
- Healthcare AI
- Privacy Engineering
- Real-Time Inference
- Embedded AI Infrastructure
| Directory | Purpose |
|---|---|
| dissertation | Dissertation abstract and supporting research summary |
| code | Training, inference, evaluation, and workflow orchestration scripts |
| data | Dataset references, annotation examples, and synthetic evaluation artifacts |
| models | YOLOv5 configuration, model-selection, and deployment runtime artifacts |
| results | Experimental outputs, visualizations, and evaluation artifacts |
| docs | Architecture, methodology, and deployment documentation |
| media | Screenshots, diagrams, animations, and demonstrations |
- NVIDIA Jetson AGX Orin deployment
- Real-time localized inference
- Edge-based privacy-preserving workflows
- GPU-accelerated computer vision pipeline
- Event-driven workflow orchestration
- Stability logic and trigger mechanisms
- OCR and PHI/PII extraction integration
- Real-time processing optimization
- Privacy-by-design architecture
- Upstream PHI risk mitigation
- Localized inference strategies
- Healthcare-oriented operational safeguards
This research integrates:
- Real-time computer vision
- Edge AI deployment
- Embedded AI infrastructure
- Operational AI systems
- Privacy engineering
- Healthcare AI governance
- PHI/PII workflow orchestration
- AI-enabled risk mitigation
This repository demonstrates the integration of Edge AI computer vision, operational AI orchestration, OCR extraction, and PHI/PII entity analysis into a unified real-time workflow designed for privacy-preserving healthcare-oriented environments.
The project emphasizes:
- localized inference
- operational AI engineering
- privacy-preserving architectures
- AI governance alignment
- real-time deployment feasibility
- embedded GPU acceleration
- workflow orchestration
This repository is provided for research, educational, and portfolio demonstration purposes.
Forrest Pascal, PhD
Artificial Intelligence Researcher | Edge AI | Computer Vision | AI Governance










