edgeAI-document-detection

Real-time Edge AI computer vision system for document detection and privacy risk mitigation in healthcare-oriented environments.


Executive Summary

This repository contains research, engineering artifacts, deployment workflows, and evaluation results for a real-time Edge AI computer vision system designed to support privacy-preserving document detection workflows.

The project combines operational AI engineering, embedded computer vision, edge inference optimization, and healthcare AI governance principles to evaluate the feasibility of upstream privacy-risk mitigation with inference running entirely on local Edge AI infrastructure.


Operational Workflow Overview

Camera Ingestion
        ↓
YOLOv5 Document Detection
        ↓
Detection Stability Logic
        ↓
OCR Trigger Activation
        ↓
Tesseract OCR Extraction
        ↓
Microsoft Presidio Analysis
        ↓
PHI/PII Risk Classification
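
The step from "Detection Stability Logic" to "OCR Trigger Activation" can be sketched as a small gate that fires once a detected document has stayed in place for several consecutive frames. This is only an illustrative sketch, not the repository's implementation: the class name `StabilityGate`, the `required_frames` and `max_shift_px` parameters, and their default values are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of an axis-aligned box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


@dataclass
class StabilityGate:
    """Fires once when a detection has held still for `required_frames` frames."""
    required_frames: int = 5     # hypothetical default
    max_shift_px: float = 10.0   # hypothetical default
    last_center: Optional[Tuple[float, float]] = None
    streak: int = 0
    fired: bool = False

    def update(self, box: Optional[Box]) -> bool:
        """Feed one frame's detection (or None); return True to trigger OCR."""
        if box is None:
            # Document left the frame: reset so the next document can trigger.
            self.last_center, self.streak, self.fired = None, 0, False
            return False
        c = center(box)
        moved = (
            self.last_center is None
            or abs(c[0] - self.last_center[0]) > self.max_shift_px
            or abs(c[1] - self.last_center[1]) > self.max_shift_px
        )
        if moved:
            # New or shifted document: restart the streak and re-arm the trigger.
            self.streak, self.fired = 1, False
        else:
            self.streak += 1
        self.last_center = c
        if self.streak >= self.required_frames and not self.fired:
            self.fired = True
            return True
        return False
```

Firing only once per stable document keeps OCR from running on every frame while a page sits under the camera. Whether the actual system uses center displacement, box overlap, or confidence smoothing is not stated here.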

End-to-End System Architecture

System Architecture

Dissertation Figure 26 — Full End-to-End System (Detection + Triage Combined)

The complete operational pipeline runs locally on an NVIDIA Jetson AGX Orin without dependency on cloud inference. Camera frames flow through YOLOv5 document detection, stability logic, OCR extraction, and Microsoft Presidio entity recognition into a four-class PHI/PII triage workflow. The architecture is privacy-preserving by design: visual data is processed at the edge and never transmitted externally.


Edge AI Workflow Visualization

Workflow Visualization

Conceptual operational workflow visualization demonstrating the Edge AI document-detection pipeline operating on NVIDIA Jetson AGX Orin infrastructure using YOLOv5 inference, OCR extraction, and PHI/PII triage orchestration.

The visualization illustrates the end-to-end operational workflow, event-driven inference pipeline, and privacy-preserving architecture evaluated throughout the research project.


Performance Highlights

| Metric | Result |
| --- | --- |
| mAP@0.5 | ~0.995 |
| mAP@0.5:0.95 | ~0.994 |
| Precision | ~0.999 |
| Recall | ~0.998 |
| End-to-End Latency | ~42 ms |
| Throughput | ~23.8 FPS |
| YOLOv5 Inference | ~4.1 ms |
| NMS Processing | ~1.5 ms |

Cross-validation was performed across five folds with extremely low variance (σ ≈ 0.001 on mAP@0.5:0.95), and the frozen model was independently evaluated on a held-out test set of 6,652 images.
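
The latency and throughput figures above are consistent with each other if frames are processed one at a time, so that throughput is the reciprocal of end-to-end latency. The short check below assumes that sequential model and also assumes the YOLOv5 inference and NMS times are components of the end-to-end figure. The function names are illustrative.

```python
def throughput_fps(latency_ms: float) -> float:
    """Frames per second when each frame takes `latency_ms` end to end."""
    return 1000.0 / latency_ms


def remaining_ms(total_ms: float, *stage_ms: float) -> float:
    """Time left in the end-to-end budget after the listed stages."""
    return total_ms - sum(stage_ms)


print(f"{throughput_fps(42.0):.1f} FPS")        # 23.8 FPS
print(f"{remaining_ms(42.0, 4.1, 1.5):.1f} ms")  # 36.4 ms
```

Under these assumptions, detection and NMS account for roughly 5.6 ms of the 42 ms budget, and the remainder is spent in the other pipeline stages.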


Interpretation of Detection Performance

An additional consideration when interpreting the exceptionally high detection metrics is the relative simplicity of the underlying classification task. The model was trained exclusively to detect the presence of documents as a single object class rather than discriminate among multiple competing object categories. Consequently, the detection problem primarily involved object localization against background conditions rather than simultaneous multi-class semantic differentiation.

This reduced classification complexity likely contributed to the elevated precision, recall, and mAP values observed throughout the evaluation. The controlled environmental conditions, relatively consistent document geometry, and constrained background variability simplified the detection space further.

Accordingly, the reported performance metrics should be interpreted primarily as evidence supporting the feasibility and operational reliability of edge-based document localization under the evaluated conditions rather than as evidence of generalized multi-object scene understanding performance.


Model Training and Convergence

Training Convergence

Dissertation Figure 28 — Training and Validation Convergence Curves

Box loss, objectness loss, precision, and recall converge rapidly within the early epochs and then stabilize, with no signs of overfitting or instability across 149 epochs of extended training. Classification loss remains at zero throughout, consistent with the single-class document detection setup.

Cross-Fold Variability

Dissertation Figure 29 — Cross-Fold Performance Variability

Five-fold cross-validation produced tightly clustered performance across precision, recall, mAP@0.5, and mAP@0.5:0.95, demonstrating that detection performance is not dependent on a specific training partition. This stability is a prerequisite for trustworthy deployment in operational environments.
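
Cross-fold variability of this kind is usually summarized as the mean and standard deviation of a metric over the folds. The helper below is a minimal sketch of that summary. It uses the sample standard deviation, which is an assumption, since the text does not say whether the reported σ is a sample or population estimate.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def fold_summary(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-fold metric values."""
    if len(scores) < 2:
        raise ValueError("need at least two folds to estimate variability")
    return mean(scores), stdev(scores)
```

Calling `fold_summary` once per metric, with one value per fold, gives the numbers needed to compare spread across precision, recall, and the two mAP variants.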


Detection Results

Detection vs Ground Truth

Dissertation Figure 31 — Detection Outputs vs Ground Truth

Predicted bounding boxes overlaid against ground truth annotations on held-out test imagery. Localization accuracy remains strong across varied document orientations, lighting conditions, and background complexity.
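
Comparing predictions with ground truth comes down to intersection-over-union (IoU) between boxes. The sketch below shows IoU and a simplified greedy matching at a single IoU threshold. It is meant to illustrate the idea and is not the COCO evaluation code. The function names and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(
    preds: Sequence[Tuple[Box, float]],  # (box, confidence)
    truths: Sequence[Box],
    iou_threshold: float = 0.5,
) -> Tuple[int, int, int]:
    """Greedily match predictions to ground truth; return (TP, FP, FN)."""
    unmatched: List[Box] = list(truths)
    tp = fp = 0
    # Highest-confidence predictions claim ground-truth boxes first.
    for box, _conf in sorted(preds, key=lambda p: p[1], reverse=True):
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)
```

Precision is then `tp / (tp + fp)` and recall is `tp / (tp + fn)`. Repeating the matching over a range of IoU thresholds is what distinguishes mAP@0.5:0.95 from mAP@0.5.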

Precision Recall Curve

Dissertation Figure 30 — Precision–Recall Curve (Test Set)

The precision–recall curve maintains high precision across the full recall range, with AP@0.50 of 0.995 and AP@[0.50:0.95] of 0.994 computed under the COCO evaluation protocol.


PHI/PII Classification Performance

Sample Synthetic Document

Representative synthetic evaluation document (Dissertation Appendix D)

The downstream triage workflow was evaluated against 100 synthetic documents distributed across four privacy categories: PII + PHI, PII only, PHI only, and no sensitive information. Synthetic data was used to maintain ethical and regulatory compliance during evaluation.
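
The four-category triage can be pictured as a mapping from the set of entity types found in the OCR text to one label. The sketch below assumes the entity types have already been collected, for example from the `entity_type` field of Presidio analyzer results. The grouping of labels into PII and PHI is hypothetical: `PERSON`, `PHONE_NUMBER`, `EMAIL_ADDRESS`, and `US_SSN` are standard Presidio entity names, while the PHI labels are made-up placeholders for custom recognizers.

```python
from typing import Iterable

# Hypothetical grouping; the project's actual entity-to-category mapping is not given here.
PII_ENTITIES = frozenset({"PERSON", "PHONE_NUMBER", "EMAIL_ADDRESS", "US_SSN"})
PHI_ENTITIES = frozenset({"MEDICAL_RECORD_NUMBER", "DIAGNOSIS", "MEDICATION"})


def triage(entity_types: Iterable[str]) -> str:
    """Map detected entity types to one of the four privacy categories."""
    found = set(entity_types)
    has_pii = bool(found & PII_ENTITIES)
    has_phi = bool(found & PHI_ENTITIES)
    if has_pii and has_phi:
        return "PII + PHI"
    if has_pii:
        return "PII only"
    if has_phi:
        return "PHI only"
    return "no sensitive information"
```

For example, `triage(["PERSON", "DIAGNOSIS"])` returns `"PII + PHI"`, and an empty list returns `"no sensitive information"`.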

Confusion Matrix

Dissertation Figure 33 — Confusion Matrix for PHI/PII Classification

The OCR-and-Presidio classification pipeline achieved a strictly diagonal confusion matrix on the synthetic evaluation set, yielding 100% accuracy, precision, recall, and F1 across all four classes. This result represents an upper bound under structured input conditions; real-world handwritten, occluded, or low-quality documents would require additional robustness work, and that limitation is discussed openly in the dissertation.
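
To see why a strictly diagonal confusion matrix implies perfect per-class scores, the metrics can be derived directly from the matrix. The sketch below assumes rows are true classes and columns are predicted classes. The function names are illustrative.

```python
from typing import Dict, List, Sequence


def per_class_metrics(matrix: Sequence[Sequence[int]]) -> List[Dict[str, float]]:
    """Precision, recall, and F1 per class (rows = true, columns = predicted)."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                       # true k, predicted other
        fp = sum(matrix[r][k] for r in range(n)) - tp  # predicted k, true other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def accuracy(matrix: Sequence[Sequence[int]]) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0
```

When every off-diagonal entry is zero, both `fp` and `fn` are zero for every class, so precision, recall, F1, and accuracy all equal 1.0 for any class with at least one sample.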


Deployment Architecture

Deployment Architecture

Dissertation Figure 22 — Simulated Deployment Architecture

The physical deployment uses an industrial GigE vision camera (Sony IMX236, 1920×1200, up to 41 FPS) mounted overhead and connected through a Gigabit switch to the Jetson AGX Orin. The configuration is fully closed-loop with no external network dependency, making the pipeline suitable for HIPAA-aligned operational environments.
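
A rough bandwidth estimate shows why a single Gigabit link can carry this camera stream. The calculation assumes 8 bits per pixel (raw monochrome or Bayer data) and ignores protocol overhead. Both are assumptions, as the pixel format is not stated.

```python
def raw_stream_mbps(width: int, height: int, fps: float, bits_per_pixel: int = 8) -> float:
    """Uncompressed video bandwidth in megabits per second."""
    return width * height * fps * bits_per_pixel / 1e6


mbps = raw_stream_mbps(1920, 1200, 41)
print(f"{mbps:.1f} Mbit/s")                 # 755.7 Mbit/s
print(f"{mbps / 1000:.0%} of a 1 Gbit/s link")  # 76% of a 1 Gbit/s link
```

At the maximum frame rate the raw stream uses about three quarters of the link under these assumptions. A wider pixel format at the same resolution and frame rate would not fit.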


Methodology

Experimental Workflow

Dissertation Figure 15 — Experimental Workflow Pipeline

The study follows a structured pipeline of dataset preparation, annotation transformation, five-fold cross-validation, frozen model selection, held-out test evaluation, and edge inference benchmarking. Each stage is designed for reproducibility and aligned with current best practices in applied machine learning research.
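
The partitioning behind this workflow can be sketched as a held-out test split followed by k-fold splits of the remaining data. The function below is a minimal illustration, not the study's code: the name `split_dataset`, the `test_fraction` default, and the fixed seed are assumptions, and items are expected to be unique hashable identifiers such as image paths.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def split_dataset(
    items: Sequence[Hashable],
    test_fraction: float = 0.2,  # hypothetical value
    k: int = 5,
    seed: int = 0,
) -> Tuple[List[Hashable], List[Tuple[List[Hashable], List[Hashable]]]]:
    """Return (held-out test set, list of k (train, validation) fold pairs)."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, dev = shuffled[:n_test], shuffled[n_test:]
    folds = []
    for i in range(k):
        val = dev[i::k]  # every k-th item, offset by the fold index
        val_set = set(val)
        train = [x for x in dev if x not in val_set]
        folds.append((train, val))
    return test, folds
```

The test set is carved out before any fold is built, so no cross-validation run ever sees it. That separation is what makes the final held-out evaluation independent of model selection.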


Key Capabilities

  • Real-time document detection on embedded hardware
  • Edge-based computer vision inference with no cloud dependency
  • NVIDIA Jetson AGX Orin deployment
  • YOLOv5 object detection with TensorRT optimization
  • OCR and PHI/PII workflow integration
  • Microsoft Presidio entity recognition
  • Privacy-preserving Edge AI architecture
  • Five-fold cross-validation with held-out test evaluation
  • Operational AI workflow orchestration
  • Event-driven capture and downstream triage

Technology Stack

  • YOLOv5
  • PyTorch
  • NVIDIA Jetson AGX Orin
  • TensorRT
  • DeepStream
  • OpenCV
  • Tesseract OCR
  • Microsoft Presidio
  • Docker
  • Python

Research Focus Areas

  • Edge AI
  • Computer Vision
  • Operational AI Systems
  • AI Governance
  • Responsible AI
  • Healthcare AI
  • Privacy Engineering
  • Real-Time Inference
  • Embedded AI Infrastructure

Repository Structure

| Directory | Purpose |
| --- | --- |
| dissertation | Dissertation abstract and supporting research summary |
| code | Training, inference, evaluation, and workflow orchestration scripts |
| data | Dataset references, annotation examples, and synthetic evaluation artifacts |
| models | YOLOv5 configuration, model-selection, and deployment runtime artifacts |
| results | Experimental outputs, visualizations, and evaluation artifacts |
| docs | Architecture, methodology, and deployment documentation |
| media | Screenshots, diagrams, animations, and demonstrations |

Repository Navigation

  • Dissertation Research
  • Edge AI Engineering
  • Data & Model Resources
  • Experimental Results
  • Architecture & Deployment


Repository Highlights

Edge AI Infrastructure

  • NVIDIA Jetson AGX Orin deployment
  • Real-time localized inference
  • Edge-based privacy-preserving workflows
  • GPU-accelerated computer vision pipeline

Operational AI Engineering

  • Event-driven workflow orchestration
  • Stability logic and trigger mechanisms
  • OCR and PHI/PII extraction integration
  • Real-time processing optimization

AI Governance & Responsible AI

  • Privacy-by-design architecture
  • Upstream PHI risk mitigation
  • Localized inference strategies
  • Healthcare-oriented operational safeguards

Dissertation Research Areas

This research integrates:

  • Real-time computer vision
  • Edge AI deployment
  • Embedded AI infrastructure
  • Operational AI systems
  • Privacy engineering
  • Healthcare AI governance
  • PHI/PII workflow orchestration
  • AI-enabled risk mitigation

Operational Focus

This repository demonstrates the integration of Edge AI computer vision, operational AI orchestration, OCR extraction, and PHI/PII entity analysis into a unified real-time workflow designed for privacy-preserving healthcare-oriented environments.

The project emphasizes:

  • localized inference
  • operational AI engineering
  • privacy-preserving architectures
  • AI governance alignment
  • real-time deployment feasibility
  • embedded GPU acceleration
  • workflow orchestration


License

This repository is provided for research, educational, and portfolio demonstration purposes.


Author

Forrest Pascal, PhD
Artificial Intelligence Researcher | Edge AI | Computer Vision | AI Governance
