TheTechTiger/VisionX

VisionX — AI-Powered Smart Camera

A privacy-first, low-cost ($16.70) smart camera system that combines an ESP32-CAM with a local Ollama vision model for real-time scene understanding, face recognition, and accessibility — all without cloud dependencies.

Features

  • Live MJPEG Streaming — ESP32 streams video over TCP; viewable in a cyberpunk-styled web UI
  • Touch-Triggered AI Detection — Capacitive touch sensor triggers AI scene analysis via Ollama (gemma3:27b)
  • Face Recognition — InsightFace (buffalo_l) identifies known people from reference images
  • OLED Display — 128x32 SSD1306 shows short AI labels (e.g. "Alice & Bob")
  • Text-to-Speech — Web Speech API (browser) + Termux TTS (Android) for accessibility
  • Video Recording — Browser-based (MediaRecorder/WebM) or server-side (OpenCV/AVI)
  • 100% Local — No cloud APIs, no data leaves your network
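To illustrate the streaming feature, a client can recover individual JPEG frames from an MJPEG byte stream by scanning for the JPEG start-of-image (0xFFD8) and end-of-image (0xFFD9) markers. The sketch below is not taken from the project; `extract_frames` is a hypothetical helper, and it assumes frames carry no embedded thumbnails (which could contain their own markers).

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_frames(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split complete JPEG frames out of a buffer.

    Returns (frames, leftover), where leftover holds the bytes of a frame
    that has started but not yet finished and should be prepended to the
    next chunk read from the stream.
    """
    frames = []
    while True:
        start = buffer.find(SOI)
        if start == -1:
            # No frame start seen; keep the last byte in case the
            # two-byte marker is split across chunks.
            return frames, buffer[-1:]
        end = buffer.find(EOI, start + 2)
        if end == -1:
            # Frame is incomplete; keep it for the next call.
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
```

In use, each chunk read from the stream would be appended to the leftover from the previous call before calling `extract_frames` again.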

Architecture

ESP32-CAM ──MJPEG──▶ Browser ──POST /api/detect──▶ Flask API ──▶ Ollama
  (firmware)        (VX3.html)                    (app.py)      (gemma3:27b)
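The last hop in the diagram (Flask API to Ollama) might look roughly like the following. This is a sketch under assumptions rather than the contents of app.py: it uses Ollama's `/api/generate` endpoint on its default port 11434, and the function names and the prompt text are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(jpeg_bytes: bytes, prompt: str, model: str = "gemma3:27b") -> dict:
    """Build a non-streaming generate request with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        "stream": False,
    }


def describe_frame(jpeg_bytes: bytes,
                   prompt: str = "Describe this scene in one short sentence.") -> str:
    """Send a frame to Ollama and return the generated text."""
    body = json.dumps(build_payload(jpeg_bytes, prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"].strip()
```

Keeping `build_payload` separate from the network call makes the request shape easy to inspect without a running Ollama instance.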

Hardware

| Component | Cost |
|---|---|
| ESP32 AI Thinker CAM | $8.50 |
| SSD1306 128x32 OLED | $3.50 |
| TTP223B Touch Sensor | $1.20 |
| TP4056 + 300mAh LiPo | $3.50 |
| Total | ~$16.70 |

Getting Started

ESP32 Firmware

cd VisionXHardwareSketch
# Edit WiFi credentials in src/esp_cam.ino
pio run --target upload
pio device monitor --baud 115200

Python Backend

cd PythonImageTesting
source .venv/bin/activate
python app.py              # Flask API on :80
# or for Android/Termux:
python termux.py           # Bridge on :5000

Ollama must be running locally with gemma3:27b-cloud:

ollama pull gemma3:27b-cloud
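Before starting the backend, it can help to confirm that Ollama is reachable and that the model has been pulled. The sketch below assumes Ollama's default port and its `/api/tags` model-listing endpoint; `model_is_pulled` and `check_ollama` are hypothetical helpers, not part of the repository.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
TAGS_URL = "http://localhost:11434/api/tags"


def model_is_pulled(tags_response: dict, model: str) -> bool:
    """Return True if `model` appears in a parsed /api/tags response."""
    return any(entry.get("name") == model
               for entry in tags_response.get("models", []))


def check_ollama(model: str) -> bool:
    """Return True if Ollama answers and lists `model`; False if unreachable."""
    try:
        with urllib.request.urlopen(TAGS_URL, timeout=5) as response:
            return model_is_pulled(json.loads(response.read()), model)
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError.
        return False
```

For example, `check_ollama("gemma3:27b-cloud")` could be called once at startup to fail early with a clear message.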

Wiring

| GPIO | Function | Connection |
|---|---|---|
| GPIO 14 | I2C SDA | OLED SDA |
| GPIO 15 | I2C SCL | OLED SCL |
| GPIO 13 | Touch | TTP223B OUT |

Web UI

Open http://<esp32-ip>/ in a browser. The VX3 interface features:

  • Live camera feed with animated HUD overlays
  • Touch sensor visualization
  • AI detection panel with scene descriptions
  • Recording controls with live timer
  • TTS with auto-read toggle

Project Structure

VisionX/
├── PythonImageTesting/      # Flask backend + web frontend
│   ├── app.py               # Main API server
│   ├── image_analyzer.py    # Face recognition script
│   ├── termux.py            # Android bridge server
│   ├── templates/           # Web UIs (VX3.html, visionx2.html, UI2.html)
│   └── peoples/             # Reference face images
├── VisionXHardwareSketch/   # ESP32 firmware (PlatformIO)
│   └── src/esp_cam.ino      # Main firmware
└── Presentations/           # Docs, slides, demo video
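The reference images in peoples/ are used to identify known faces. One common way to do this, shown here as a sketch and not necessarily how image_analyzer.py works, is to compare a detected face's embedding against one stored embedding per person using cosine similarity. The plain lists below stand in for real InsightFace embeddings, and the 0.4 threshold and function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding, known, threshold=0.4):
    """Return the best-matching name from `known`, or None if below threshold."""
    best_name, best_score = None, threshold
    for name, reference in known.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def oled_label(names):
    """Join recognized names into a short label such as 'Alice & Bob'."""
    return " & ".join(names)
```

The threshold trades false matches against missed matches and would need tuning on real reference images.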

API Endpoints

| Endpoint | Port | Description |
|---|---|---|
| GET / | 80 | Serve VX3 web UI |
| POST /api/detect | 80 | AI analysis via Ollama |
| GET /touch | 80 | Read touch sensor |
| GET /display?msg= | 80 | Send text to OLED |
| GET /toggle | 8000 | Start/stop video recording |
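When calling these endpoints from a script, the `msg` query parameter needs URL-encoding, since labels such as "Alice & Bob" contain spaces and an ampersand. The helper below is a hypothetical sketch; the host address in the usage note is a placeholder, and encoding spaces as `%20` (not `+`) is a conservative assumption about how the firmware decodes query strings.

```python
from urllib.parse import quote, urlencode


def endpoint_url(host: str, path: str, port: int = 80, **params: str) -> str:
    """Build a URL for one of the endpoints, percent-encoding any query parameters."""
    base = f"http://{host}:{port}{path}"
    if not params:
        return base
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{base}?{urlencode(params, quote_via=quote)}"
```

For example, `endpoint_url("192.168.1.50", "/display", msg="Alice & Bob")` yields a URL that could be fetched with `urllib.request.urlopen` to update the OLED.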

Built With

  • ESP32 / Arduino / PlatformIO — Firmware
  • Python / Flask — Backend API
  • Ollama / Gemma 3 — Vision AI
  • InsightFace / ONNX — Face recognition
  • OpenCV — Computer vision
  • Vanilla HTML/JS — Web UI

License

MIT

About

VisionX is a privacy-focused, AI-powered smart camera system built for under $20. It combines an ESP32-CAM with a local Ollama vision model to deliver real-time scene understanding and face recognition, all without cloud dependencies. It features live MJPEG streaming, touch-triggered AI analysis, an OLED display, and TTS feedback through a cyberpunk web UI.
