VisionX — AI-Powered Smart Camera

A privacy-first, low-cost ($16.70) smart camera system that combines an ESP32-CAM with a local Ollama vision model for real-time scene understanding, face recognition, and accessibility — all without cloud dependencies.

Features

Live MJPEG Streaming — ESP32 streams video over TCP; viewable in a cyberpunk-styled web UI
Touch-Triggered AI Detection — Capacitive touch sensor triggers AI scene analysis via Ollama (gemma3:27b)
Face Recognition — InsightFace (buffalo_l) identifies known people from reference images
OLED Display — 128x32 SSD1306 shows short AI labels (e.g. "Alice & Bob")
Text-to-Speech — Web Speech API (browser) + Termux TTS (Android) for accessibility
Video Recording — Browser-based (MediaRecorder/WebM) or server-side (OpenCV/AVI)
100% Local — No cloud APIs, no data leaves your network
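
The README does not say how a detected face is matched against the reference images, so the following is only a minimal sketch of one common approach: comparing face embeddings by cosine similarity and keeping the best match above a threshold. The function names, the 0.4 threshold, the "Unknown" fallback, and the toy 3-dimensional vectors are all assumptions for illustration; real embeddings from InsightFace are much longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding, known, threshold=0.4):
    """Return the best-matching name from `known`, or None if nothing beats the threshold.

    `known` maps a person's name to a reference embedding. The threshold is a
    hypothetical value and would need tuning on real data.
    """
    best_name, best_score = None, threshold
    for name, reference in known.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def oled_label(names):
    """Join recognized names into a short label such as "Alice & Bob"."""
    return " & ".join(names) if names else "Unknown"  # fallback text is an assumption


# Toy reference embeddings standing in for vectors computed from peoples/.
known_faces = {"Alice": [1.0, 0.0, 0.0], "Bob": [0.0, 1.0, 0.0]}
```

With these toy vectors, `identify([0.9, 0.1, 0.0], known_faces)` picks "Alice", and a vector unrelated to both references returns `None`.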

Architecture

ESP32-CAM ──MJPEG──▶ Browser ──POST /api/detect──▶ Flask API ──▶ Ollama
  (firmware)        (VX3.html)                    (app.py)      (gemma3:27b)
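
The last hop of the diagram, Flask API to Ollama, can be sketched roughly as below. This is not the code in app.py; it is an illustration that assumes Ollama is reachable on its default port 11434 and that a single JPEG frame is sent to the `/api/generate` endpoint as a base64 string. The prompt text and the function names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumptions: Ollama's default local address, and the model tag shown in the
# Architecture diagram. Adjust both to match your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gemma3:27b"


def build_detect_payload(jpeg_bytes, prompt="Describe this scene in one short sentence."):
    """Build a non-streaming generate request carrying one base64-encoded image."""
    return {
        "model": MODEL,
        "prompt": prompt,  # illustrative prompt, not taken from the project
        "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        "stream": False,
    }


def describe_frame(jpeg_bytes, timeout=60.0):
    """Send one frame to Ollama and return the generated text."""
    body = json.dumps(build_detect_payload(jpeg_bytes)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"].strip()
```

Calling `describe_frame(open("frame.jpg", "rb").read())` would return the model's description, provided Ollama is running and the model has been pulled. Setting `"stream": False` keeps the sketch simple by asking for one JSON object instead of a stream of partial responses.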

Hardware

Component	Cost
ESP32 AI Thinker CAM	$8.50
SSD1306 128x32 OLED	$3.50
TTP223B Touch Sensor	$1.20
TP4056 + 300mAh LiPo	$3.50
Total	~$16.70

Getting Started

ESP32 Firmware

cd VisionXHardwareSketch
# Edit WiFi credentials in src/esp_cam.ino
pio run --target upload
pio device monitor --baud 115200

Python Backend

cd PythonImageTesting
source .venv/bin/activate
python app.py              # Flask API on :80
# or for Android/Termux:
python termux.py           # Bridge on :5000

Ollama must be running locally with gemma3:27b-cloud:

ollama pull gemma3:27b-cloud
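
Before starting the backend, it can help to confirm that the model is actually available. The sketch below assumes Ollama's default address and uses its `/api/tags` endpoint, which lists locally available models; the helper names are hypothetical and not part of this repository.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"  # assumes Ollama's default port


def model_names(tags_response):
    """Extract model names from a parsed /api/tags response."""
    return [model["name"] for model in tags_response.get("models", [])]


def is_model_available(wanted, tags_response):
    """True if `wanted` (for example "gemma3:27b-cloud") appears in the listing."""
    return wanted in model_names(tags_response)


def fetch_tags(timeout=5.0):
    """Query the local Ollama server for its model list."""
    with urllib.request.urlopen(TAGS_URL, timeout=timeout) as response:
        return json.loads(response.read())
```

For example, `is_model_available("gemma3:27b-cloud", fetch_tags())` returns `False` if the pull has not been done yet, and `fetch_tags()` raises a `URLError` if Ollama is not running.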

Wiring

GPIO	Function	Connection
GPIO 14	I2C SDA	OLED SDA
GPIO 15	I2C SCL	OLED SCL
GPIO 13	Touch	TTP223B OUT

Web UI

Open http://<esp32-ip>/ in a browser. The VX3 interface features:

Live camera feed with animated HUD overlays
Touch sensor visualization
AI detection panel with scene descriptions
Recording controls with live timer
TTS with auto-read toggle

Project Structure

VisionX/
├── PythonImageTesting/      # Flask backend + web frontend
│   ├── app.py               # Main API server
│   ├── image_analyzer.py    # Face recognition script
│   ├── termux.py            # Android bridge server
│   ├── templates/           # Web UIs (VX3.html, visionx2.html, UI2.html)
│   └── peoples/             # Reference face images
├── VisionXHardwareSketch/   # ESP32 firmware (PlatformIO)
│   └── src/esp_cam.ino      # Main firmware
└── Presentations/           # Docs, slides, demo video

API Endpoints

Endpoint	Port	Description
`GET /`	80	Serve VX3 web UI
`POST /api/detect`	80	AI analysis via Ollama
`GET /touch`	80	Read touch sensor
`GET /display?msg=`	80	Send text to OLED
`GET /toggle`	8000	Start/stop video recording
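
A small client for the GET endpoints in the table might look like the sketch below. The host address is a placeholder, the helper names are hypothetical, and the response bodies are returned as raw text because the README does not document their format. The table also does not state which device serves each endpoint, so the host is left as a parameter.

```python
import urllib.parse
import urllib.request

HOST = "192.168.1.50"  # hypothetical address; replace with your device's IP


def endpoint_url(host, path, port=80, **params):
    """Build a URL for one of the endpoints, percent-encoding any query values."""
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    netloc = host if port == 80 else f"{host}:{port}"
    url = f"http://{netloc}{path}"
    return f"{url}?{query}" if query else url


def http_get(url, timeout=5.0):
    """Perform a GET request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# URLs for three of the endpoints listed above.
touch_url = endpoint_url(HOST, "/touch")
display_url = endpoint_url(HOST, "/display", msg="Alice & Bob")
toggle_url = endpoint_url(HOST, "/toggle", port=8000)
```

Percent-encoding matters for `/display`: a label such as "Alice & Bob" contains `&`, which would otherwise be read as a query-string separator. Passing any of these URLs to `http_get` performs the actual request.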

Built With

ESP32 / Arduino / PlatformIO — Firmware
Python / Flask — Backend API
Ollama / Gemma 3 — Vision AI
InsightFace / ONNX — Face recognition
OpenCV — Computer vision
Vanilla HTML/JS — Web UI

License

MIT
