A privacy-first, low-cost ($16.70) smart camera system that combines an ESP32-CAM with a local Ollama vision model for real-time scene understanding, face recognition, and accessibility — all without cloud dependencies.
- Live MJPEG Streaming: the ESP32 streams video over TCP, viewable in a cyberpunk-styled web UI
- Touch-Triggered AI Detection: a capacitive touch sensor triggers AI scene analysis via Ollama (`gemma3:27b`)
- Face Recognition: InsightFace (`buffalo_l`) identifies known people from reference images
- OLED Display: a 128x32 SSD1306 shows short AI labels (e.g. "Alice & Bob")
- Text-to-Speech: Web Speech API (browser) + Termux TTS (Android) for accessibility
- Video Recording: browser-based (MediaRecorder/WebM) or server-side (OpenCV/AVI)
- 100% Local: no cloud APIs; no data leaves your network
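
The camera stream is MJPEG, which is simply a sequence of complete JPEG images. As a minimal sketch (an assumption, not code taken from this repository), a Python client that needs individual frames, for example to save stills or feed a server-side recorder, can cut them out of the raw bytes at the JPEG start-of-image (`FF D8`) and end-of-image (`FF D9`) markers. `extract_jpeg_frames` is a hypothetical helper; a sturdier reader would parse the multipart boundary and `Content-Length` headers instead of scanning for markers.

```python
from typing import List, Tuple

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpeg_frames(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Cut complete JPEG frames out of a raw MJPEG byte buffer.

    Returns (frames, remainder). The remainder is the start of a frame that has
    not finished arriving; prepend it to the next chunk read from the socket.
    """
    frames: List[bytes] = []
    pos = 0
    while True:
        start = buffer.find(SOI, pos)
        if start == -1:
            # No further frame start. Keep a lone trailing 0xFF in case the SOI
            # marker was split across two socket reads; drop everything else
            # (multipart boundaries and part headers).
            tail = buffer[pos:]
            return frames, tail[-1:] if tail.endswith(b"\xff") else b""
        end = buffer.find(EOI, start + 2)
        if end == -1:
            # The frame has started but its end marker has not arrived yet.
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        pos = end + 2
```

Call it on each chunk read from the stream, prepending the returned remainder to the next chunk, and decode each frame with the JPEG library you already use (for example `cv2.imdecode`).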

Architecture: the ESP32-CAM firmware streams MJPEG to the browser (`VX3.html`), which calls the Flask API (`app.py`) at `POST /api/detect`; the API in turn queries Ollama (`gemma3:27b`).
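
The last hop in this pipeline is an HTTP call from the Flask API to Ollama's local REST API. The sketch below is an assumption about how that call could look, not the actual code in `app.py`: it base64-encodes one JPEG frame and posts it to Ollama's `/api/generate` endpoint on the default port 11434 with streaming disabled. The helper names `build_ollama_payload` and `describe_frame` and the prompt text are hypothetical.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "gemma3:27b"  # model tag from the architecture overview; use whatever you pulled


def build_ollama_payload(jpeg_bytes: bytes, prompt: str, model: str = MODEL) -> dict:
    """Build a non-streaming /api/generate request body with one image attached."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        # Return one JSON object instead of a stream of partial chunks.
        "stream": False,
    }


def describe_frame(jpeg_bytes: bytes,
                   prompt: str = "Describe this scene in one short sentence.") -> str:
    """POST a frame to Ollama and return the generated text (needs a running server)."""
    body = json.dumps(build_ollama_payload(jpeg_bytes, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        # With stream=False the generated text is in the "response" field.
        return json.loads(resp.read())["response"]
```

Only `build_ollama_payload` is pure; `describe_frame` needs Ollama listening on `localhost:11434` with the model already pulled.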
| Component | Cost |
|---|---|
| ESP32 AI Thinker CAM | $8.50 |
| SSD1306 128x32 OLED | $3.50 |
| TTP223B Touch Sensor | $1.20 |
| TP4056 + 300mAh LiPo | $3.50 |
| Total | ~$16.70 |

Flash the ESP32 firmware:

    cd VisionXHardwareSketch
    # Edit WiFi credentials in src/esp_cam.ino
    pio run --target upload
    pio device monitor --baud 115200

Then, from the repository root, start the Python backend:

    cd PythonImageTesting
    source .venv/bin/activate
    python app.py        # Flask API on :80
    # or for Android/Termux:
    python termux.py     # Bridge on :5000

Ollama must be running locally with `gemma3:27b-cloud` available:

    ollama pull gemma3:27b-cloud

Wiring:

| GPIO | Function | Connection |
|---|---|---|
| GPIO 14 | I2C SDA | OLED SDA |
| GPIO 15 | I2C SCL | OLED SCL |
| GPIO 13 | Touch | TTP223B OUT |

Open `http://<esp32-ip>/` in a browser. The VX3 interface features:
- Live camera feed with animated HUD overlays
- Touch sensor visualization
- AI detection panel with scene descriptions
- Recording controls with live timer
- TTS with auto-read toggle
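
A touch-triggered flow has to react to each tap once rather than on every poll. The VX3 UI's own logic is JavaScript and is not reproduced here; the Python sketch below shows the usual approach, assuming each `GET /touch` poll has already been reduced to 0 (idle) or 1 (touched), since the response format is not documented in this README. In its default momentary, active-high mode the TTP223B output stays high for as long as a finger rests on the pad, so firing only on the 0 to 1 transition gives one detection per tap. `rising_edges` is a hypothetical name.

```python
from typing import Iterable, List


def rising_edges(readings: Iterable[int]) -> List[int]:
    """Return the indices where a polled touch reading goes from 0 to 1.

    Triggering only on the 0 -> 1 transition means one AI request per tap,
    even if the sensor stays high across several polling intervals.
    """
    edges: List[int] = []
    previous = 0
    for i, value in enumerate(readings):
        if value and not previous:
            edges.append(i)
        previous = value
    return edges
```

In a live polling loop you would keep `previous` between requests instead of passing a list, and send one detection request each time an edge is seen.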

Project layout:

    VisionX/
    ├── PythonImageTesting/      # Flask backend + web frontend
    │   ├── app.py               # Main API server
    │   ├── image_analyzer.py    # Face recognition script
    │   ├── termux.py            # Android bridge server
    │   ├── templates/           # Web UIs (VX3.html, visionx2.html, UI2.html)
    │   └── peoples/             # Reference face images
    ├── VisionXHardwareSketch/   # ESP32 firmware (PlatformIO)
    │   └── src/esp_cam.ino      # Main firmware
    └── Presentations/           # Docs, slides, demo video
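
`image_analyzer.py` identifies people against the reference images in `peoples/`, but its matching logic is not shown here. A common approach, sketched below as an assumption, is to compare each detected face's embedding with one reference embedding per person by cosine similarity and accept the best match above a threshold. With InsightFace the embeddings would typically come from `FaceAnalysis(name="buffalo_l")` and each detected face's `normed_embedding`; here they are plain lists so the sketch runs without the model. The threshold value, the "Unknown" fallback and the function names are hypothetical. The matched names are then joined into a short OLED label such as "Alice & Bob".

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical threshold; tune it on your own reference images.
MATCH_THRESHOLD = 0.4


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding: Sequence[float], known: Dict[str, Sequence[float]],
             threshold: float = MATCH_THRESHOLD) -> Optional[str]:
    """Return the most similar known person, or None if no one clears the threshold."""
    best_name: Optional[str] = None
    best_score = threshold
    for name, reference in known.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def oled_label(names: List[Optional[str]]) -> str:
    """Join recognised names (deduplicated, in order) into a label like 'Alice & Bob'."""
    recognised = list(dict.fromkeys(n for n in names if n))
    return " & ".join(recognised) if recognised else "Unknown"
```

Building `known` once at startup (one embedding per image in `peoples/`) keeps each detection to a handful of dot products.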

API endpoints:

| Endpoint | Port | Description |
|---|---|---|
| `GET /` | 80 | Serve VX3 web UI |
| `POST /api/detect` | 80 | AI analysis via Ollama |
| `GET /touch` | 80 | Read touch sensor |
| `GET /display?msg=` | 80 | Send text to OLED |
| `GET /toggle` | 8000 | Start/stop video recording |
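
`GET /display?msg=` takes its text as a query parameter, so labels must be percent-encoded: sent raw, "Alice & Bob" would be split at the `&` into `msg=Alice ` and a stray ` Bob` parameter. The sketch below encodes with `urllib.parse.quote` and trims the text first. The 21-character limit is an assumption (one line of the 128 px wide SSD1306 with the 6 px wide default Adafruit GFX font at text size 1), since the firmware's font and wrapping are not documented here; it also assumes `/display` is served by the ESP32, the board driving the OLED. `display_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Hypothetical limit: one line of a 128 px wide display at 6 px per character
# (Adafruit GFX default font, text size 1). Adjust to match the firmware.
OLED_MAX_CHARS = 128 // 6  # 21


def display_url(esp32_host: str, message: str, max_chars: int = OLED_MAX_CHARS) -> str:
    """Build a GET /display URL with the message trimmed and percent-encoded."""
    trimmed = message.strip()[:max_chars]
    # safe="" makes quote() also encode '/', so together with '&', '=' and
    # spaces the whole label arrives as the single msg parameter.
    return f"http://{esp32_host}/display?msg={quote(trimmed, safe='')}"
```

For example, `urllib.request.urlopen(display_url("192.168.1.50", label), timeout=5)` would push a label to the screen; the IP address is a placeholder.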

Tech stack:

- ESP32 / Arduino / PlatformIO: firmware
- Python / Flask: backend API
- Ollama / Gemma 3: vision AI
- InsightFace / ONNX: face recognition
- OpenCV: computer vision
- Vanilla HTML/JS: web UI

License: MIT