A privacy-first, fully on-device semantic photo search and memory retrieval app for Android.
Every model, embedding, and query runs on the phone — there is no INTERNET permission, no cloud
service, no telemetry, and no account.
Designed and built around five hard problems: indexing orchestration, hybrid retrieval ranking, face clustering consistency, Android lifecycle management, and on-device inference coordination.
- Natural-language search powered by SigLIP (CLIP-style) image/text embeddings.
- Face clustering via MediaPipe Face Detection + MobileFaceNet embeddings + online DBSCAN.
- OCR-aware retrieval via Google ML Kit on-device Text Recognition (find that OTP screenshot).
- Hybrid ranking combining CLIP similarity, OCR full-text matches, face confidence, and recency.
- Thermal- and battery-aware background indexing via WorkManager.
- Multi-module Gradle project — each module compiles independently.
PhotoMemory/
├── app/                     UI shell: Application, MainActivity, navigation, theme
├── core/
│   ├── core-model/          Domain data classes (Photo, Face, Cluster, SearchResult, …)
│   ├── core-util/           Dispatchers, BitmapUtils, AssetUtils, VectorMath, Logger
│   └── core-db/             Room entities & DAOs, AppDatabase, sqlite-vec bootstrap, VecDatabase
├── ml/
│   ├── ml-clip/             SigLIP image + text encoder via ONNX Runtime Mobile
│   ├── ml-face/             FaceDetector (MediaPipe), FaceEmbedder (MobileFaceNet), FaceClusteringEngine
│   └── ml-ocr/              ML Kit OCR pipeline
└── feature/
    ├── feature-indexing/    IndexingPipeline, WorkManager workers, ThermalGuard, IndexingScreen
    ├── feature-search/      QueryParser, SearchEngine, SearchBarWithChips, SearchViewModel
    ├── feature-gallery/     HomeScreen, PhotoDetailScreen, PhotoThumbnail
    └── feature-faces/       FacesScreen, ClusterDetailScreen
- Android Studio Hedgehog or later (Iguana / Koala recommended).
- JDK 17.
- A device or emulator running Android 10+ (API 29). Indexing performance is dramatically better on physical devices with NNAPI / GPU acceleration.
The repo doesn't ship gradlew / gradlew.bat — bootstrap them once:
gradle wrapper --gradle-version=8.7
Place the following files in app/src/main/assets/models/:
| File | Format | Source |
|---|---|---|
| siglip_image_encoder_int8.onnx | ONNX, int8 | export from google/siglip-base-patch16-224 |
| siglip_text_encoder_int8.onnx | ONNX, int8 | same SigLIP checkpoint, text tower |
| siglip_tokenizer.model | SentencePiece vocab (binary or text) | exported from the same checkpoint |
| mobilefacenet_int8.onnx | ONNX, int8 | from the MobileFaceNet repo |
| face_detection_short_range.tflite | TFLite | bundled in MediaPipe Tasks (BlazeFace short range) |
Export instructions are below.
Download the latest stable AAR from the
sqlite-vec releases page and drop the file into
app/libs/. The app's build.gradle.kts uses fileTree(mapOf("dir" to "libs", "include" to listOf("*.aar")))
so any AAR placed there is picked up automatically.
If you don't yet have the AAR, the app will compile but crash on first DB open with a clear
error from SqliteVecBootstrap. Vector search will not work until the AAR is present.
.\gradlew installDebug
Two scripts you can run locally (Python 3.10+, on a desktop, never in this app):
pip install transformers onnx onnxruntime optimum[onnxruntime] sentencepiece
# 1. Export FP32 ONNX from the HuggingFace checkpoint
optimum-cli export onnx \
--model google/siglip-base-patch16-224 \
--task image-to-text \
siglip-onnx/
# 2. Quantize to int8 with onnxruntime
python -c "
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic('siglip-onnx/visual_model.onnx',
'siglip_image_encoder_int8.onnx',
weight_type=QuantType.QInt8)
quantize_dynamic('siglip-onnx/text_model.onnx',
'siglip_text_encoder_int8.onnx',
weight_type=QuantType.QInt8)
"
# 3. Copy the SentencePiece vocab
cp siglip-onnx/sentencepiece.bpe.model siglip_tokenizer.model
Make sure the resulting models accept these inputs and produce these outputs:
| Model | Input | Output |
|---|---|---|
| siglip_image_encoder_int8.onnx | pixel_values: float32[1,3,224,224] | [1, 512] image embedding |
| siglip_text_encoder_int8.onnx | input_ids: int64[1,64], attention_mask: int64[1,64] | [1, 512] text embedding |
If your export uses different dimensions, update VecDimensions.CLIP in core/core-db/.../AppDatabase.kt.
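One way to check the input shapes on the desktop is to read them back from the exported file with onnxruntime. The sketch below is illustrative only: `EXPECTED_INPUTS`, `dims_match`, and `check_model` are hypothetical helper names, and the expected shapes are simply copied from the table above. Dynamic axes (which ONNX reports as strings or `None`) are treated as wildcards.

```python
from typing import Sequence

# Expected input shapes, copied from the table above (an assumption about your export).
EXPECTED_INPUTS = {
    "siglip_image_encoder_int8.onnx": {"pixel_values": [1, 3, 224, 224]},
    "siglip_text_encoder_int8.onnx": {"input_ids": [1, 64], "attention_mask": [1, 64]},
}


def dims_match(actual: Sequence, expected: Sequence[int]) -> bool:
    """Compare two shapes; non-integer dims (dynamic axes) match anything."""
    if len(actual) != len(expected):
        return False
    return all(not isinstance(a, int) or a == e for a, e in zip(actual, expected))


def check_model(path: str, expected_inputs: dict) -> list[str]:
    """Return a list of human-readable problems in the model's input signature."""
    import onnxruntime as ort  # imported lazily so dims_match works without it

    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    actual = {node.name: node.shape for node in session.get_inputs()}
    problems = []
    for name, shape in expected_inputs.items():
        if name not in actual:
            problems.append(f"missing input {name!r}")
        elif not dims_match(actual[name], shape):
            problems.append(f"{name}: got {actual[name]}, expected {shape}")
    return problems
```

Calling `check_model(path, EXPECTED_INPUTS[filename])` on each exported file should return an empty list when the inputs line up. The output width can be read the same way from `session.get_outputs()`.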
# Start from any MobileFaceNet pytorch checkpoint or the TF original.
# Export to ONNX with input shape [1,3,112,112] and 128-dim output, then:
python -c "
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic('mobilefacenet_fp32.onnx',
'mobilefacenet_int8.onnx',
weight_type=QuantType.QInt8)
"
Use MediaPipe's pre-built Face Detector model bundle
(short-range, ~232 KB). Rename it to face_detection_short_range.tflite.
photos (Room) ← canonical metadata, OCR text, indexing flags
faces (Room) ← bounding boxes, cluster assignment
clusters (Room) ← user labels, face counts, representative face
indexing_jobs (Room) ← phased pipeline state machine
ocr_tokens (Room) ← exact OCR token lookup
photo_fts (Room FTS4) ← prefix / phrase OCR search
clip_embeddings (sqlite-vec vec0) ← 512-dim image embeddings, KNN-searchable
face_embeddings (sqlite-vec vec0) ← 128-dim face embeddings, KNN-searchable
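All embeddings are L2-normalised before storage, so cosine similarity is simply the dot product
of two stored vectors. For unit vectors the squared L2 distance equals 2 − 2·cos θ, so ranking by
sqlite-vec L2 distance gives the same order as ranking by cosine similarity. The distance is
converted to a score with 1 / (1 + distance), which decreases monotonically as distance grows.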
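The relationship between the dot product, L2 distance, and the 1 / (1 + distance) score can be checked numerically. This is a small pure-Python sketch with made-up 3-dimensional vectors standing in for real embeddings; the helper names are illustrative and not taken from the app.

```python
import math


def l2_normalise(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_score(distance):
    """Map an L2 distance in [0, inf) to a score in (0, 1]; smaller distance, higher score."""
    return 1.0 / (1.0 + distance)


# Toy vectors (assumed values, for illustration only).
query = l2_normalise([1.0, 2.0, 2.0])
near = l2_normalise([1.0, 2.0, 1.5])
far = l2_normalise([-2.0, 0.5, 1.0])

for candidate in (near, far):
    cos = dot(query, candidate)
    d = l2_distance(query, candidate)
    # For unit vectors, squared L2 distance and cosine are tied together.
    assert math.isclose(d * d, 2.0 - 2.0 * cos, abs_tol=1e-9)
```

Because the identity holds for every pair of unit vectors, the candidate with the higher cosine similarity always has the smaller L2 distance and therefore the higher score.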
MediaStore → IndexingJob(PENDING)
→ METADATA (DATE_TAKEN, EXIF GPS, SHA-256 of first 64 KB)
→ THUMBNAIL (256×256 JPEG at filesDir/thumbnails/{photoId}.jpg)
→ FACE_DETECT (MediaPipe BlazeFace, score > 0.7)
→ FACE_EMBED (MobileFaceNet, online DBSCAN clustering, threshold 0.35)
→ OCR (ML Kit Latin; written to photos.ocrText, ocr_tokens, photo_fts)
→ CLIP_EMBED (SigLIP image encoder, vec0 upsert)
→ COMPLETE
Each phase is idempotent. A job interrupted at FACE_EMBED resumes from FACE_EMBED.
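The resume behaviour can be sketched as a small state machine. This is a Python illustration of the idea, not the app's Kotlin code: `Phase`, `run_job`, the `handlers` mapping, and the `save` callback are all hypothetical names, and the sketch assumes the stored phase is the next phase to run and is advanced only after that phase succeeds.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Phases in the order shown in the diagram above."""
    PENDING = 0
    METADATA = 1
    THUMBNAIL = 2
    FACE_DETECT = 3
    FACE_EMBED = 4
    OCR = 5
    CLIP_EMBED = 6
    COMPLETE = 7


def run_job(job, handlers, save):
    """Run phases until COMPLETE, persisting progress after each one.

    job["phase"] is the next phase to execute. If a handler raises, the
    phase is not advanced, so a later call re-runs that same phase. This
    is only safe because every handler is assumed to be idempotent.
    """
    while job["phase"] != Phase.COMPLETE:
        phase = job["phase"]
        handler = handlers.get(phase)  # PENDING has no work of its own
        if handler is not None:
            handler(job)
        job["phase"] = Phase(phase + 1)
        save(job)  # persist only after the phase finished
```

If the process dies inside the FACE_EMBED handler, the saved phase is still FACE_EMBED, so the next `run_job` call starts there and the earlier phases are not repeated.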
ParsedQuery
│
▼
1. Candidate filtering (persons / dates / screenshots) — Room SQL set intersection
2. CLIP KNN in sqlite-vec (restricted to candidate IDs)
3. OCR FTS hits (photo_fts MATCH)
4. Face confidence aggregation (max detectionScore per photo per cluster)
5. Recency decay (1 / (1 + age_days/30))
│
▼
finalScore = 0.45·clip + 0.25·ocr + 0.20·face + 0.10·recency (configurable via SearchWeights)
SearchWeights is a @Volatile field on SearchEngine — change it from any debug/settings
surface without recompiling.
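The scoring formula and the recency decay can be written out as follows. This is a Python sketch of the arithmetic only: the `SearchWeights` dataclass here is a stand-in for the Kotlin class of the same name, and it assumes each component score has already been normalised to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchWeights:
    """Default weights from the formula above."""
    clip: float = 0.45
    ocr: float = 0.25
    face: float = 0.20
    recency: float = 0.10


def recency_score(age_days: float) -> float:
    """1 / (1 + age_days / 30): 1.0 for a new photo, 0.5 at 30 days."""
    return 1.0 / (1.0 + age_days / 30.0)


def final_score(clip: float, ocr: float, face: float, age_days: float,
                weights: SearchWeights = SearchWeights()) -> float:
    """Weighted sum of the four signals; inputs are assumed to be in [0, 1]."""
    return (weights.clip * clip
            + weights.ocr * ocr
            + weights.face * face
            + weights.recency * recency_score(age_days))
```

For example, a 30-day-old photo with a CLIP score of 0.8 and no OCR or face match scores 0.45 · 0.8 + 0.10 · 0.5, which is about 0.41.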
- The AndroidManifest.xml does not request the INTERNET permission. The OS will block any outbound network call from this UID. There is nothing the app can do to bypass this: even if compromised, it cannot phone home.
- All ML models ship inside the APK (app/src/main/assets/models/). No runtime downloads.
- data_extraction_rules.xml excludes the photo database, thumbnails, and prefs from cloud backup and device-transfer.
- The "Clear Index" button in the Indexing tab wipes all derived data (embeddings, thumbnails, OCR tokens). Source photos in MediaStore are never modified.
Most knobs live in code, so they are easy to find and checked at compile time:
| Concern | Where |
|---|---|
| Indexing priorities | feature/feature-indexing/.../IndexingPriority.kt |
| Batch size & thermal thresholds | feature/feature-indexing/.../ThermalGuard.kt, workers/InitialIndexingWorker.kt |
| Face clustering thresholds | ml/ml-face/.../FaceClusteringEngine.kt (ASSIGN_THRESHOLD, MERGE_THRESHOLD) |
| Retrieval weights | core/core-model/.../SearchResult.kt (SearchWeights.DEFAULT) |
| FTS scoring normalisation | feature/feature-search/.../SearchEngine.kt (normaliseFtsScore) |
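To make the role of a face clustering threshold concrete, here is a deliberately simplified Python sketch of incremental assignment. It is not the app's online DBSCAN: it uses plain nearest-centroid assignment, treats the 0.35 threshold as a cosine-distance cutoff (the document does not say whether it is a distance or a similarity), and leaves out cluster merging entirely. `assign_face` and the cluster dictionaries are hypothetical.

```python
import math


def _normalise(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def assign_face(embedding, clusters, threshold=0.35):
    """Attach a face to its nearest cluster or open a new one; returns the cluster index.

    clusters is a list of {"centroid": unit vector, "count": int}, mutated in place.
    threshold is assumed to be a maximum cosine distance (1 - cosine similarity).
    """
    e = _normalise(embedding)
    best_idx, best_dist = None, float("inf")
    for idx, cluster in enumerate(clusters):
        dist = 1.0 - sum(a * b for a, b in zip(e, cluster["centroid"]))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    if best_idx is None or best_dist > threshold:
        clusters.append({"centroid": e, "count": 1})
        return len(clusters) - 1
    cluster = clusters[best_idx]
    n = cluster["count"]
    # Approximate running mean of member embeddings, re-normalised to unit length.
    mean = [(m * n + x) / (n + 1) for m, x in zip(cluster["centroid"], e)]
    cluster["centroid"] = _normalise(mean)
    cluster["count"] = n + 1
    return best_idx
```

Lowering the threshold in this sketch splits people into more clusters, while raising it risks merging different people, which is the trade-off such thresholds usually control.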
"sqlite-vec native library not found"
The AAR is missing from app/libs/. Re-download from the
sqlite-vec releases page and re-sync Gradle.
vec_version() smoke test fails
The AAR loaded but the extension wasn't registered with the SQLite connection. This usually means
the AAR version is incompatible with your Android SQLite version. Try a newer AAR; the
SqliteVecBootstrap shim probes three known init-class names — add yours if the new AAR uses a
different package.
Embedding shape mismatch on first search
The SigLIP model you exported has a different output dimensionality than the default 512.
Update VecDimensions.CLIP in core/core-db/.../AppDatabase.kt and clear the database (Indexing
tab → "Clear Index").
Indexing seems stuck
Check the Indexing tab: the thermal status line will tell you if the phone is paused due to heat. Plug into power and the work should resume.
This project ships under the same liberal terms you'd expect from a hackathon prototype. Models referenced have their own licences (SigLIP is Apache 2.0; MobileFaceNet checkpoints vary; MediaPipe is Apache 2.0; ML Kit is bound by Google's Mobile Services TOS).