PhotoMemory — Local AI Photo Search for Android

A privacy-first, fully on-device semantic photo search and memory retrieval app for Android. Every model, embedding, and query runs on the phone — there is no INTERNET permission, no cloud service, no telemetry, and no account.

Designed and built around five hard problems: indexing orchestration, hybrid retrieval ranking, face clustering consistency, Android lifecycle management, and on-device inference coordination.

Features

Natural-language search powered by SigLIP (CLIP-style) image/text embeddings.
Face clustering via MediaPipe Face Detection + MobileFaceNet embeddings + online DBSCAN.
OCR-aware retrieval via Google ML Kit on-device Text Recognition (find that OTP screenshot).
Hybrid ranking combining CLIP similarity, OCR full-text matches, face confidence, and recency.
Thermal- and battery-aware background indexing via WorkManager.
Multi-module Gradle project — each module compiles independently.

Project structure

PhotoMemory/
├── app/                          UI shell: Application, MainActivity, navigation, theme
├── core/
│   ├── core-model/               Domain data classes (Photo, Face, Cluster, SearchResult, …)
│   ├── core-util/                Dispatchers, BitmapUtils, AssetUtils, VectorMath, Logger
│   └── core-db/                  Room entities & DAOs, AppDatabase, sqlite-vec bootstrap, VecDatabase
├── ml/
│   ├── ml-clip/                  SigLIP image + text encoder via ONNX Runtime Mobile
│   ├── ml-face/                  FaceDetector (MediaPipe), FaceEmbedder (MobileFaceNet), FaceClusteringEngine
│   └── ml-ocr/                   ML Kit OCR pipeline
└── feature/
    ├── feature-indexing/         IndexingPipeline, WorkManager workers, ThermalGuard, IndexingScreen
    ├── feature-search/           QueryParser, SearchEngine, SearchBarWithChips, SearchViewModel
    ├── feature-gallery/          HomeScreen, PhotoDetailScreen, PhotoThumbnail
    └── feature-faces/            FacesScreen, ClusterDetailScreen

Build

0. Prerequisites

Android Studio Hedgehog or later (Iguana / Koala recommended).
JDK 17.
A device or emulator running Android 10+ (API 29). Indexing performance is dramatically better on physical devices with NNAPI / GPU acceleration.

1. Generate the Gradle wrapper

The repo doesn't ship gradlew / gradlew.bat — bootstrap them once:

gradle wrapper --gradle-version=8.7

2. Stage the ML models

Place the following files in app/src/main/assets/models/:

File	Format	Source
`siglip_image_encoder_int8.onnx`	ONNX, int8	export from `google/siglip-base-patch16-224`
`siglip_text_encoder_int8.onnx`	ONNX, int8	same SigLIP checkpoint, text tower
`siglip_tokenizer.model`	SentencePiece vocab (binary or text)	exported from the same checkpoint
`mobilefacenet_int8.onnx`	ONNX, int8	from the MobileFaceNet repo
`face_detection_short_range.tflite`	TFLite	bundled in MediaPipe Tasks (BlazeFace short range)

Export instructions are below.

3. Stage the sqlite-vec AAR

Download the latest stable AAR from the sqlite-vec releases page and drop the file into app/libs/. The app's build.gradle.kts uses fileTree(mapOf("dir" to "libs", "include" to listOf("*.aar"))), so any AAR placed there is picked up automatically.

If you don't yet have the AAR, the app will compile but crash on first DB open with a clear error from SqliteVecBootstrap. Vector search will not work until the AAR is present.

4. Build and install

./gradlew installDebug        # macOS / Linux
.\gradlew installDebug        # Windows

Model export quick-start

Two scripts you can run locally (Python 3.10+, on a desktop — never in this app):

SigLIP → ONNX → int8

pip install transformers onnx onnxruntime "optimum[onnxruntime]" sentencepiece

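# 1. Export FP32 ONNX from the HuggingFace checkpoint
#    (SigLIP is a dual encoder, so the task is zero-shot-image-classification, not image-to-text)
optimum-cli export onnx \
    --model google/siglip-base-patch16-224 \
    --task zero-shot-image-classification \
    siglip-onnx/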

# 2. Quantize to int8 with onnxruntime
#    (adjust the input file names below to whatever your export actually produced)
python -c "
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic('siglip-onnx/visual_model.onnx',
                 'siglip_image_encoder_int8.onnx',
                 weight_type=QuantType.QInt8)
quantize_dynamic('siglip-onnx/text_model.onnx',
                 'siglip_text_encoder_int8.onnx',
                 weight_type=QuantType.QInt8)
"

# 3. Copy the SentencePiece vocab (the SigLIP checkpoint ships it as spiece.model)
cp siglip-onnx/spiece.model siglip_tokenizer.model

Make sure the resulting models accept these inputs / produce these outputs:

Model	Input	Output
`siglip_image_encoder_int8.onnx`	`pixel_values: float32[1,3,224,224]`	`[1, 512]` image embedding
`siglip_text_encoder_int8.onnx`	`input_ids: int64[1,64]`, `attention_mask: int64[1,64]`	`[1, 512]` text embedding

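If your export uses different dimensions, update VecDimensions.CLIP in core/core-db/.../AppDatabase.kt. Check the output shape before indexing: the stock siglip-base-patch16-224 checkpoint is usually reported with a 768-dim embedding rather than 512.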
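Before copying the files into assets, it can help to compare the exported shapes against the table above. The following is a minimal sketch, assuming onnxruntime is installed on the desktop machine; the helper names describe_model and shape_matches are illustrative and not part of this project. Dynamic axes, which ONNX reports as names or None, are treated as wildcards.

```python
from typing import Dict, List, Sequence


def shape_matches(actual: Sequence[object], expected: Sequence[int]) -> bool:
    """Return True if `actual` is compatible with `expected`.

    Fixed dimensions must be equal; dynamic axes (strings or None) match anything.
    """
    if len(actual) != len(expected):
        return False
    return all(not isinstance(a, int) or a == e for a, e in zip(actual, expected))


def describe_model(path: str) -> Dict[str, Dict[str, List[object]]]:
    """List the input and output shapes of an ONNX file."""
    # Imported lazily so shape_matches can be used without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    return {
        "inputs": {i.name: list(i.shape) for i in session.get_inputs()},
        "outputs": {o.name: list(o.shape) for o in session.get_outputs()},
    }


# Example usage (requires the exported file to exist):
#   info = describe_model("siglip_image_encoder_int8.onnx")
#   ok = shape_matches(info["inputs"]["pixel_values"], [1, 3, 224, 224])
```

If the last output dimension differs from the value the database expects, change VecDimensions.CLIP rather than the model.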

MobileFaceNet → ONNX → int8

# Start from any MobileFaceNet PyTorch checkpoint or the original TF model.
# Export to ONNX with input shape [1,3,112,112] and 128-dim output, then:
python -c "
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic('mobilefacenet_fp32.onnx',
                 'mobilefacenet_int8.onnx',
                 weight_type=QuantType.QInt8)
"

MediaPipe Face Detection model

Use MediaPipe's pre-built Face Detector model bundle (short-range, ~232 KB). Rename to face_detection_short_range.tflite.

Architecture

Database

photos (Room)              ← canonical metadata, OCR text, indexing flags
faces  (Room)              ← bounding boxes, cluster assignment
clusters (Room)            ← user labels, face counts, representative face
indexing_jobs (Room)       ← phased pipeline state machine
ocr_tokens (Room)          ← exact OCR token lookup
photo_fts (Room FTS4)      ← prefix / phrase OCR search
clip_embeddings (sqlite-vec vec0)  ← 512-dim image embeddings, KNN-searchable
face_embeddings (sqlite-vec vec0)  ← 128-dim face embeddings, KNN-searchable

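All embeddings are L2-normalised before storage, so cosine similarity reduces to a dot product. sqlite-vec returns L2 distance, which is converted to a similarity score as 1 / (1 + distance); the score falls monotonically as distance grows. For unit vectors, distance² = 2 − 2·cosine, so ranking by L2 distance and ranking by cosine similarity give the same order.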
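The relationship between L2 distance and cosine similarity for normalised vectors can be checked numerically. This is a small sketch in plain Python, not the app's VectorMath code; all function names are illustrative. It relies on the identity distance² = 2 − 2·cosine, which holds only when both vectors have unit length.

```python
import math
from typing import List, Sequence


def l2_normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, the metric a vec0 KNN query reports."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_score(distance: float) -> float:
    """Map a distance in [0, inf) to a score in (0, 1]; smaller distance, higher score."""
    return 1.0 / (1.0 + distance)


def cosine_from_l2(distance: float) -> float:
    """Recover cosine similarity from the L2 distance between two unit vectors."""
    return 1.0 - (distance * distance) / 2.0


a = l2_normalise([3.0, 4.0])
b = l2_normalise([4.0, 3.0])
d = l2_distance(a, b)
# Both routes give the same cosine similarity for unit vectors.
assert abs(cosine_from_l2(d) - dot(a, b)) < 1e-9
```

Because both mappings are monotonic in distance, the KNN order from sqlite-vec is already the cosine order; the 1 / (1 + distance) form only rescales it for blending with other signals.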

Indexing pipeline

MediaStore → IndexingJob(PENDING)
   → METADATA   (DATE_TAKEN, EXIF GPS, SHA-256 of first 64 KB)
   → THUMBNAIL  (256×256 JPEG at filesDir/thumbnails/{photoId}.jpg)
   → FACE_DETECT (MediaPipe BlazeFace, score > 0.7)
   → FACE_EMBED  (MobileFaceNet, online DBSCAN clustering, threshold 0.35)
   → OCR         (ML Kit Latin; written to photos.ocrText, ocr_tokens, photo_fts)
   → CLIP_EMBED  (SigLIP image encoder, vec0 upsert)
   → COMPLETE

Each phase is idempotent. A job interrupted at FACE_EMBED resumes from FACE_EMBED.
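The resume behaviour can be pictured as a loop that only advances the stored phase after the phase's handler has finished. This is a simplified sketch, not the app's IndexingPipeline: the job is a plain dict standing in for an indexing_jobs row, and run_job and the handlers mapping are hypothetical names.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class Phase(IntEnum):
    PENDING = 0
    METADATA = 1
    THUMBNAIL = 2
    FACE_DETECT = 3
    FACE_EMBED = 4
    OCR = 5
    CLIP_EMBED = 6
    COMPLETE = 7


def run_job(job: dict, handlers: Dict[Phase, Callable[[dict], None]]) -> List[Phase]:
    """Run every remaining phase of `job` and return the phases executed.

    job["phase"] always names the next phase to run. If a handler raises,
    the stored phase is left unchanged, so a later call retries that phase.
    This is only safe because each handler is assumed to be idempotent.
    """
    executed: List[Phase] = []
    if job["phase"] == Phase.PENDING:
        job["phase"] = Phase.METADATA
    while job["phase"] != Phase.COMPLETE:
        current = job["phase"]
        handlers[current](job)  # may raise; job["phase"] still equals `current`
        executed.append(current)
        job["phase"] = Phase(current + 1)  # in the app this would be a database write
    return executed
```

Persisting the phase only after the handler returns is what makes an interruption at FACE_EMBED restart at FACE_EMBED rather than skip it.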

Hybrid retrieval

   ParsedQuery
       │
       ▼
1. Candidate filtering (persons / dates / screenshots) — Room SQL set intersection
2. CLIP KNN in sqlite-vec (restricted to candidate IDs)
3. OCR FTS hits (photo_fts MATCH)
4. Face confidence aggregation (max detectionScore per photo per cluster)
5. Recency decay (1 / (1 + age_days/30))
       │
       ▼
finalScore = 0.45·clip + 0.25·ocr + 0.20·face + 0.10·recency      (configurable via SearchWeights)

SearchWeights is a @Volatile field on SearchEngine — change it from any debug/settings surface without recompiling.
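The scoring formula and recency decay above can be written out directly. This is a sketch under the assumption that each component score is already normalised to the range 0..1; the Python SearchWeights class here only mirrors the default weights and is not the Kotlin type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchWeights:
    """Default weights from the formula above."""
    clip: float = 0.45
    ocr: float = 0.25
    face: float = 0.20
    recency: float = 0.10


def recency_score(age_days: float) -> float:
    """Recency decay 1 / (1 + age_days / 30); negative ages are clamped to 0."""
    return 1.0 / (1.0 + max(age_days, 0.0) / 30.0)


def final_score(
    clip: float,
    ocr: float,
    face: float,
    age_days: float,
    weights: SearchWeights = SearchWeights(),
) -> float:
    """Weighted sum of the four signals; inputs are assumed to lie in 0..1."""
    return (
        weights.clip * clip
        + weights.ocr * ocr
        + weights.face * face
        + weights.recency * recency_score(age_days)
    )
```

For example, a 30-day-old photo with CLIP similarity 0.8, no OCR hit, and face confidence 0.5 scores 0.45·0.8 + 0.20·0.5 + 0.10·0.5 = 0.51 with the default weights.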

Privacy guarantees

The AndroidManifest.xml does not request the INTERNET permission. The OS will block any outbound network call from this UID. There is nothing the app can do to bypass this — even if compromised, it cannot phone home.
All ML models ship inside the APK (app/src/main/assets/models/). No runtime downloads.
data_extraction_rules.xml excludes the photo database, thumbnails, and prefs from cloud backup and device-transfer.
The "Clear Index" button in the Indexing tab wipes all derived data (embeddings, thumbnails, OCR tokens). Source photos in MediaStore are never modified.

Tuning

Most knobs live in code, so they are easy to find and are checked at compile time:

Concern	Where
Indexing priorities	`feature/feature-indexing/.../IndexingPriority.kt`
Batch size & thermal thresholds	`feature/feature-indexing/.../ThermalGuard.kt`, `workers/InitialIndexingWorker.kt`
Face clustering thresholds	`ml/ml-face/.../FaceClusteringEngine.kt` (`ASSIGN_THRESHOLD`, `MERGE_THRESHOLD`)
Retrieval weights	`core/core-model/.../SearchResult.kt` (`SearchWeights.DEFAULT`)
FTS scoring normalisation	`feature/feature-search/.../SearchEngine.kt` (`normaliseFtsScore`)
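To make the face clustering threshold concrete, here is a deliberately simplified sketch of online assignment. It is not the FaceClusteringEngine implementation and it is not full DBSCAN: it keeps one running centroid per cluster and assigns each new face to the nearest centroid. It assumes the 0.35 threshold is a cosine distance, which the source does not state, and it leaves out merging (MERGE_THRESHOLD) entirely. The Cluster class and assign function are hypothetical names.

```python
import math
from typing import List, Sequence

ASSIGN_THRESHOLD = 0.35  # assumed to be a cosine distance; the app may define it differently


def normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, for unit-length inputs."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


class Cluster:
    """A cluster summarised by the running sum of its member embeddings."""

    def __init__(self, embedding: Sequence[float]) -> None:
        self.total = list(embedding)
        self.count = 1

    def centroid(self) -> List[float]:
        return normalise(self.total)

    def add(self, embedding: Sequence[float]) -> None:
        self.total = [s + x for s, x in zip(self.total, embedding)]
        self.count += 1


def assign(embedding: Sequence[float], clusters: List[Cluster],
           threshold: float = ASSIGN_THRESHOLD) -> int:
    """Add a face to the nearest cluster within `threshold`, else start a new one.

    Returns the index of the cluster the face ended up in.
    """
    e = normalise(embedding)
    best_idx, best_dist = -1, float("inf")
    for idx, cluster in enumerate(clusters):
        dist = cosine_distance(e, cluster.centroid())
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    if best_idx >= 0 and best_dist <= threshold:
        clusters[best_idx].add(e)
        return best_idx
    clusters.append(Cluster(e))
    return len(clusters) - 1
```

Lowering the threshold in this sketch splits the same person across more clusters; raising it risks merging different people, which is the trade-off the real thresholds control.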

Troubleshooting

"sqlite-vec native library not found"
The AAR is missing from app/libs/. Re-download it from the sqlite-vec releases page and re-sync Gradle.

vec_version() smoke test fails
The AAR loaded, but the extension wasn't registered with the SQLite connection. This usually means the AAR version is incompatible with your Android SQLite version. Try a newer AAR. The SqliteVecBootstrap shim probes three known init-class names; add yours if the new AAR uses a different package.

Embedding shape mismatch on first search
The SigLIP model you exported has a different output dimensionality than the default 512. Update VecDimensions.CLIP in core/core-db/.../AppDatabase.kt and clear the database (Indexing tab → "Clear Index").

Indexing seems stuck
Check the Indexing tab: the thermal status line will tell you if indexing is paused due to heat. Plug into power and the work should resume.

License

This project ships under the same liberal terms you'd expect from a hackathon prototype. Models referenced have their own licences (SigLIP is Apache 2.0; MobileFaceNet checkpoints vary; MediaPipe is Apache 2.0; ML Kit is bound by Google's Mobile Services TOS).
