bhavya-x/Imprint

PhotoMemory — Local AI Photo Search for Android

A privacy-first, fully on-device semantic photo search and memory retrieval app for Android. Every model, embedding, and query runs on the phone — there is no INTERNET permission, no cloud service, no telemetry, and no account.

Designed and built around five hard problems: indexing orchestration, hybrid retrieval ranking, face clustering consistency, Android lifecycle management, and on-device inference coordination.


Features

  • Natural-language search powered by SigLIP (CLIP-style) image/text embeddings.
  • Face clustering via MediaPipe Face Detection + MobileFaceNet embeddings + online DBSCAN.
  • OCR-aware retrieval via Google ML Kit on-device Text Recognition (find that OTP screenshot).
  • Hybrid ranking combining CLIP similarity, OCR full-text matches, face confidence, and recency.
  • Thermal- and battery-aware background indexing via WorkManager.
  • Multi-module Gradle project — each module compiles independently.

Project structure

PhotoMemory/
├── app/                          UI shell: Application, MainActivity, navigation, theme
├── core/
│   ├── core-model/               Domain data classes (Photo, Face, Cluster, SearchResult, …)
│   ├── core-util/                Dispatchers, BitmapUtils, AssetUtils, VectorMath, Logger
│   └── core-db/                  Room entities & DAOs, AppDatabase, sqlite-vec bootstrap, VecDatabase
├── ml/
│   ├── ml-clip/                  SigLIP image + text encoder via ONNX Runtime Mobile
│   ├── ml-face/                  FaceDetector (MediaPipe), FaceEmbedder (MobileFaceNet), FaceClusteringEngine
│   └── ml-ocr/                   ML Kit OCR pipeline
└── feature/
    ├── feature-indexing/         IndexingPipeline, WorkManager workers, ThermalGuard, IndexingScreen
    ├── feature-search/           QueryParser, SearchEngine, SearchBarWithChips, SearchViewModel
    ├── feature-gallery/          HomeScreen, PhotoDetailScreen, PhotoThumbnail
    └── feature-faces/            FacesScreen, ClusterDetailScreen

Build

0. Prerequisites

  • Android Studio Hedgehog or later (Iguana / Koala recommended).
  • JDK 17.
  • A device or emulator running Android 10+ (API 29). Indexing performance is dramatically better on physical devices with NNAPI / GPU acceleration.

1. Generate the Gradle wrapper

The repo doesn't ship gradlew / gradlew.bat — bootstrap them once:

gradle wrapper --gradle-version=8.7

2. Stage the ML models

Place the following files in app/src/main/assets/models/:

| File | Format | Source |
|---|---|---|
| siglip_image_encoder_int8.onnx | ONNX, int8 | export from google/siglip-base-patch16-224 |
| siglip_text_encoder_int8.onnx | ONNX, int8 | same SigLIP checkpoint, text tower |
| siglip_tokenizer.model | SentencePiece vocab (binary or text) | exported from the same checkpoint |
| mobilefacenet_int8.onnx | ONNX, int8 | from the MobileFaceNet repo |
| face_detection_short_range.tflite | TFLite | bundled in MediaPipe Tasks (BlazeFace short range) |

Export instructions are below.

3. Stage the sqlite-vec AAR

Download the latest stable AAR from the sqlite-vec releases page and drop the file into app/libs/. The app's build.gradle.kts uses fileTree(mapOf("dir" to "libs", "include" to listOf("*.aar"))), so any AAR placed there is picked up automatically.

If you don't yet have the AAR, the app will compile but crash on first DB open with a clear error from SqliteVecBootstrap. Vector search will not work until the AAR is present.

4. Build and install

# Windows
.\gradlew installDebug
# macOS / Linux
./gradlew installDebug

Model export quick-start

Two scripts you can run locally (Python 3.10+, on a desktop — never in this app):

SigLIP → ONNX → int8

pip install transformers onnx onnxruntime "optimum[onnxruntime]" sentencepiece

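# 1. Export FP32 ONNX from the Hugging Face checkpoint.
#    SigLIP is a dual image/text encoder, so use a zero-shot classification task
#    rather than image-to-text (which is meant for captioning models).
optimum-cli export onnx \
    --model google/siglip-base-patch16-224 \
    --task zero-shot-image-classification \
    siglip-onnx/
#    Output file names vary between optimum versions; list siglip-onnx/ and adjust
#    the paths in step 2 if they differ.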

# 2. Quantize to int8 with onnxruntime
python -c "
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic('siglip-onnx/visual_model.onnx',
                 'siglip_image_encoder_int8.onnx',
                 weight_type=QuantType.QInt8)
quantize_dynamic('siglip-onnx/text_model.onnx',
                 'siglip_text_encoder_int8.onnx',
                 weight_type=QuantType.QInt8)
"

# 3. Copy the SentencePiece vocab.
#    The SigLIP checkpoint normally ships it as spiece.model; adjust if your export differs.
cp siglip-onnx/spiece.model siglip_tokenizer.model

Make sure the resulting models accept these inputs / produce these outputs:

| Model | Input | Output |
|---|---|---|
| siglip_image_encoder_int8.onnx | pixel_values: float32[1,3,224,224] | [1, 512] image embedding |
| siglip_text_encoder_int8.onnx | input_ids: int64[1,64], attention_mask: int64[1,64] | [1, 512] text embedding |

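If your export uses different dimensions, update VecDimensions.CLIP in core/core-db/.../AppDatabase.kt. Check the real output shape before indexing: the base SigLIP checkpoint has a 768-wide hidden size, so an unmodified export may emit [1, 768] rather than [1, 512].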
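One way to check an export against the table above is to read the input shapes with onnxruntime and compare them on the desktop. The sketch below is only an illustration: the helper names (shape_mismatches, read_input_shapes) and the EXPECTED_INPUTS layout are hypothetical, and symbolic (dynamic) axes are treated as matching any size.

```python
from typing import Dict, List

# Expected input shapes copied from the table above.
# The dict layout itself is an assumption of this sketch.
EXPECTED_INPUTS: Dict[str, Dict[str, List[int]]] = {
    "siglip_image_encoder_int8.onnx": {"pixel_values": [1, 3, 224, 224]},
    "siglip_text_encoder_int8.onnx": {
        "input_ids": [1, 64],
        "attention_mask": [1, 64],
    },
}


def shape_mismatches(actual: Dict[str, list], expected: Dict[str, List[int]]) -> List[str]:
    """Return human-readable problems; an empty list means the shapes agree.

    Non-integer dimensions (symbolic axes such as 'batch_size' or None)
    are treated as wildcards.
    """
    problems = []
    for name, want in expected.items():
        got = actual.get(name)
        if got is None:
            problems.append(f"missing input '{name}'")
            continue
        same_rank = len(got) == len(want)
        dims_ok = same_rank and all(
            not isinstance(g, int) or g == w for g, w in zip(got, want)
        )
        if not dims_ok:
            problems.append(f"input '{name}': expected {want}, got {got}")
    return problems


def read_input_shapes(model_path: str) -> Dict[str, list]:
    """Load a model with onnxruntime and return {input name: shape}."""
    import onnxruntime as ort  # imported lazily so the pure check works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return {node.name: list(node.shape) for node in session.get_inputs()}
```

A typical run would call read_input_shapes("siglip_image_encoder_int8.onnx") and pass the result to shape_mismatches together with the matching EXPECTED_INPUTS entry. The same pattern works for outputs via session.get_outputs().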

MobileFaceNet → ONNX → int8

# Start from any MobileFaceNet PyTorch checkpoint or the original TensorFlow model.
# Export to ONNX with input shape [1,3,112,112] and a 128-dim output, then:
python -c "
from onnxruntime.quantization import quantize_dynamic, QuantType
quantize_dynamic('mobilefacenet_fp32.onnx',
                 'mobilefacenet_int8.onnx',
                 weight_type=QuantType.QInt8)
"

MediaPipe Face Detection model

Use MediaPipe's pre-built Face Detector model bundle (short-range, ~232 KB). Rename it to face_detection_short_range.tflite.


Architecture

Database

photos (Room)              ← canonical metadata, OCR text, indexing flags
faces  (Room)              ← bounding boxes, cluster assignment
clusters (Room)            ← user labels, face counts, representative face
indexing_jobs (Room)       ← phased pipeline state machine
ocr_tokens (Room)          ← exact OCR token lookup
photo_fts (Room FTS4)      ← prefix / phrase OCR search
clip_embeddings (sqlite-vec vec0)  ← 512-dim image embeddings, KNN-searchable
face_embeddings (sqlite-vec vec0)  ← 128-dim face embeddings, KNN-searchable

All embeddings are L2-normalised before storage, so cosine similarity reduces to a dot product. For ranking, the L2 distance d returned by sqlite-vec is converted to a similarity score 1 / (1 + d), which decreases monotonically as d grows.
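The relationship between normalisation, the dot product, and the distance-to-score mapping can be sketched in a few lines of plain Python. This is an illustration rather than the app's code; the function names are hypothetical. For unit vectors the squared L2 distance equals 2 − 2·cos, which is why ordering by distance and ordering by cosine similarity agree.

```python
import math
from typing import List, Sequence


def l2_normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_score(distance: float) -> float:
    """Map a non-negative distance to a score in (0, 1]; smaller distance, higher score."""
    return 1.0 / (1.0 + distance)


a = l2_normalise([3.0, 4.0])
b = l2_normalise([4.0, 3.0])
cosine = dot(a, b)
d = l2_distance(a, b)
# For unit vectors: d^2 == 2 - 2 * cosine (up to floating-point error).
identity_gap = abs(d * d - (2.0 - 2.0 * cosine))
```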

Indexing pipeline

MediaStore → IndexingJob(PENDING)
   → METADATA   (DATE_TAKEN, EXIF GPS, SHA-256 of first 64 KB)
   → THUMBNAIL  (256×256 JPEG at filesDir/thumbnails/{photoId}.jpg)
   → FACE_DETECT (MediaPipe BlazeFace, score > 0.7)
   → FACE_EMBED  (MobileFaceNet, online DBSCAN clustering, threshold 0.35)
   → OCR         (ML Kit Latin; written to photos.ocrText, ocr_tokens, photo_fts)
   → CLIP_EMBED  (SigLIP image encoder, vec0 upsert)
   → COMPLETE

Each phase is idempotent. A job interrupted at FACE_EMBED resumes from FACE_EMBED.
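The resume behaviour can be modelled as a small state machine that records the phase before running it and only advances after the handler returns. The sketch below is a simplified desktop illustration, not the app's IndexingPipeline: the Phase enum mirrors the diagram, while run_job, the dict-based job, and the handler signature are assumptions made for the example.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class Phase(IntEnum):
    PENDING = 0
    METADATA = 1
    THUMBNAIL = 2
    FACE_DETECT = 3
    FACE_EMBED = 4
    OCR = 5
    CLIP_EMBED = 6
    COMPLETE = 7


# Phases that do actual work, in pipeline order.
WORK_PHASES = [
    Phase.METADATA,
    Phase.THUMBNAIL,
    Phase.FACE_DETECT,
    Phase.FACE_EMBED,
    Phase.OCR,
    Phase.CLIP_EMBED,
]


def run_job(job: dict, handlers: Dict[Phase, Callable[[dict], None]]) -> List[Phase]:
    """Run a job from its recorded phase to COMPLETE.

    job["phase"] is written before each handler runs, so if a handler raises,
    the job stays at that phase and the next call re-runs it. This is only
    safe because every handler is assumed to be idempotent.
    """
    start = Phase.METADATA if job["phase"] == Phase.PENDING else job["phase"]
    executed = []
    for phase in WORK_PHASES:
        if phase < start:
            continue  # already finished in an earlier run
        job["phase"] = phase  # a real pipeline would persist this to the database
        handlers[phase](job)  # may raise; the job then remains at this phase
        executed.append(phase)
    job["phase"] = Phase.COMPLETE
    return executed
```

Calling run_job again on a job whose phase is FACE_EMBED executes FACE_EMBED, OCR, and CLIP_EMBED only, which matches the resume rule described above.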

Hybrid retrieval

   ParsedQuery
       │
       ▼
1. Candidate filtering (persons / dates / screenshots) — Room SQL set intersection
2. CLIP KNN in sqlite-vec (restricted to candidate IDs)
3. OCR FTS hits (photo_fts MATCH)
4. Face confidence aggregation (max detectionScore per photo per cluster)
5. Recency decay (1 / (1 + age_days/30))
       │
       ▼
finalScore = 0.45·clip + 0.25·ocr + 0.20·face + 0.10·recency      (configurable via SearchWeights)

SearchEngine holds the active SearchWeights in a @Volatile field, so a debug or settings surface can change the weights at runtime without recompiling.
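The scoring formula is easy to check by hand. The sketch below reproduces the recency decay and the weighted sum in Python, assuming every component score is already normalised to the range 0 to 1. The field and function names are hypothetical and only the default weights come from the formula above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchWeights:
    # Defaults taken from the formula above; the field names are assumptions.
    clip: float = 0.45
    ocr: float = 0.25
    face: float = 0.20
    recency: float = 0.10


def recency_score(age_days: float) -> float:
    """Recency decay 1 / (1 + age_days / 30); negative ages are clamped to 0."""
    return 1.0 / (1.0 + max(age_days, 0.0) / 30.0)


def final_score(
    clip: float,
    ocr: float,
    face: float,
    age_days: float,
    weights: SearchWeights = SearchWeights(),
) -> float:
    """Weighted sum of the four signals."""
    return (
        weights.clip * clip
        + weights.ocr * ocr
        + weights.face * face
        + weights.recency * recency_score(age_days)
    )


# A 30-day-old photo with a strong CLIP match and no OCR or face signal:
# 0.45 * 0.8 + 0.10 * 0.5 = 0.41
example = final_score(clip=0.8, ocr=0.0, face=0.0, age_days=30.0)
```

Because the default weights sum to 1.0, a photo that scores 1.0 on every signal and was taken today gets a final score of 1.0, which keeps results comparable when the weights are re-balanced.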


Privacy guarantees

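  1. The AndroidManifest.xml does not request the INTERNET permission. Without it, Android denies socket creation for the app's UID, so the app's own code cannot open a network connection even if it is compromised. Library manifests are merged at build time and can contribute permissions of their own, so confirm that the merged manifest is also free of INTERNET.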
  2. All ML models ship inside the APK (app/src/main/assets/models/). No runtime downloads.
  3. data_extraction_rules.xml excludes the photo database, thumbnails, and prefs from cloud backup and device-transfer.
  4. The "Clear Index" button in the Indexing tab wipes all derived data (embeddings, thumbnails, OCR tokens). Source photos in MediaStore are never modified.
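The first guarantee can be checked mechanically on the desktop by parsing a manifest and listing the permissions it requests. The sketch below uses only the Python standard library; the function names are hypothetical. Running it against the merged manifest produced by the build (its location under build/ varies by Android Gradle Plugin version) is more informative than checking the source manifest alone.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace that Android uses for android:* attributes.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
INTERNET = "android.permission.INTERNET"


def requested_permissions(manifest_xml: str) -> List[str]:
    """Return every permission named in <uses-permission*> elements."""
    root = ET.fromstring(manifest_xml)
    permissions = []
    for tag in ("uses-permission", "uses-permission-sdk-23"):
        for element in root.iter(tag):
            name = element.get(f"{{{ANDROID_NS}}}name")
            if name:
                permissions.append(name)
    return permissions


def requests_internet(manifest_xml: str) -> bool:
    """True if the manifest asks for android.permission.INTERNET."""
    return INTERNET in requested_permissions(manifest_xml)
```

A build script could read the manifest file, call requests_internet, and fail the build when it returns True.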

Tuning

Most knobs live in code, where they are easy to find and checked at compile time:

| Concern | Where |
|---|---|
| Indexing priorities | feature/feature-indexing/.../IndexingPriority.kt |
| Batch size & thermal thresholds | feature/feature-indexing/.../ThermalGuard.kt, workers/InitialIndexingWorker.kt |
| Face clustering thresholds | ml/ml-face/.../FaceClusteringEngine.kt (ASSIGN_THRESHOLD, MERGE_THRESHOLD) |
| Retrieval weights | core/core-model/.../SearchResult.kt (SearchWeights.DEFAULT) |
| FTS scoring normalisation | feature/feature-search/.../SearchEngine.kt (normaliseFtsScore) |

Troubleshooting

"sqlite-vec native library not found"
The AAR is missing from app/libs/. Re-download it from the sqlite-vec releases page and re-sync Gradle.

vec_version() smoke test fails
The AAR loaded but the extension wasn't registered with the SQLite connection. This usually means the AAR version is incompatible with your Android SQLite version. Try a newer AAR. The SqliteVecBootstrap shim probes three known init-class names; add yours if the new AAR uses a different package.

Embedding shape mismatch on first search
The SigLIP model you exported has a different output dimensionality than the default 512. Update VecDimensions.CLIP in core/core-db/.../AppDatabase.kt and clear the database (Indexing tab → "Clear Index").

Indexing seems stuck
Check the Indexing tab: the thermal status line shows whether indexing is paused because the phone is hot. Let the device cool down, plug it into power, and the work should resume.


License

This project ships under the same liberal terms you'd expect from a hackathon prototype. Models referenced have their own licences (SigLIP is Apache 2.0; MobileFaceNet checkpoints vary; MediaPipe is Apache 2.0; ML Kit is bound by Google's Mobile Services TOS).

About

Android app for on-device semantic photo search and memory retrieval, with no cloud processing or data collection. Combines AI-powered image search, face clustering, OCR, and hybrid ranking to help users instantly find photos using natural language.
