Use a stronger local embedding model (nomic-embed-text via Ollama) #1

@BeastOfShadow

Problem

The vector collection is created with get_or_create_collection and no embedding
function, so ChromaDB falls back to its default model (all-MiniLM-L6-v2, 384-dim,
English-centric). Retrieval quality is mediocre, especially on non-English notes.

Proposal

Configure ChromaDB to embed via a stronger, fully-local model: nomic-embed-text
(768-dim, multilingual) served through Ollama. Keep it the default so the system
stays 100% local.

Tasks

  • Add an Ollama-backed embedding function (e.g. ChromaDB OllamaEmbeddingFunction or a small custom wrapper)
  • Pass it to get_or_create_collection(..., embedding_function=...) in src/database/vector_db.py
  • Add pull instructions to README / start.sh (ollama pull nomic-embed-text)
  • Make the embedding model configurable via .env (EMBED_MODEL=nomic-embed-text)
  • Note: changing the model changes vector dimensionality (384 → 768), so the existing .chroma_db must be deleted and re-indexed
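
A rough sketch of how the tasks above could fit together, using the "small custom wrapper" option. It assumes Ollama is listening on its default port and exposes the /api/embeddings endpoint. OLLAMA_URL, OllamaEmbedder, open_collection and the collection name "notes" are hypothetical names chosen for illustration, not code from this repo. Whether ChromaDB accepts a plain callable like this depends on the installed version (newer releases may expect extra methods on custom embedding functions), so the built-in OllamaEmbeddingFunction may turn out to be the simpler route.

```python
import json
import os
import urllib.request
from typing import Callable, Dict, List, Optional

# EMBED_MODEL is the .env key proposed in the task list.
# OLLAMA_URL is an assumed variable name; 11434 is Ollama's default port.
EMBED_MODEL = os.getenv("EMBED_MODEL", "nomic-embed-text")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")


def _post_json(url: str, payload: Dict) -> Dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


class OllamaEmbedder:
    """Callable that maps a list of texts to a list of embedding vectors.

    The parameter of __call__ is named `input` because ChromaDB checks
    the signature of custom embedding functions.
    """

    def __init__(
        self,
        model: Optional[str] = None,
        base_url: Optional[str] = None,
        post: Callable[[str, Dict], Dict] = _post_json,
    ) -> None:
        self.model = model or EMBED_MODEL
        self.base_url = (base_url or OLLAMA_URL).rstrip("/")
        self._post = post  # injectable so the class can be tested offline

    def __call__(self, input: List[str]) -> List[List[float]]:
        vectors = []
        for text in input:
            # One request per text: /api/embeddings takes a single "prompt".
            data = self._post(
                f"{self.base_url}/api/embeddings",
                {"model": self.model, "prompt": text},
            )
            vectors.append(data["embedding"])
        return vectors


def open_collection(path: str = ".chroma_db", name: str = "notes"):
    """Open the collection with the Ollama-backed embedder attached.

    Hypothetical helper: the real wiring would live in
    src/database/vector_db.py. chromadb is imported lazily so the
    embedder above stays usable without it.
    """
    import chromadb

    client = chromadb.PersistentClient(path=path)
    return client.get_or_create_collection(
        name=name,
        embedding_function=OllamaEmbedder(),
    )
```

A collection that was already populated with the 384-dim default model will not accept 768-dim vectors, so the old .chroma_db directory needs to be removed (or a new collection name used) before calling open_collection and re-adding the notes.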

Acceptance

Notes embed and retrieve through nomic-embed-text locally; chat retrieval visibly improves on longer/non-English notes.
