Problem
The vector collection is created with get_or_create_collection and no embedding function, so ChromaDB falls back to its default model (all-MiniLM-L6-v2: 384-dim, English-centric, and inputs truncated at 256 word pieces, so the tail of a long note is never embedded). Retrieval quality is mediocre, especially on longer and non-English notes.
Proposal
Configure ChromaDB to embed through a stronger, fully local model: nomic-embed-text (768-dim, multilingual) served by Ollama. Make it the default so the system stays 100% local.
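The "small custom wrapper" option could look roughly like the sketch below. It assumes Ollama listens on its default address (http://localhost:11434) and exposes POST /api/embed, which takes {"model", "input": [...]} and returns {"embeddings": [[...], ...]}. OllamaEmbedder, the OLLAMA_URL variable and the injectable transport are hypothetical names; EMBED_MODEL is the .env variable from the task list. ChromaDB (0.4.16 and later) requires the __call__ parameter to be named input. Newer releases may also expect a subclass of chromadb.EmbeddingFunction, so check the installed version. ChromaDB's built-in OllamaEmbeddingFunction is the alternative, but its constructor arguments have changed between releases.

```python
import json
import os
import urllib.request
from typing import Callable, List, Optional, Sequence

# Assumption: Ollama's default address; OLLAMA_URL is a hypothetical override.
DEFAULT_OLLAMA_URL = "http://localhost:11434"

# (url, json_payload) -> decoded JSON response; injectable so tests need no server.
Transport = Callable[[str, dict], dict]


def _post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response (stdlib only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


class OllamaEmbedder:
    """Hypothetical callable passed to ChromaDB as embedding_function.

    Chroma calls it with a list of strings and expects one vector per string.
    """

    def __init__(
        self,
        model: Optional[str] = None,
        base_url: Optional[str] = None,
        transport: Transport = _post_json,
    ) -> None:
        # EMBED_MODEL comes from .env; default to nomic-embed-text to stay local.
        self.model = model or os.getenv("EMBED_MODEL", "nomic-embed-text")
        self.base_url = (base_url or os.getenv("OLLAMA_URL", DEFAULT_OLLAMA_URL)).rstrip("/")
        self._transport = transport

    # Chroma inspects this signature: the parameter must be called `input`.
    def __call__(self, input: Sequence[str]) -> List[List[float]]:
        if not input:
            return []
        # Assumption: /api/embed embeds the whole batch in one request.
        data = self._transport(
            f"{self.base_url}/api/embed",
            {"model": self.model, "input": list(input)},
        )
        embeddings = data["embeddings"]
        if len(embeddings) != len(input):
            raise ValueError(f"expected {len(input)} embeddings, got {len(embeddings)}")
        return embeddings
```

Sending the whole batch in one request keeps indexing of many notes to a single round trip per Chroma add call. The transport parameter exists only so the class can be exercised without a running Ollama.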
Tasks
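- Add an Ollama embedding function (ChromaDB's OllamaEmbeddingFunction or a small custom wrapper)
- Pass it via get_or_create_collection(..., embedding_function=...) in src/database/vector_db.py
- Pull the model in start.sh (ollama pull nomic-embed-text)
- Make the model configurable in .env (EMBED_MODEL=nomic-embed-text)
- Existing .chroma_db must be re-indexed (stored vectors are 384-dim; the new model produces 768-dim ones)
Acceptance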
Notes embed and retrieve through nomic-embed-text locally; chat retrieval visibly improves on longer and non-English notes.
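The vector_db.py wiring and the re-index requirement could be combined as in the sketch below. open_collection, needs_reindex and any collection name you pass are hypothetical. It assumes chromadb.PersistentClient and Collection.get(limit=..., include=["embeddings"]) behave as in current chromadb releases. Instead of silently mixing dimensions, it refuses to start when the stored vectors do not match the size the configured model returns. That size comes from one probe call, so it stays correct if EMBED_MODEL changes.

```python
from typing import Any, Callable, List, Optional, Sequence

# Any callable Chroma accepts as embedding_function: list of texts -> list of vectors.
EmbedFn = Callable[[Sequence[str]], List[List[float]]]


def needs_reindex(stored: Optional[Sequence[float]], expected_dim: int) -> bool:
    """True when vectors already in the collection differ in size from what the
    current model produces (e.g. 384-dim MiniLM vs 768-dim nomic-embed-text)."""
    if stored is None:  # empty collection: nothing to migrate
        return False
    return len(stored) != expected_dim


def open_collection(path: str, name: str, embedding_function: EmbedFn) -> Any:
    """Hypothetical helper for src/database/vector_db.py."""
    import chromadb  # lazy import keeps needs_reindex() dependency-free

    client = chromadb.PersistentClient(path=path)
    collection = client.get_or_create_collection(
        name=name, embedding_function=embedding_function
    )

    # Peek at one stored vector; may be a list or a numpy array depending on version.
    peek = collection.get(limit=1, include=["embeddings"])
    vectors = peek.get("embeddings")
    stored = vectors[0] if vectors is not None and len(vectors) > 0 else None

    # One probe call tells us the dimension the configured model actually returns.
    expected_dim = len(embedding_function(["dimension probe"])[0])
    if needs_reindex(stored, expected_dim):
        raise RuntimeError(
            f"{path} holds {len(stored)}-dim vectors but the embedding model returns "
            f"{expected_dim}-dim ones; delete it and re-index the notes."
        )
    return collection
```

Failing loudly seemed safer than deleting the collection automatically, since re-indexing means re-embedding every note. On recent chromadb releases the embedding function may be persisted with the collection, so reopening an old collection with a different one can raise before this check runs. The remedy is the same re-index.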