fix: validate embeddings for NaN/Inf values with automatic retry by BaiYouQing · Pull Request #1511 · getzep/graphiti

BaiYouQing · 2026-05-25T16:33:54Z

Description

Embedding providers can occasionally return corrupted responses containing NaN or Inf values in the embedding vector. When this happens, cosine similarity comparisons (used for entity deduplication) silently produce wrong results: cosine_similarity(NaN, any_vector) is itself NaN, so it never passes the 0.6 threshold, causing the dedup step to miss existing entities and create duplicate nodes.
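To illustrate the failure mode described above, here is a minimal sketch in plain Python. The `cosine_similarity` helper and the `>=` comparison are assumptions made for the demonstration, not the project's actual code; only the 0.6 threshold comes from the description.

```python
import math

DEDUP_THRESHOLD = 0.6  # value quoted in the PR description


def cosine_similarity(a, b):
    """Plain cosine similarity; a stand-in for the real comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


clean = [0.1, 0.5, 0.5]
corrupt = [float("nan"), 0.5, 0.5]  # one NaN component poisons the vector

clean_score = cosine_similarity(clean, clean)
corrupt_score = cosine_similarity(corrupt, clean)

# Every ordered comparison with NaN is False, so the corrupt vector
# can never be matched against an existing entity.
clean_is_duplicate = clean_score >= DEDUP_THRESHOLD
corrupt_is_duplicate = corrupt_score >= DEDUP_THRESHOLD
```

No exception is raised anywhere in this path, which is why the problem shows up only as duplicate nodes later on.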

This is related to Issue #1505.

Changes

- Added _validate_embedding() method that checks for NaN/Inf using numpy
- On detection, logs a warning and raises ValueError to trigger a single retry
- Both create() and create_batch() use this validation with automatic retry
- Added logging and numpy imports
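The validate-then-retry flow listed above could look roughly like the sketch below. It is not the PR's diff: `validate_embedding`, `embed_with_retry`, and the `fetch` callable are hypothetical names, and the real create() and create_batch() methods may be structured differently (for example, as async methods).

```python
import logging
from typing import Callable, Sequence

import numpy as np

logger = logging.getLogger(__name__)


def validate_embedding(embedding: Sequence[float]) -> list[float]:
    """Return the vector unchanged, or raise ValueError on NaN/Inf."""
    arr = np.asarray(embedding, dtype=np.float64)
    if not np.all(np.isfinite(arr)):
        logger.warning("Embedding contains NaN/Inf values; rejecting it")
        raise ValueError("embedding contains NaN or Inf")
    return list(embedding)


def embed_with_retry(fetch: Callable[[], Sequence[float]]) -> list[float]:
    """Call fetch() (hypothetical provider call); retry once on failure."""
    try:
        return validate_embedding(fetch())
    except ValueError:
        # Second and final attempt: if this response is also corrupt,
        # the ValueError propagates to the caller.
        return validate_embedding(fetch())
```

Limiting the retry to a single attempt keeps worst-case latency bounded; a provider that returns corrupt vectors twice in a row surfaces as an error instead of being written to the graph.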

Testing

- Tested with a local FalkorDB instance and DeepSeek/Tencent embedding APIs
- NaN detection correctly catches corrupt embeddings and retries
- After retry, the second call typically returns valid embeddings

Embedding providers can occasionally return corrupted responses containing NaN or Inf values. When this happens, cosine similarity comparisons silently produce wrong results: cosine_similarity(nan, vec) never exceeds 0.6, causing the dedup step to miss existing entities and create duplicates.

Changes:
- Add _validate_embedding() that checks for NaN/Inf using numpy
- Both create() and create_batch() validate with automatic single retry
- Add numpy as a dependency

Adelagric · 2026-05-28T06:26:10Z

Strong direction. Two extensions worth keeping in mind once this lands:

- Dimension validation. embedder/openai.py:60,66 silently truncates with [: self.config.embedding_dim]. If the provider returns fewer dims than expected, the resulting vector is shorter than EMBEDDING_DIM, mixing inconsistent-length vectors inside the same index. Worth a length check alongside the isfinite guard.
- L2 normalization invariant. helpers.py:116-119 applies normalize_l2 inline in only two places (bulk edge dedup, MMR), but vectors persisted to the graph backend are the raw provider output. With FalkorDB's non-normalized cosine semantics (graph_queries.py:158), this drifts the meaning of the 0.6 dedup threshold.
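As a rough sketch of the two extensions above, assuming they would be enforced before a vector is persisted: `check_dimension` and `unit_normalize` are hypothetical helpers written for illustration, not the project's `normalize_l2` or any existing code in the repository.

```python
import math
from typing import Sequence


def check_dimension(embedding: Sequence[float], expected_dim: int) -> list[float]:
    """Truncate to expected_dim, but refuse vectors that are too short.

    A bare slice like embedding[:expected_dim] returns a short vector
    unchanged, which is how inconsistent lengths can reach the index.
    """
    if len(embedding) < expected_dim:
        raise ValueError(
            f"expected at least {expected_dim} dimensions, got {len(embedding)}"
        )
    return list(embedding[:expected_dim])


def unit_normalize(embedding: Sequence[float]) -> list[float]:
    """Scale the vector to unit L2 norm; zero vectors are returned as-is."""
    norm = math.sqrt(sum(x * x for x in embedding))
    if norm == 0.0:
        return list(embedding)
    return [x / norm for x in embedding]
```

Applying both at write time would mean every stored vector has the same length and unit norm, so a fixed similarity threshold keeps the same meaning regardless of which code path produced the vector.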

Both could fit as separate sub-issues if the maintainers prefer to keep this PR focused on NaN/Inf. Out-of-process alternative for callers who want the full set of invariants enforced today: https://github.com/Adelagric/vector-router (Apache 2.0). Not a competitor to this PR — they compose.

BaiYouQing wants to merge 1 commit into getzep:main from BaiYouQing:fix/nan-embedding-validation
