
✨ feat(identity): same-name disambiguation & user-self identity resolution #75

Merged
marcelsamyn merged 17 commits into main from claude/jolly-wiles-63ddeb
Jun 17, 2026


@marcelsamyn

Owner

What & why

The knowledge graph attributed the account owner's own first-person statements (from an ingested WhatsApp transcript) to a different same-named person's node: for example, the user's "I…" was routed to a contact also named "Marcel". This PR adds general same-name disambiguation with the account owner as a first-class case. It is deliberately not a blunt "assume the user when unsure" heuristic, which over-merges.

Four components, in leverage order:

  1. Self-node hygiene: new user-self-identity.ts names the self node with the user's most-specific (multi-token) alias and seeds only multi-token aliases into the alias table. Bare first names never enter it, so a same-named contact can never merge into the user (or vice versa). ensureUserSelfIdentity is idempotent and advisory-lock-guarded; it is called from setUserSelfAliases (config) and ingestTranscript (effective aliases, which covers WhatsApp, where stored aliases are empty). A backfill route repairs existing self nodes.
  2. Subject-wiring: extract-graph.ts registers resolved speaker nodes into idMap and instructs the model to use a speaker's nodeId as the subject of first-person claims. Load-bearing fix: claim insertion skips subjects not in idMap.
  3. Resolver never guesses: resolveIdentity resolves only on a unique canonical/alias match; more than one match records ambiguous, logs identity.ambiguous_skip, and falls through to split. No self-prior anywhere.
  4. Prompt insurance: a cache-safe "who the user is" note (dynamic user message only, never the cached system prompt) plus a static anti-conflation rule for document/conversation extraction.
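
The alias-seeding rule from item 1 can be pictured with a small sketch. It is an illustration only, not the code in user-self-identity.ts: the helper names (tokens, isMultiToken, seedableAliases, pickSelfLabel) are hypothetical, and "most-specific" is assumed here to mean "most tokens, then longest".

```typescript
// Hypothetical helpers; names and tie-breaking rules are assumptions,
// not the actual contents of user-self-identity.ts.

/** Splits an alias into whitespace-separated tokens. */
function tokens(alias: string): string[] {
  return alias.trim().split(/\s+/).filter((t) => t.length > 0);
}

/** True when the alias has at least two tokens, e.g. "Marcel Samyn". */
function isMultiToken(alias: string): boolean {
  return tokens(alias).length >= 2;
}

/** Keeps only multi-token aliases, deduplicated case-insensitively. */
function seedableAliases(aliases: string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const raw of aliases) {
    const alias = tokens(raw).join(" "); // collapse repeated whitespace
    const key = alias.toLowerCase();
    if (isMultiToken(alias) && !seen.has(key)) {
      seen.add(key);
      out.push(alias);
    }
  }
  return out;
}

/** Picks a label for the self node, or undefined if no alias is distinguishing. */
function pickSelfLabel(aliases: string[]): string | undefined {
  const candidates = seedableAliases(aliases);
  if (candidates.length === 0) return undefined;
  return candidates.reduce((best, a) => {
    const byTokens = tokens(a).length - tokens(best).length;
    if (byTokens > 0) return a;
    if (byTokens === 0 && a.length > best.length) return a;
    return best;
  });
}
```

Under these assumptions a bare "Marcel" is never seeded, so it cannot pull a same-named contact into the self node.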

Result: the bug dies twice over — subject-wiring routes the user's utterance to the self node, and even a bare name reaching the resolver splits rather than guessing.
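
The "never guesses" behaviour can be sketched as below. This is a minimal illustration under stated assumptions, not the real resolveIdentity: the PersonNode and Resolution shapes, the normalize helper, and the function name resolveByUniqueMatch are all hypothetical. Only the rule itself (resolve on exactly one canonical/alias match, otherwise do not resolve) comes from the description above.

```typescript
// Hypothetical types and names; only the unique-match rule is taken
// from the PR description.
interface PersonNode {
  id: string;
  canonicalName: string;
  aliases: string[];
}

type Resolution =
  | { kind: "resolved"; nodeId: string }
  | { kind: "ambiguous"; candidateIds: string[] }
  | { kind: "unmatched" };

/** Case- and whitespace-insensitive comparison key (an assumption). */
function normalize(name: string): string {
  return name.trim().replace(/\s+/g, " ").toLowerCase();
}

function resolveByUniqueMatch(name: string, nodes: PersonNode[]): Resolution {
  const wanted = normalize(name);
  const matches = nodes.filter(
    (n) =>
      normalize(n.canonicalName) === wanted ||
      n.aliases.some((a) => normalize(a) === wanted),
  );
  const [only] = matches;
  if (matches.length === 1 && only !== undefined) {
    return { kind: "resolved", nodeId: only.id };
  }
  if (matches.length > 1) {
    // No self-prior: a tie is reported, never broken in favour of the user.
    return { kind: "ambiguous", candidateIds: matches.map((m) => m.id) };
  }
  return { kind: "unmatched" };
}
```

In this sketch a caller would treat both "ambiguous" and "unmatched" as a split (a separate node), and would log the ambiguous case.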

How to test

docker compose up -d db   # Postgres on :5431
pnpm run build:check
pnpm run lint
pnpm run format
pnpm run test -- run \
  src/lib/user-self-identity.test.ts \
  src/lib/transcript/resolve-speakers.test.ts \
  src/lib/jobs/ingest-transcript.test.ts \
  src/lib/identity-resolution.test.ts \
  src/lib/user-profile.test.ts

Key tests: identity-resolution.test.ts (ambiguous tie → no resolve + identity.ambiguous_skip; Marcel Samyn → self, bare Marcel → the contact), ingest-transcript.test.ts (user-self first-person claim attaches to the self node, not a same-named participant).
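
The subject-wiring behaviour that ingest-transcript.test.ts checks can be pictured with the following sketch. It is not the code in extract-graph.ts: the Speaker, ExtractedClaim, and StoredClaim shapes and the functions registerSpeakers and insertClaims are hypothetical, and idMap is assumed to map the ids the model may reference to stored node ids.

```typescript
// Hypothetical shapes; the real idMap contents and claim types may differ.
interface Speaker {
  name: string;
  nodeId: string;
}

interface ExtractedClaim {
  subjectId: string; // id the model used as the claim's subject
  statement: string;
}

interface StoredClaim {
  subjectNodeId: string;
  statement: string;
}

/** Makes resolved speaker nodes valid claim subjects. */
function registerSpeakers(idMap: Map<string, string>, speakers: Speaker[]): void {
  for (const speaker of speakers) {
    idMap.set(speaker.nodeId, speaker.nodeId);
  }
}

/** Stores claims, skipping any whose subject is not in idMap. */
function insertClaims(
  claims: ExtractedClaim[],
  idMap: Map<string, string>,
): StoredClaim[] {
  const stored: StoredClaim[] = [];
  for (const claim of claims) {
    const subjectNodeId = idMap.get(claim.subjectId);
    if (subjectNodeId === undefined) continue; // unknown subject: skipped
    stored.push({ subjectNodeId, statement: claim.statement });
  }
  return stored;
}
```

The sketch shows why registration matters: without the speaker in idMap, a first-person claim that points at the speaker's nodeId is skipped rather than stored.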

Operational note

Existing user-self nodes predate the new contract. Run a one-time backfill per user to give them a distinguishing label + multi-token aliases (and strip any previously-seeded bare alias):

POST /maintenance/backfill-user-self-identity   { "userId": "<id>" }

Passing an empty/absent effective alias list clears the self node's alias rows (clean-slate) — pass a non-empty list to avoid that.
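
Running the backfill once per user could be scripted along these lines. This is a sketch only: the base URL, the BackfillRequest shape, and the function name buildBackfillRequests are assumptions, and any authentication the maintenance route may require is not shown because the description does not mention it. Only the path and the { "userId": ... } body come from the note above.

```typescript
// Hypothetical request builder; it prepares requests but does not send them.
interface BackfillRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildBackfillRequests(
  baseUrl: string, // assumption: wherever the service is reachable
  userIds: string[],
): BackfillRequest[] {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const unique = Array.from(
    new Set(userIds.map((id) => id.trim()).filter((id) => id.length > 0)),
  );
  return unique.map(
    (userId): BackfillRequest => ({
      url: `${root}/maintenance/backfill-user-self-identity`,
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ userId }),
    }),
  );
}
```

Each entry can then be sent with fetch(request.url, request). Deduplicating ids avoids repeated calls for the same user, although the description of ensureUserSelfIdentity as idempotent suggests a repeat would be harmless.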

Checklist

  • pnpm run build:check (tsc --noEmit + structured-output schema check)
  • pnpm run lint
  • pnpm run format
  • Tests: 530/530 on Postgres :5431 (note: PR CI runs lint/format/build only — vitest is local)
  • No unrelated changes
  • One-time backfill run for existing users (operational follow-up)

🤖 Generated with Claude Code

@gemini-code-assist (bot) left a comment

Contributor


Code Review

This pull request implements same-name identity disambiguation and self-identity resolution to prevent the knowledge graph from misattributing claims to same-named nodes. It introduces a dedicated user-self-identity module to manage the user's Person node, seeds only distinguishing multi-token aliases, wires transcript speakers as claim subjects, and updates the identity resolver to split instead of guessing on ambiguous ties. Feedback on the changes highlights an optimization opportunity in ensureUserSelfIdentity to avoid redundant database updates to nodeMetadata on every transcript ingestion by checking if the label has actually changed first.
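
The suggested optimization amounts to comparing the stored label with the desired one before writing. A minimal sketch, assuming a hypothetical NodeMetadataRow shape and a planLabelWrite helper that are not part of the PR:

```typescript
// Hypothetical shapes; the real nodeMetadata columns may differ.
interface NodeMetadataRow {
  nodeId: string;
  label: string | null;
}

type LabelWrite =
  | { kind: "skip" }
  | { kind: "update"; nodeId: string; label: string };

/** Decides whether a label write is needed at all. */
function planLabelWrite(row: NodeMetadataRow, desiredLabel: string): LabelWrite {
  const next = desiredLabel.trim();
  // Skip when there is nothing to write or the label is already current.
  if (next.length === 0 || row.label === next) {
    return { kind: "skip" };
  }
  return { kind: "update", nodeId: row.nodeId, label: next };
}
```

With a check like this, repeated transcript ingestions for the same user would issue an update only when the label actually changes.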


Comment thread src/lib/user-self-identity.ts
@marcelsamyn merged commit d9d0a79 into main on Jun 17, 2026
1 check passed
@marcelsamyn deleted the claude/jolly-wiles-63ddeb branch on June 17, 2026, 12:59