Skip to content

feat(retrieval): Phase 1 — typed schema and FT.CREATE translation#236

Open
jamby77 wants to merge 6 commits into
feature/retrieval-sdk-valkey-search-kitfrom
feature/retrieval-sdk-phase1-schema-builder
Open

feat(retrieval): Phase 1 — typed schema and FT.CREATE translation#236
jamby77 wants to merge 6 commits into
feature/retrieval-sdk-valkey-search-kitfrom
feature/retrieval-sdk-phase1-schema-builder

Conversation

@jamby77

@jamby77 jamby77 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Summary

Phase 1 of the Retrieval SDK plan, stacked on #234 (Phase 0). Adds the first code of @betterdb/retrieval plus one deferred Phase 0 review item in the kit.

  • Scaffold packages/retrieval (@betterdb/retrieval 0.1.0), mirroring valkey-search-kit, with a workspace dep on the kit
  • Pure buildFtCreateArgs(name, schema, capabilities?) translating the typed index schema (text / tag+separator / numeric+sortable fields; HNSW|FLAT vector as a discriminated union) into the full FT.CREATE argument vector — HNSW defaults M=16 / EF_CONSTRUCTION=200 / EF_RUNTIME=10 always emitted; exported indexName() / keyPrefix() naming helpers
  • TEXT field emission gated via FtCapabilities.textFields — valkey-search < 1.2 rejects TEXT, so callers on older modules get an actionable error instead of a server failure
  • Tighten isIndexNotFoundError in valkey-search-kit: the broad 'not found' substring match is now scoped to index errors ('not found' + 'index' co-occurrence). Verified against live engines — valkey-search 1.2 emits Index with name '…' not found in database 0, Redis 8 emits No such index …; both stay matched, generic key not found-style messages no longer misclassify. The semantic-cache characterization lock was deliberately split into positive + negative cases for this.

Test Plan

  • @betterdb/retrieval unit tests: 32/32 (table-driven, full-vector deep equality)
  • @betterdb/valkey-search-kit unit tests: 32/32 (incl. empirically captured engine phrasings)
  • @betterdb/semantic-cache suite: 191/191 (characterization net intact)
  • semantic-cache integration suite vs live valkey-bundle (valkey-search 1.2, port 6384): 13/13
  • tsc builds clean across the three packages

Stacked PR: base is feature/retrieval-sdk-valkey-search-kit (#234). After #234 merges, this will be rebased onto master and retargeted — do not delete the base branch before that.


Note

Medium Risk
The stricter isIndexNotFoundError heuristic changes runtime behavior for semantic-cache index bootstrap across engine error phrasings; mis-tuned matching could skip auto-create or surface errors instead of recovering.

Overview
Introduces @betterdb/retrieval (Phase 1) as a new workspace package with typed index schema (text / tag / numeric fields plus HNSW or FLAT vector specs) and a pure buildFtCreateArgs translator that emits the full FT.CREATE argument vector, including default HNSW tuning and indexName / keyPrefix helpers. FtCapabilities.textFields blocks TEXT emission on valkey-search < 1.2 with a clear error instead of a server failure.

In valkey-search-kit, isIndexNotFoundError no longer treats every "not found" substring as a missing index—it now requires index-related wording alongside "not found", so messages like key not found are not misclassified. semantic-cache characterization tests are updated to lock in that positive vs negative behavior.

Reviewed by Cursor Bugbot for commit 93458ff. Bugbot is set up for automated code reviews on this repo. Configure here.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 0b682d2. Configure here.

Comment thread packages/retrieval/src/ft-create.ts
@jamby77 jamby77 force-pushed the feature/retrieval-sdk-valkey-search-kit branch from 3c8b814 to 6ce7dde Compare June 12, 2026 12:13
@jamby77 jamby77 force-pushed the feature/retrieval-sdk-phase1-schema-builder branch from 0b682d2 to c14395b Compare June 12, 2026 12:15
jamby77 added 6 commits June 12, 2026 15:23
…nslation

- Add RetrievalSchema, FieldSpec, VectorSpec, FtCapabilities types in schema.ts
- Implement pure buildFtCreateArgs in ft-create.ts: HNSW (6 pairs/12 params with
  defaults M=16 EF_CONSTRUCTION=200 EF_RUNTIME=10), FLAT (3 pairs/6 params),
  all three field types (text/tag/separator/numeric/sortable), metric mapping,
  textFields capability gate, dims/fieldName/algorithm-param validation
- 24 table-driven tests, TDD red→green
- Export all public types + builder from index.ts
- Remove --passWithNoTests from test script
- Discriminated-union VectorSpec: HnswVectorSpec / FlatVectorSpec split
  so FLAT cannot carry HNSW params at the type level
- Replace validateDims void + as-cast with requireDims narrowing guard
- Add indexName/keyPrefix helpers with empty-name validation; export both
- resolveVectorFieldName helper eliminates duplicated ?? 'embedding'
- validateFlatHnswParams uses 'in' guards, accepts VectorSpec (no any)
- METRIC_MAP typed as Record<VectorMetric, string>
- Harmonize error messages to include offending value
- Replace no-interpolation template literals with single-quoted strings
- Prettier pass over all src files
- 32 tests (up from 24): FLAT dims missing/invalid, empty/whitespace
  index name, indexName/keyPrefix unit cases; FLAT+HNSW param throw
  tests construct invalid objects via property mutation to avoid casts
  in production code
@jamby77 jamby77 force-pushed the feature/retrieval-sdk-phase1-schema-builder branch from c14395b to 93458ff Compare June 12, 2026 12:24
@KIvanow

KIvanow commented Jun 12, 2026

Copy link
Copy Markdown
Member

Minor DX note: fresh-checkout tests fail until the kit is built

Running pnpm --filter @betterdb/semantic-cache test on a clean checkout currently fails with:

Error: Failed to resolve entry for package "@betterdb/valkey-search-kit".
The package may have incorrect main/module/exports specified in its package.json.
  ❯ src/utils.ts:8:1

Vitest resolves the workspace symlink through the kit's main: ./dist/index.js, so the kit's dist/ has to exist before semantic-cache (and soon retrieval, once it actually imports the kit) tests can run. A turbo test^build dependency, or pointing the kit's dev-time resolution at src/ (e.g. via publishConfig), would make pnpm test work out of the box.

Totally fine to leave as-is if this is already planned for one of the next PRs in the stack — just flagging it so it doesn't get lost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants