
feat: integrated dataset-session API (v0.7.0)#8

Merged: guidocerqueira merged 2 commits into main from feat/dataset-registry-session-api on Jun 15, 2026.

@guidocerqueira (Collaborator)

Why

Production-like (homologação, i.e. staging) reads are slow on every request, not just the first. End-to-end log analysis on the deployed pods showed that the cause is a combination of factors the prior perf work (connection pooling / prewarm / GridIndex) never touched:

  1. Cross-region S3: cluster in us-east-2, bucket in us-east-1 → every GET pays a cross-region RTT.
  2. Cold-open re-read per run_time: time + lat/lon are chunk data read from S3 on each new run_time; the coordinates are identical across run_times but were re-read every time (timeAxisMs 1.4–2.3s, layoutsMs up to 1.7s).
  3. Low-traffic pods keep caches cold → the cold path dominates (readMs p90 ≈ 8.9s; single-model durationMs up to 19.5s).

Consumers also had to hand-wire CachedStore + openGroup + MemoryCache + coordinate caches. This PR moves all caching into the library.

What

  • ZarrDatasetRegistry (root export): owns handle reuse (LRU + thundering-herd dedup), metadata cache, on-disk chunk cache, per-dataset decoded-chunk MemoryCache, and a shared decoded-array cache. Store-agnostic via a storeFactory.
  • ManagedDataset:
    • read(name, selection, opts) auto-applies the per-dataset decoded-chunk memory cache + observability. Callers can't pass memoryCache: it stays dataset-scoped so chunk keys can't collide across datasets.
    • decodedArray(name, { cacheKey, ttlMs }) serves small arrays from L1 (handle) → L2 (shared cache). With a run_time-invariant cacheKey (domain key), coordinate arrays are read once per domain and reused across run_times and pods, eliminating the multi-second cold-open coordinate re-read on cross-region stores.
  • open/openGroup/openArray moved to src/open.ts, re-exported unchanged from the root (no API change).
  • New registry unit tests; version bump 0.6.0 → 0.7.0; CHANGELOG.
  • Diagnostic examples used to attribute the latency: examples/diagnose-prod-read.ts, examples/verify-memcache.ts, and an s3-point mode in benchmark-local-flow.ts.
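
The handle reuse described for ZarrDatasetRegistry (LRU plus thundering-herd dedup) can be sketched roughly as below. This is a minimal illustration assuming handles are opened by an async function; HandleCache and its methods are hypothetical names, not the registry's actual API. Concurrent requests for the same key share one in-flight open, and completed handles live in an insertion-ordered Map used as an LRU.

```typescript
// Hypothetical sketch: HandleCache is an illustrative name, not a library export.
type Opener<T> = (key: string) => Promise<T>;

class HandleCache<T> {
  // Map keeps insertion order, so the first key is the least recently used.
  private readonly ready = new Map<string, T>();
  // Opens in progress, shared by concurrent callers (thundering-herd dedup).
  private readonly pending = new Map<string, Promise<T>>();
  private readonly maxEntries: number;
  private readonly open: Opener<T>;

  constructor(maxEntries: number, open: Opener<T>) {
    this.maxEntries = maxEntries;
    this.open = open;
  }

  async get(key: string): Promise<T> {
    const hit = this.ready.get(key);
    if (hit !== undefined) {
      // Refresh recency by moving the key to the end of the Map.
      this.ready.delete(key);
      this.ready.set(key, hit);
      return hit;
    }
    const inFlight = this.pending.get(key);
    if (inFlight !== undefined) return inFlight; // join the open already running

    const opening = this.open(key)
      .then((handle) => {
        this.ready.set(key, handle);
        this.evict();
        return handle;
      })
      .finally(() => this.pending.delete(key)); // cleared on success and failure
    this.pending.set(key, opening);
    return opening;
  }

  get size(): number {
    return this.ready.size;
  }

  // Drop least recently used handles until within capacity.
  private evict(): void {
    while (this.ready.size > this.maxEntries) {
      const oldest = this.ready.keys().next().value as string;
      this.ready.delete(oldest);
    }
  }
}
```

A failed open is not cached: the pending entry is removed in `finally`, so the next caller retries instead of receiving a stored rejection.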

Backward compatibility

Additive: all existing exports are unchanged. 334 tests pass (7 new), lint + typecheck are clean, and the ESM + CJS build is verified.

Publish

release.yml publishes on a v* tag matching the package.json version. After merge, push the v0.7.0 tag to trigger the publish to GitHub Packages. The nautilus-api thin rewrite (which consumes this API) is prepared on the nautilus side and lands after its dependency is bumped to ^0.7.0.
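
The publish rule (a v* tag that must match package.json) can be checked locally before pushing the tag. A minimal sketch with hypothetical helper names; it accepts either a bare tag or a GITHUB_REF-style `refs/tags/...` value, and release.yml's actual check may be implemented differently.

```typescript
import { readFileSync } from "node:fs";

const TAG_PREFIX = "refs/tags/";

// True when the tag (bare or refs/tags/-prefixed) equals "v" + the package version.
function tagMatchesVersion(ref: string, version: string): boolean {
  const tag = ref.startsWith(TAG_PREFIX) ? ref.slice(TAG_PREFIX.length) : ref;
  return tag === `v${version}`;
}

// Reads the "version" field from a package.json file.
function readPackageVersion(path: string): string {
  const pkg = JSON.parse(readFileSync(path, "utf8")) as { version?: unknown };
  if (typeof pkg.version !== "string") {
    throw new Error(`${path} has no string "version" field`);
  }
  return pkg.version;
}
```

With the bump to 0.7.0 in place, `tagMatchesVersion("v0.7.0", readPackageVersion("package.json"))` should be true from the repository root.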

🤖 Generated with Claude Code

Add ZarrDatasetRegistry + ManagedDataset: a single place that owns all
caching so consumers stop hand-wiring CachedStore + openGroup + MemoryCache
+ coordinate caches. Aimed at cross-region S3 serving, where every byte is
expensive and low-traffic pods keep caches cold.

- ZarrDatasetRegistry: handle reuse (LRU + thundering-herd dedup), metadata
  cache, on-disk chunk cache, per-dataset decoded-chunk MemoryCache, and a
  shared decoded-array cache. Store-agnostic via a storeFactory.
- ManagedDataset.read auto-applies the per-dataset memory cache + observability.
  ManagedDataset.decodedArray serves small arrays from L1 (handle) -> L2
  (shared cache); with a run_time-invariant cacheKey (domain key), coordinates
  are read once per domain and reused across run_times and pods, eliminating
  the multi-second cold-open coordinate re-read on cross-region stores.
- Move open/openGroup/openArray to src/open.ts (re-exported unchanged).
- Add registry unit tests; bump to 0.7.0; CHANGELOG.
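
The L1 → L2 lookup order behind ManagedDataset.decodedArray, and why a run_time-invariant cacheKey avoids the coordinate re-read, can be sketched as follows. All names here are hypothetical: DatasetHandle stands in for one opened run_time, decoded arrays are assumed to be Float64Array, and the shared tier is an in-process Map with a TTL. Sharing across pods, as described above, needs a tier outside the process, which this sketch does not model.

```typescript
// Hypothetical sketch: names and the Float64Array payload are illustrative assumptions.
type Loader = () => Promise<Float64Array>;

interface Entry {
  value: Float64Array;
  expiresAt: number; // epoch ms after which the entry is stale
}

// L2: shared by every handle in this process; entries expire after ttlMs.
class SharedArrayCache {
  private readonly entries = new Map<string, Entry>();

  get(key: string, now: number): Float64Array | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // drop the stale entry so the caller reloads
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: Float64Array, ttlMs: number, now: number): void {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }
}

// Stands in for one opened dataset (one run_time); L1 lives as long as the handle.
class DatasetHandle {
  private readonly local = new Map<string, Float64Array>();
  private readonly shared: SharedArrayCache;

  constructor(shared: SharedArrayCache) {
    this.shared = shared;
  }

  async decodedArray(name: string, cacheKey: string, ttlMs: number, load: Loader): Promise<Float64Array> {
    const l1 = this.local.get(name);
    if (l1 !== undefined) return l1; // L1 hit: already decoded for this handle

    const l2 = this.shared.get(cacheKey, Date.now());
    if (l2 !== undefined) {
      this.local.set(name, l2); // promote the shared copy into L1
      return l2;
    }

    const value = await load(); // cold path: read and decode from the store
    this.shared.set(cacheKey, value, ttlMs, Date.now());
    this.local.set(name, value);
    return value;
  }
}
```

Keying lat/lon by domain (for example `<domain>/lat` rather than `<domain>/<run_time>/lat`) is what lets a handle for a new run_time hit L2 instead of the store. The sketch does not dedupe concurrent L2 misses; adding in-flight promise dedup on cacheKey would cover that.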

Also adds diagnostic examples (diagnose-prod-read, verify-memcache, s3-point
benchmark mode) used to attribute the production read latency.
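
The diagnostic examples attribute latency to phases, reported through fields such as timeAxisMs, layoutsMs and readMs (with a readMs p90). A minimal sketch of that kind of instrumentation, assuming per-phase wall-clock timing and a nearest-rank percentile; the helper names are hypothetical and are not taken from examples/diagnose-prod-read.ts.

```typescript
// Hypothetical helpers: not the code in examples/diagnose-prod-read.ts.
type Timings = Record<string, number>;

// Runs one phase and adds its wall-clock duration (ms) under `label`.
async function timed<T>(timings: Timings, label: string, fn: () => Promise<T>): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    // Recorded even if the phase throws, so failed reads still show their cost.
    timings[label] = (timings[label] ?? 0) + (performance.now() - start);
  }
}

// Nearest-rank percentile, e.g. percentile(readMsSamples, 90) for a p90.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of an empty sample set");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(0, rank - 1)];
}
```

Wrapping the time-axis read and the layout step separately, e.g. `await timed(timings, "timeAxisMs", () => readTimeAxis())` with `readTimeAxis` as a placeholder, shows which phase dominates a cold open.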
guidocerqueira merged commit 8c4dd36 into main on Jun 15, 2026
2 checks passed