feat: integrated dataset-session API (v0.7.0) #8
Merged
Add ZarrDatasetRegistry + ManagedDataset: a single place that owns all caching so consumers stop hand-wiring CachedStore + openGroup + MemoryCache + coordinate caches. Aimed at cross-region S3 serving, where every byte is expensive and low-traffic pods keep caches cold.
- ZarrDatasetRegistry: handle reuse (LRU + thundering-herd dedup), metadata cache, on-disk chunk cache, per-dataset decoded-chunk MemoryCache, and a shared decoded-array cache. Store-agnostic via a storeFactory.
- ManagedDataset.read auto-applies the per-dataset memory cache + observability.
- ManagedDataset.decodedArray serves small arrays from L1 (handle) -> L2 (shared cache); with a run_time-invariant cacheKey (domain key), coordinates are read once per domain and reused across run_times and pods, eliminating the multi-second cold-open coordinate re-read on cross-region stores.
- Move open/openGroup/openArray to src/open.ts (re-exported unchanged).
- Add registry unit tests; bump to 0.7.0; CHANGELOG.
Also adds diagnostic examples (diagnose-prod-read, verify-memcache, s3-point benchmark mode) used to attribute the production read latency.
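The handle reuse described above (LRU + thundering-herd dedup) can be illustrated with a minimal sketch. This is not the registry's actual implementation: `HandleCache`, `Opener`, and `maxHandles` are hypothetical names chosen for the example, and the real `ZarrDatasetRegistry` may differ in every detail. The idea is that concurrent requests for the same dataset share one in-flight open, and only the most recently used handles are kept.

```typescript
// Hypothetical sketch; none of these names come from the library.
type Opener<H> = (url: string) => Promise<H>;

class HandleCache<H> {
  // A Map iterates in insertion order, so its first key is the least recently used.
  private handles = new Map<string, H>();
  // Opens still in progress, keyed by URL, so concurrent callers share one open.
  private inflight = new Map<string, Promise<H>>();
  private open: Opener<H>;
  private maxHandles: number;

  constructor(open: Opener<H>, maxHandles: number) {
    this.open = open;
    this.maxHandles = maxHandles;
  }

  get size(): number {
    return this.handles.size;
  }

  async get(url: string): Promise<H> {
    const cached = this.handles.get(url);
    if (cached !== undefined) {
      // Re-insert to mark this handle as most recently used.
      this.handles.delete(url);
      this.handles.set(url, cached);
      return cached;
    }

    // Thundering-herd dedup: join an open that is already running.
    const pending = this.inflight.get(url);
    if (pending !== undefined) return pending;

    const promise = this.openAndStore(url);
    this.inflight.set(url, promise);
    // Drop the in-flight entry whether the open succeeded or failed.
    const cleanup = (): void => {
      if (this.inflight.get(url) === promise) this.inflight.delete(url);
    };
    promise.then(cleanup, cleanup);
    return promise;
  }

  private async openAndStore(url: string): Promise<H> {
    const handle = await this.open(url);
    this.handles.set(url, handle);
    if (this.handles.size > this.maxHandles) {
      // Evict the least recently used handle.
      const oldest = this.handles.keys().next().value;
      if (oldest !== undefined) this.handles.delete(oldest);
    }
    return handle;
  }
}
```

In this sketch a failed open is not cached, so the next caller retries; whether the library behaves the same way is not stated in this PR.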
Why
Production (homologação, the staging environment) reads are slow on every request, not just the first. End-to-end log analysis on the deployed pods showed the cause is a combination of factors that the prior perf work (connection pooling / prewarm / GridIndex) never touched:
- us-east-2, bucket in us-east-1 → every GET pays cross-region RTT.
- `time` + `lat`/`lon` are chunk data read from S3 on each new run_time; coordinates are identical across run_times but were re-read every time (`timeAxisMs` 1.4–2.3s, `layoutsMs` up to 1.7s).
- `readMs` p90 ≈ 8.9s; single-model `durationMs` up to 19.5s.
Consumers also had to hand-wire `CachedStore` + `openGroup` + `MemoryCache` + coordinate caches. This moves all caching into the library.
What
- `ZarrDatasetRegistry` (root export): owns handle reuse (LRU + thundering-herd dedup), metadata cache, on-disk chunk cache, per-dataset decoded-chunk `MemoryCache`, and a shared decoded-array cache. Store-agnostic via a `storeFactory`.
- `ManagedDataset`:
  - `read(name, selection, opts)` auto-applies the per-dataset decoded-chunk memory cache + observability (callers can't pass `memoryCache`; it stays dataset-scoped so chunk keys can't collide across datasets).
  - `decodedArray(name, { cacheKey, ttlMs })` serves small arrays from L1 (handle) → L2 (shared cache). With a run_time-invariant `cacheKey` (domain key), coordinate arrays are read once per domain and reused across run_times and pods, eliminating the multi-second cold-open coordinate re-read on cross-region stores.
- `open` / `openGroup` / `openArray` moved to `src/open.ts`, re-exported unchanged from the root (no API change).
- Diagnostic examples: `examples/diagnose-prod-read.ts`, `examples/verify-memcache.ts`, and an `s3-point` mode in `benchmark-local-flow.ts`.
Backward compatibility
Additive — all existing exports unchanged. 334 tests pass (7 new), lint + typecheck clean, ESM+CJS build verified.
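To make the `decodedArray` lookup order from the What section concrete, here is a rough sketch of an L1 (per-handle) → L2 (shared, TTL-bound) cache keyed by a run_time-invariant domain key. Everything here is an assumption for illustration: `SharedArrayCache`, `DatasetSketch`, `Loader`, and the default TTL are hypothetical, the shared cache is a plain in-process `Map` (the PR does not say what backs the real one), and only the option names `cacheKey` and `ttlMs` come from the PR text.

```typescript
// Hypothetical sketch; only the option names cacheKey and ttlMs come from the PR.
type Loader = (name: string) => Promise<Float64Array>;

interface SharedEntry {
  data: Float64Array;
  expiresAt: number;
}

class SharedArrayCache {
  private entries = new Map<string, SharedEntry>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  get(key: string): Float64Array | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Expired entries are dropped lazily on lookup.
      this.entries.delete(key);
      return undefined;
    }
    return entry.data;
  }

  set(key: string, data: Float64Array, ttlMs: number): void {
    this.entries.set(key, { data, expiresAt: this.now() + ttlMs });
  }
}

class DatasetSketch {
  // L1: arrays already decoded through this handle.
  private l1 = new Map<string, Float64Array>();
  private shared: SharedArrayCache;
  private load: Loader;

  constructor(shared: SharedArrayCache, load: Loader) {
    this.shared = shared;
    this.load = load;
  }

  async decodedArray(
    name: string,
    opts: { cacheKey?: string; ttlMs?: number } = {},
  ): Promise<Float64Array> {
    const l1Hit = this.l1.get(name);
    if (l1Hit !== undefined) return l1Hit;

    // Without a cacheKey the array is never shared across handles.
    const sharedKey =
      opts.cacheKey !== undefined ? opts.cacheKey + "/" + name : undefined;
    if (sharedKey !== undefined) {
      const l2Hit = this.shared.get(sharedKey);
      if (l2Hit !== undefined) {
        this.l1.set(name, l2Hit);
        return l2Hit;
      }
    }

    // Miss on both levels: read from the store (the expensive cross-region path).
    const data = await this.load(name);
    this.l1.set(name, data);
    if (sharedKey !== undefined) {
      this.shared.set(sharedKey, data, opts.ttlMs ?? 60_000); // assumed default TTL
    }
    return data;
  }
}
```

With one handle per run_time and the same domain key, the second run_time's `decodedArray("lat", { cacheKey: "domain-a" })` resolves from the shared cache without touching the store, which is the effect the PR attributes to the run_time-invariant `cacheKey`.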
Publish
`release.yml` publishes on a `v*` tag matching `package.json`. After merge, push `v0.7.0` to trigger the publish to GitHub Packages. The nautilus-api thin rewrite (consuming this API) is prepared on the nautilus side and lands after the dependency is bumped to `^0.7.0`.
🤖 Generated with Claude Code