Fix CAS false-success via per-segment generation in a 64-bit token#24
Merged
brayniac merged 3 commits intoJun 11, 2026
Merged
Conversation
Add a 16-bit generation counter to SegmentHeader, bumped each time a segment is returned to the free queue (SegmentHeader::reset, called only from Segments::push_free, which all recycle paths funnel through). This will let CAS tokens distinguish different lifetimes of the same location, fixing a false-success ABA case where a recycled segment places the same key at the same offset. The field fits in existing padding (26 -> 24 bytes); the header stays 64 bytes. Mirrors the generation field in crucible's segment headers. Co-Authored-By: Claude Fable 5 <[email protected]>
Replace the u32 location-truncated CAS token with a 64-bit token combining the 44-bit packed location with the segment's 16-bit generation counter (bits 59..44), ported from crucible's CasToken. The old token had a false-success case: (a) truncating the 44-bit location to u32 aliases locations whose seg_ids differ by a multiple of 4096, and (b) even at full width, a recycled segment can place the same key at the same offset, letting a stale token match a value the client never observed. The generation counter, bumped on every segment recycle, distinguishes lifetimes of the same location. The 64-bit width also matches the memcache protocol's "cas unique". Co-Authored-By: Claude Fable 5 <[email protected]>
Deterministically reproduce the segment-recycle ABA: insert a key, capture its token, clear() the cache (push_free prepends the freed segment to the free queue and bumps its generation), then reinsert the same key — it lands at the identical 44-bit location. The stale token must now fail with Exists; with location-only tokens it falsely succeeded. Precondition asserts verify the test really reproduces the same-location scenario, so it fails loudly if free-queue ordering ever changes. Verified the test bites: with the generation bump removed, stale and fresh tokens are identical (the false-success case). Also widen the fuzz target's cas argument to u64. Co-Authored-By: Claude Fable 5 <[email protected]>
Member
|
Neat design. LGTM |
thinkingfish
approved these changes
Jun 11, 2026
thinkingfish
pushed a commit
to pelikan-io/pelikan
that referenced
this pull request
Jun 12, 2026
Upstream cache-rs fixed the CAS false-success regression by encoding a per-segment generation into a 64-bit token (pelikan-io/cache-rs#24). Adopt the new commit: - Drop the u32 narrowing casts in entrystore now that the storage cas() method takes the full u64 token from the memcache protocol. - Rework the 'cas stored' integration test to fetch the real cas token via gets instead of assuming tokens are a counter starting at 1; the token is opaque to clients under the new location+generation scheme. https://claude.ai/code/session_01MbdtNhjhyFVtsrPQF6wJjJ
thinkingfish
added a commit
to pelikan-io/pelikan
that referenced
this pull request
Jun 14, 2026
* Update dependencies and address security advisories - Run cargo update to bring all transitive dependencies to their latest semver-compatible versions. This resolves three rustls-webpki vulnerabilities (RUSTSEC-2026-0098, RUSTSEC-2026-0099, RUSTSEC-2026-0104) by moving to 0.103.13, and clears the rand unsoundness warning (RUSTSEC-2026-0097) via rand 0.8.6. - Replace the unmaintained rustls-pemfile crate (RUSTSEC-2025-0134) with the PEM parsing built into rustls-pki-types. - Keep the segcache git dependency pinned at 5d1d870b: the newer upstream commit regresses memcache CAS semantics (returns EXISTS where STORED is expected), failing the segcache integration tests. cargo audit now reports no vulnerabilities or warnings. https://claude.ai/code/session_01MbdtNhjhyFVtsrPQF6wJjJ * Update segcache to upstream with CAS generation-token fix Upstream cache-rs fixed the CAS false-success regression by encoding a per-segment generation into a 64-bit token (pelikan-io/cache-rs#24). Adopt the new commit: - Drop the u32 narrowing casts in entrystore now that the storage cas() method takes the full u64 token from the memcache protocol. - Rework the 'cas stored' integration test to fetch the real cas token via gets instead of assuming tokens are a counter starting at 1; the token is opaque to clients under the new location+generation scheme. https://claude.ai/code/session_01MbdtNhjhyFVtsrPQF6wJjJ * Bump remaining transitive dependencies to latest patch releases Pick up patch/minor updates that became available since the last bump: cc, js-sys, memchr, smallvec, wasip2, the wasm-bindgen family, web-sys, and zeroize. cache-rs is already at the latest main commit (868705f). cargo audit remains clean; full workspace test suite passes. * Bump segcache to cache-rs PR #25 (reader-pinned segments) Pin the segcache git dependency to the head of cache-rs PR #25 (745246571a7a67738f6db73021b5609d5d6cc2d7), which adds reader-pin RAII guards and drain-but-defer-free handling so pinned segments are not recycled while readers hold references. The change is internal to segcache; the only caller-visible effect is that Item::cas() now returns u64 directly, so drop the now-useless .into() widening conversions in the memcache entrystore. This is pinned to an unmerged PR head; re-point to the cache-rs default branch once PR #25 lands on main. Full workspace tests pass; cargo audit and clippy are clean. * Move segcache from git dependency to crates.io 0.3.0 cache-rs published segcache 0.3.0 (and keyvalue 0.2.0) to crates.io, incorporating the reader-pinned segment work. Replace the git+rev pin with the released crate so the build no longer depends on an unmerged PR head or a git checkout. Both segcache and its transitive keyvalue dependency now resolve from the registry. No source changes required; full workspace tests pass, cargo audit and clippy are clean. --------- Co-authored-by: Claude <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The current CAS token is the item's packed location truncated to u32. This has a false-success (ABA) case:
This follows the discussion on #23: rather than reintroducing a side counter (which becomes a two-word protocol once operations are concurrent), keep the location-derived token and fix its defects the way crucible's
CasTokendoes.Changes
generationcounter toSegmentHeader(fits in existing padding; header stays 64 bytes), bumped inreset()— called only frompush_free, which all recycle paths (eviction, merge, S3-FIFO, expiration, clear) funnel through.CasToken: a 60-bit token in u64 — generation in bits 59..44, location in bits 43..0. The token remains the content of the single hashtable slot word plus quasi-static segment metadata, preserving the single-word atomic swap path for future concurrent CAS.Item::cas()andSegcache::cas()from u32 to u64, matching the memcache protocol's 64-bit "cas unique".Testing
cas_stale_token_rejected_after_segment_recycletest deterministically reproduces the ABA scenario (clear() recycles the segment LIFO; reinserting the same key lands at the identical 44-bit location) and asserts the stale token now fails withExists. Precondition asserts verify the same-location setup, so the test fails loudly rather than silently not exercising ABA if free-queue ordering changes.cargo test --workspace,--features debug, loom suite (8/8 models),clippy --all-targets --all-features -- -D warnings, andfmt --checkare all green; fuzz target updated and compiles.🤖 Generated with Claude Code