fix: guard reset path against concurrent overwrites in ETS and Atomic backends #184
Merged
epinault merged 1 commit on May 14, 2026
epinault added a commit that referenced this pull request on May 18, 2026:

* Add :fix_window_per_key algorithm for ETS and Atomic backends

  A fixed-window variant whose window is anchored to each key's first hit instead of a globally-aligned wall-clock epoch. Keeps the same one-entry-per-key memory profile as :fix_window. The 2x boundary burst is still possible per key, but boundaries are no longer globally synchronized, so they cannot be exploited deterministically. Same semantics as the common Redis INCR + EXPIRE NX pattern. Closes #181.

* fix(fix_window_per_key): guard reset path against concurrent overwrites in ETS and Atomic backends (#184)

* refactor: extract helpers to fix credo nesting depth warnings

  - Add allow_or_deny/3 to ETS and Atomic FixWindowPerKey backends
  - Extract do_inc/4 in Atomic backend, mirroring do_hit/5

Co-authored-by: vittorio-reinaudo <[email protected]>

Fix concurrent window-reset race in `fix_window_per_key` (ETS and Atomic)

Background

PR #183 introduced `:fix_window_per_key`, a per-key fixed-window algorithm that anchors each key's window to its first hit rather than a shared wall-clock epoch. The feature itself is sound; this PR addresses a correctness bug in the reset path that appears when multiple processes hit the same key at the moment its window expires.

The race condition

ETS backend: `hit/5` and `inc/4`

When a key's window has expired (or the key is absent), `hit/5` falls through to a bare `:ets.insert/2`, which is an unconditional overwrite. If N processes all call `:ets.lookup` and observe `expires_at ≤ now` before any of them completes the insert, each one writes `{key, 1, new_expires_at}` and the last write wins. All N callers receive `{:allow, 1}`, but the stored counter is `1`, so N−1 hits are silently dropped.

The same pattern exists in `inc/4`.

Atomic backend: `do_hit/5` and `inc/4`

The reset path in `do_hit` uses two separate atomic operations, a `put` and an `exchange`. Each is atomic on its own, but the pair is not. N processes that all read `expires_at ≤ now` before any `exchange` completes will each call `put(1, increment)`, overwriting one another. The last `put` wins, leaving the counter at `increment` even though N `{:allow, increment}` responses have already been returned.

Additionally, `inc/4` in the Atomic backend used `:ets.insert` (unconditional) instead of `:ets.insert_new` for the initial atomics-ref creation. This is a separate but related race that `hit/5` already avoided correctly.

Why `:fix_window` does not have this problem

`Hammer.ETS.FixWindow` and `Hammer.Atomic.FixWindow` use `{key, window_number}` as the ETS key. A new window is a genuinely new key; the active-window `update_counter`/`add_get` path is reached on the very first hit. There is no "reset in place" of an existing key, so the problem is specific to `:fix_window_per_key`.

Fix

ETS: `insert_new` + `select_replace` + retry

Replace the bare `:ets.insert` with a three-step CAS-style sequence built from `:ets.insert_new`, `:ets.select_replace`, and a recursive retry.

Both `:ets.insert_new` and `:ets.select_replace` are atomic ETS operations, so at most one writer succeeds. Exactly one process resets the counter; all others fall back to the `update_counter` path once the window is live.

Atomic: `compare_exchange` + reset-lock sentinel

Replace the non-atomic `put` + `exchange` pair with a CAS on the `expires_at` slot using a sentinel value (`@reset_lock = 0xFFFFFFFFFFFFFFFF`, the unsigned 64-bit max, unreachable by any real timestamp).

While `@reset_lock` is set, no other process can enter the `add_get` path (the `cond` checks `== @reset_lock` before `> now`). This guarantees the winner's `put(counter)` is never overwritten by a stale peer's `put` before the real `expires_at` is published.

Files changed

| File | Change |
| --- | --- |
| `lib/hammer/ets/fix_window_per_key.ex` | `hit/5`, `inc/4`: replace `:ets.insert` with `insert_new` + `select_replace` + recurse |
| `lib/hammer/atomic/fix_window_per_key.ex` | `do_hit/5`: CAS with `@reset_lock` sentinel; `inc/4`: same CAS pattern + fix `insert` → `insert_new` for atomics-ref creation |
| `test/hammer/ets/fix_window_per_key_test.exs` | `"concurrent expiry reset"` describe: 200-task stress test asserting `get == allows` after a synchronized expiry-boundary hit |
| `test/hammer/atomic/fix_window_per_key_test.exs` | Same `"concurrent expiry reset"` describe for the Atomic backend |

Tests

The new `"concurrent expiry reset"` tests in both files follow the same structure: 200 tasks wait on a `receive :go` barrier, are released together at the expiry boundary, and the test then asserts `get(table, key, scale) == total_allows`. Any under-count reveals a lost reset.

The stress tests pass consistently with the fix in place. The invariant they protect (`counter == allows`) is not reliably satisfied by the unfixed code under concurrent expiry pressure on multi-core schedulers.

All 131 existing tests continue to pass.
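
The test structure described above can be sketched end to end. This is a minimal, self-contained illustration, not Hammer's actual code: the module `PerKeySketch`, its `hit/5` and `get/2` signatures, the `verdict/3` helper, the `{:deny, ms}` return shape, and the table name are all hypothetical. It assumes the reset uses the compare-and-swap idiom from the Erlang `ets` documentation (`select_replace` with the previously read row as the match head), and it seeds an already-expired row rather than sleeping until a window ends.

```elixir
defmodule PerKeySketch do
  # Hypothetical illustration only; not Hammer's actual source.
  # Rows are stored as {key, count, expires_at_ms}.

  def hit(table, key, scale, limit, increment \\ 1) do
    now = System.system_time(:millisecond)

    case :ets.lookup(table, key) do
      [{^key, _count, expires_at}] when expires_at > now ->
        # Live window: update_counter is atomic, so concurrent hits all count.
        count = :ets.update_counter(table, key, {2, increment})
        verdict(count, limit, expires_at - now)

      [stale] ->
        # Expired window: swap in a fresh row only if the stored row is
        # still exactly the one we read (compare-and-swap).
        fresh = {key, increment, now + scale}

        case :ets.select_replace(table, [{stale, [], [{:const, fresh}]}]) do
          1 -> verdict(increment, limit, scale)
          # Another process changed the row first: retry from the top.
          0 -> hit(table, key, scale, limit, increment)
        end

      [] ->
        # Absent key: insert_new succeeds for exactly one caller.
        if :ets.insert_new(table, {key, increment, now + scale}) do
          verdict(increment, limit, scale)
        else
          hit(table, key, scale, limit, increment)
        end
    end
  end

  def get(table, key) do
    now = System.system_time(:millisecond)

    case :ets.lookup(table, key) do
      [{^key, count, expires_at}] when expires_at > now -> count
      _ -> 0
    end
  end

  defp verdict(count, limit, _retry_ms) when count <= limit, do: {:allow, count}
  defp verdict(_count, _limit, retry_ms), do: {:deny, retry_ms}
end

table = :ets.new(:per_key_sketch, [:set, :public, :named_table])
key = "user:1"

# Seed a row whose window has already expired.
:ets.insert(table, {key, 7, System.system_time(:millisecond) - 1})

# 200 tasks block on a :go barrier, then all hit the expired key.
tasks =
  for _ <- 1..200 do
    Task.async(fn ->
      receive do
        :go -> PerKeySketch.hit(table, key, 60_000, 1_000)
      end
    end)
  end

Enum.each(tasks, fn task -> send(task.pid, :go) end)
results = Task.await_many(tasks)

allows = Enum.count(results, &match?({:allow, _}, &1))
stored = PerKeySketch.get(table, key)
```

With the limit set well above the task count, every hit is allowed, so `stored` and `allows` should agree. Swapping the `select_replace` branch for a plain `:ets.insert/2` is one way to see the under-count the PR describes, although a race of this kind will not necessarily trigger on every run.
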

Notes for reviewers

- The `@reset_lock` sentinel is safe: `now` is milliseconds since epoch (~1.7 × 10¹²); `now + scale` for any realistic scale never approaches 2⁶⁴ − 1 ≈ 1.8 × 10¹⁹.
- The recursive retries (`do_hit` calling itself, `inc` calling itself) terminate: after the winner publishes `now + scale`, all spinners read a valid `expires_at > now` and proceed via `add_get`. There is no livelock, because each reset event has exactly one winner.
- `Hammer.ETS.FixWindow` and `Hammer.Atomic.FixWindow` are not touched; their `{key, window}` keying strategy does not have this class of bug.
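
The sentinel and retry behaviour in these notes can be illustrated with a small sketch built directly on `:atomics`. It is not the PR's code: the module `ResetLockSketch`, the two-slot layout (slot 1 for the counter, slot 2 for `expires_at`), the `hit/4` signature, and the `{:deny, ms}` return shape are assumptions made for the example.

```elixir
defmodule ResetLockSketch do
  # Hypothetical illustration only; not Hammer's actual source.
  @reset_lock 0xFFFFFFFFFFFFFFFF
  @count 1
  @expires 2

  # Unsigned 64-bit slots, so the sentinel fits. Both slots start at 0,
  # which reads as "already expired" on the first hit.
  def new, do: :atomics.new(2, signed: false)

  def hit(ref, scale, limit, increment \\ 1) do
    now = System.system_time(:millisecond)
    expires_at = :atomics.get(ref, @expires)

    cond do
      expires_at == @reset_lock ->
        # A reset is in progress: spin until the winner publishes the expiry.
        hit(ref, scale, limit, increment)

      expires_at > now ->
        # Live window: atomic add on the counter slot.
        count = :atomics.add_get(ref, @count, increment)
        verdict(count, limit, expires_at - now)

      true ->
        # Expired: exactly one process moves expires_at to the sentinel.
        case :atomics.compare_exchange(ref, @expires, expires_at, @reset_lock) do
          :ok ->
            # Reset the counter first, then publish the real expiry.
            :atomics.put(ref, @count, increment)
            :atomics.put(ref, @expires, now + scale)
            verdict(increment, limit, scale)

          _actual ->
            # Lost the CAS: retry and take whichever branch now applies.
            hit(ref, scale, limit, increment)
        end
    end
  end

  defp verdict(count, limit, _retry_ms) when count <= limit, do: {:allow, count}
  defp verdict(_count, _limit, retry_ms), do: {:deny, retry_ms}
end

ref = ResetLockSketch.new()

tasks =
  for _ <- 1..200 do
    Task.async(fn ->
      receive do
        :go -> ResetLockSketch.hit(ref, 60_000, 1_000)
      end
    end)
  end

Enum.each(tasks, fn task -> send(task.pid, :go) end)
results = Task.await_many(tasks)

atomic_allows = Enum.count(results, &match?({:allow, _}, &1))
atomic_stored = :atomics.get(ref, 1)
```

The order of the two `put` calls matters in this sketch: because the counter is reset while the sentinel is still in place, a spinner can only reach `add_get` after the new counter value is already stored. The retries are busy-waits, which relies on the BEAM preempting a spinning process so that the winner can finish.
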