Make RdmaEndpoint::_state atomic to fix data race on TCP fallback#55
Closed
chenBright wants to merge 1 commit into
Closed
Make RdmaEndpoint::_state atomic to fix data race on TCP fallback#55chenBright wants to merge 1 commit into
chenBright wants to merge 1 commit into
Conversation
_state was a plain enum written by the handshake bthread and read concurrently by the event-dispatching thread (OnNewDataFromTcp), which is a data race. Since the _state store had no release semantics, under a weak memory model the event thread could observe _state == FALLBACK_TCP yet not see the appended magic bytes, then enter OnNewMessages and read a stale/partial _read_buf. Make _state a butil::atomic<State>: - Terminal-state stores use release and the matching loads use acquire, so data published before a terminal state (the magic bytes put back into _read_buf, and the RDMA window/resource setup before ESTABLISHED) is visible to the reader. - Non-terminal handshake transitions use relaxed.
There was a problem hiding this comment.
Pull request overview
This PR addresses a concurrency bug in the RDMA/TCP handshake path by making RdmaEndpoint::_state atomic and introducing explicit acquire/release semantics so the TCP event thread can safely observe handshake terminal states (e.g., FALLBACK_TCP, ESTABLISHED) along with the data they’re intended to “publish” (such as the re-appended magic bytes on TCP fallback).
Changes:
- Convert
RdmaEndpoint::_statefrom a plainStateenum tobutil::atomic<State>. - Replace direct reads/writes of
_statewithload()/store()and apply acquire/release semantics on key transitions and reads. - Adjust TCP read/dispatch logic (
OnNewDataFromTcp,TryReadOnTcp, debug/state reporting) to use atomic loads.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/brpc/rdma/rdma_endpoint.h | Changes _state to an atomic and updates the TryReadOnTcp declaration accordingly. |
| src/brpc/rdma/rdma_endpoint.cpp | Updates all _state accesses to atomic operations and adds acquire/release ordering around handshake terminal states and TCP fallback/established paths. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: resolve
Problem Summary:
RdmaEndpoint::_stateis a plain (non-atomic) enum, but it is writtenby the RDMA handshake bthread and read concurrently by the
event dispatcher bthread in
OnNewDataFromTcp. This is a data race,and on a weak memory model (ARM / POWER / RISC-V) it can let the two
threads concurrently mutate
_socket->_read_bufand corrupt anIOBuf.What is changed and the side effects?
Changed:
Make
_stateabutil::atomic<State>.memory_order_releaseand the corresponding loads usememory_order_acquire, so that everything published before a terminal state becomesvisible to the reader:
memory_order_relaxed.Side effects:
Performance effects:
Breaking backward compatibility:
Check List: