Skip to content

Fix TCP RST data loss when closing a connection with in-flight PING frames#112

Open
anluerz wants to merge 2 commits into
moby:mainfrom
anluerz:fix/rst-data-loss
Open

Fix TCP RST data loss when closing a connection with in-flight PING frames#112
anluerz wants to merge 2 commits into
moby:mainfrom
anluerz:fix/rst-data-loss

Conversation

@anluerz
Copy link
Copy Markdown

@anluerz anluerz commented Jun 1, 2026

Fixes #111

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes data loss caused by TCP RST during connection shutdown when an inbound packet (e.g., a SPDY PING) arrives at a socket that still has data in the kernel send buffer. Before closing the underlying connection, shutdown now half-closes the write side and drains any inbound bytes for up to 10 seconds so the kernel emits FIN instead of RST and the peer can finish receiving in-flight data. Also fixes a minor race in Ping by reading s.pingId under pingLock.

Changes:

  • In Connection.shutdown, after all streams close, perform CloseWrite + bounded io.Copy(io.Discard, s.conn) drain before s.conn.Close().
  • Move pid := s.pingId inside the pingLock critical section in Connection.Ping.
  • Add TestReceiveBufferIsDrained to reproduce the truncated-transfer scenario with backpressure and concurrent pings.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
connection.go Drain inbound bytes during graceful shutdown; tighten locking around pingId read in Ping.
spdy_test.go New regression test that fills the send buffer, closes the server side, sends pings, and verifies all bytes are received.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread connection.go Outdated
Comment on lines +749 to +755
if cw, ok := s.conn.(interface{ CloseWrite() error }); ok {
if err := cw.CloseWrite(); err == nil {
_ = s.conn.SetReadDeadline(time.Now().Add(10 * time.Second))
n, _ := io.Copy(io.Discard, s.conn)
debugMessage("(%p) drained %d bytes during shutdown", s, n)
}
}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced the io.Copy approach following your suggestion. The new implementation calls CloseWrite() to half-close the write direction, then waits on s.closeChan for Serve() to exit naturally. This means there is only ever one reader on s.conn at any time. After CloseWrite(), the peer receives FIN and eventually closes its write direction in response, Serve() gets EOF, exits its loop, and closes closeChan. At that point conn.Close() is called exclusively. Serve() ignores all write errors, so closing the write end does not lead to an error exit from Serve().

Comment thread connection.go Outdated
// sending; io.Copy consumes anything already in flight.
if cw, ok := s.conn.(interface{ CloseWrite() error }); ok {
if err := cw.CloseWrite(); err == nil {
_ = s.conn.SetReadDeadline(time.Now().Add(10 * time.Second))
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this as follows: When closeTimeout is set, the drain wait shares the existing timeout channel, so the total shutdown duration respects the caller's budget. When closeTimeout is zero (no timeout configured), a fixed 10-second drain timer is used as a fallback to prevent hanging indefinitely if Serve() was not called or the peer does not respond to FIN. In that case there is no caller-configured budget to respect, so the 10-second default seems appropriate.

Comment thread spdy_test.go Outdated
Comment on lines +1085 to +1093
last := bytesWritten.Load()
for {
time.Sleep(100 * time.Millisecond)
if last == bytesWritten.Load() {
break
}
last = bytesWritten.Load()
}
writingDone.Store(true)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the send buffer isn't actually full when we close, the RST race window is smaller and the test is less likely to fail, not more. Given that 100ms under normal circumstances is ample, I am hesitant to add more complicated logic here.

Comment thread spdy_test.go Outdated
anluerz added 2 commits June 2, 2026 11:05
Server closes connection while client sends periodic PING frames.
A PING arriving after conn.Close() triggers RST, causing the kernel
to discard the server's TCP send buffer.

Signed-off-by: Andreas Erz <[email protected]>
Half-close with CloseWrite() and wait for Serve() to exit before calling
conn.Close(), so late peer packets don't trigger RST. Also fix a pre-existing
data race in Ping() where s.pingId was read outside pingLock.

Signed-off-by: Andreas Erz <[email protected]>
@anluerz anluerz force-pushed the fix/rst-data-loss branch from b5b8683 to ef47932 Compare June 2, 2026 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TCP RST on close discards send-buffer data when peer sends late inbound packets

2 participants