Commit 0781136

Merge branch 'tcp-fix-listener-wakeup-after-reuseport-migration'
Zhenzhong Wu says:

====================
tcp: fix listener wakeup after reuseport migration

This series fixes a missing wakeup when inet_csk_listen_stop() migrates
an established child socket from a closing listener to another socket
in the same SO_REUSEPORT group after the child has already been queued
for accept.

The target listener receives the migrated accept-queue entry via
inet_csk_reqsk_queue_add(), but its waiters are not notified.
Nonblocking accept() still succeeds because it checks the accept queue
directly, but readiness-based waiters can remain asleep until another
connection generates a wakeup.

Patch 1 notifies the target listener after a successful migration in
inet_csk_listen_stop() and protects the post-queue_add() nsk accesses
with rcu_read_lock()/rcu_read_unlock().

Patch 2 extends the existing migrate_reuseport BPF selftest with epoll
readiness checks inside migrate_dance(), around shutdown() where the
migration happens. The test now verifies that the target listener is
not ready before migration and becomes ready immediately after it, for
both TCP_ESTABLISHED and TCP_SYN_RECV. TCP_NEW_SYN_RECV remains
excluded because it still depends on later handshake completion.

Testing:

- On a local unpatched kernel, the focused migrate_reuseport test
  fails for the listener-migration cases and passes for the
  TCP_NEW_SYN_RECV cases:

    not ok 1 IPv4 TCP_ESTABLISHED  inet_csk_listen_stop
    not ok 2 IPv4 TCP_SYN_RECV     inet_csk_listen_stop
    ok 3 IPv4 TCP_NEW_SYN_RECV reqsk_timer_handler
    ok 4 IPv4 TCP_NEW_SYN_RECV inet_csk_complete_hashdance
    not ok 5 IPv6 TCP_ESTABLISHED  inet_csk_listen_stop
    not ok 6 IPv6 TCP_SYN_RECV     inet_csk_listen_stop
    ok 7 IPv6 TCP_NEW_SYN_RECV reqsk_timer_handler
    ok 8 IPv6 TCP_NEW_SYN_RECV inet_csk_complete_hashdance

- On a patched kernel booted under QEMU, the full migrate_reuseport
  selftest passes:

    ok 1 IPv4 TCP_ESTABLISHED  inet_csk_listen_stop
    ok 2 IPv4 TCP_SYN_RECV     inet_csk_listen_stop
    ok 3 IPv4 TCP_NEW_SYN_RECV reqsk_timer_handler
    ok 4 IPv4 TCP_NEW_SYN_RECV inet_csk_complete_hashdance
    ok 5 IPv6 TCP_ESTABLISHED  inet_csk_listen_stop
    ok 6 IPv6 TCP_SYN_RECV     inet_csk_listen_stop
    ok 7 IPv6 TCP_NEW_SYN_RECV reqsk_timer_handler
    ok 8 IPv6 TCP_NEW_SYN_RECV inet_csk_complete_hashdance
    SELFTEST_RC=0
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 parents e08a9fa + c01cfc4 commit 0781136
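
The cover letter distinguishes a readiness-based waiter from a caller that checks the accept queue directly. The sketch below shows the userspace side of that distinction on an ordinary loopback listener. It does not reproduce the reuseport migration or the bug. The helper names (`make_listener`, `watch_listener`, `poll_ready`, `connect_client`) are hypothetical and are not taken from the selftest. The listener is registered with epoll before any connection arrives, which is the ordering the selftest also uses.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: nonblocking TCP listener on 127.0.0.1 with a
 * kernel-chosen port. Returns the fd (or -1) and stores the bound
 * address in *addr.
 */
static int make_listener(struct sockaddr_in *addr)
{
	socklen_t len = sizeof(*addr);
	int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);

	if (fd < 0)
		return -1;

	*addr = (struct sockaddr_in){
		.sin_family = AF_INET,
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
	};
	if (bind(fd, (struct sockaddr *)addr, len) ||
	    listen(fd, 16) ||
	    getsockname(fd, (struct sockaddr *)addr, &len)) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Register the listener with a new epoll instance before any event
 * arrives, as a readiness-based waiter would. Returns the epoll fd or -1.
 */
static int watch_listener(int listener)
{
	struct epoll_event ev = { .events = EPOLLIN, .data.fd = listener };
	int ep = epoll_create1(0);

	if (ep < 0)
		return -1;
	if (epoll_ctl(ep, EPOLL_CTL_ADD, listener, &ev)) {
		close(ep);
		return -1;
	}
	return ep;
}

/* Number of ready fds reported within timeout_ms (0 or 1), or -1. */
static int poll_ready(int ep, int timeout_ms)
{
	struct epoll_event ev;

	return epoll_wait(ep, &ev, 1, timeout_ms);
}

/* Blocking loopback client; returns the connected fd or -1. */
static int connect_client(const struct sockaddr_in *addr)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr))) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would create the listener, call `watch_listener()`, and expect `poll_ready(ep, 0)` to report nothing until `connect_client()` has completed a handshake. After that, both `poll_ready()` and a direct `accept()` on the listener should see the connection. The series is about the case where only the direct `accept()` would.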

2 files changed

Lines changed: 45 additions & 7 deletions

net/ipv4/inet_connection_sock.c

Lines changed: 3 additions & 0 deletions
@@ -1479,16 +1479,19 @@ void inet_csk_listen_stop(struct sock *sk)
 			if (nreq) {
 				refcount_set(&nreq->rsk_refcnt, 1);
 
+				rcu_read_lock();
 				if (inet_csk_reqsk_queue_add(nsk, nreq, child)) {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQSUCCESS);
 					reqsk_migrate_reset(req);
+					READ_ONCE(nsk->sk_data_ready)(nsk);
 				} else {
 					__NET_INC_STATS(sock_net(nsk),
 							LINUX_MIB_TCPMIGRATEREQFAILURE);
 					reqsk_migrate_reset(nreq);
 					__reqsk_free(nreq);
 				}
+				rcu_read_unlock();
 
 				/* inet_csk_reqsk_queue_add() has already
 				 * called inet_child_forget() on failure case.
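
The control flow of the patched branch can be modelled in plain userspace C. This is only an illustrative analogue, not kernel code. `struct model_listener`, `model_queue_add()` and `model_migrate()` are hypothetical stand-ins. The failure condition (target no longer listening) is an assumption about when the queue add fails. The volatile load only imitates the single read of the callback pointer that `READ_ONCE()` performs. RCU protection has no counterpart here.

```c
#include <stdbool.h>

struct model_listener;
typedef void (*ready_fn)(struct model_listener *);

/* Hypothetical stand-in for the target listener (nsk in the diff). */
struct model_listener {
	bool listening;		/* assumed condition for a successful queue add */
	int queued;		/* entries sitting in the accept queue */
	int wakeups;		/* how often waiters were notified */
	ready_fn data_ready;	/* plays the role of sk_data_ready */
};

/* Default notification callback: just count the wakeup. */
static void model_data_ready(struct model_listener *l)
{
	l->wakeups++;
}

/* Stand-in for the queue add: succeeds only while the target listens. */
static bool model_queue_add(struct model_listener *l)
{
	if (!l->listening)
		return false;
	l->queued++;
	return true;
}

/* Analogue of the patched branch: queue the entry and, only on
 * success, load the callback pointer once and invoke it.
 */
static bool model_migrate(struct model_listener *target)
{
	ready_fn ready;

	if (!model_queue_add(target))
		return false;

	ready = *(volatile ready_fn *)&target->data_ready;
	ready(target);
	return true;
}
```

In this model, dropping the `ready(target)` call leaves `queued` incremented while `wakeups` stays at zero, which is the state the cover letter describes for the unpatched kernel.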

tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c

Lines changed: 42 additions & 7 deletions
@@ -7,24 +7,29 @@
  * 3. call listen() for 1 server socket. (migration target)
  * 4. update a map to migrate all child sockets
  *    to the last server socket (migrate_map[cookie] = 4)
- * 5. call shutdown() for first 4 server sockets
+ * 5. for TCP_ESTABLISHED and TCP_SYN_RECV cases, verify via epoll
+ *    that the last server socket is not ready before migration.
+ * 6. call shutdown() for first 4 server sockets
  *    and migrate the requests in the accept queue
  *    to the last server socket.
- * 6. call listen() for the second server socket.
- * 7. call shutdown() for the last server
+ * 7. for TCP_ESTABLISHED and TCP_SYN_RECV cases, verify via epoll
+ *    that the last server socket is ready after migration.
+ * 8. call listen() for the second server socket.
+ * 9. call shutdown() for the last server
  *    and migrate the requests in the accept queue
  *    to the second server socket.
- * 8. call listen() for the last server.
- * 9. call shutdown() for the second server
+ * 10. call listen() for the last server.
+ * 11. call shutdown() for the second server
  *    and migrate the requests in the accept queue
  *    to the last server socket.
- * 10. call accept() for the last server socket.
+ * 12. call accept() for the last server socket.
  *
  * Author: Kuniyuki Iwashima <[email protected]>
  */
 
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
+#include <sys/epoll.h>
 
 #include "test_progs.h"
 #include "test_migrate_reuseport.skel.h"
@@ -350,21 +355,51 @@ static int update_maps(struct migrate_reuseport_test_case *test_case,
 
 static int migrate_dance(struct migrate_reuseport_test_case *test_case)
 {
+	struct epoll_event ev = {
+		.events = EPOLLIN,
+	};
+	int epoll = -1, nfds;
 	int i, err;
 
+	if (test_case->state != BPF_TCP_NEW_SYN_RECV) {
+		epoll = epoll_create1(0);
+		if (!ASSERT_NEQ(epoll, -1, "epoll_create1"))
+			return -1;
+
+		ev.data.fd = test_case->servers[MIGRATED_TO];
+		if (!ASSERT_OK(epoll_ctl(epoll, EPOLL_CTL_ADD,
+					 test_case->servers[MIGRATED_TO], &ev),
+			       "epoll_ctl"))
+			goto close_epoll;
+
+		nfds = epoll_wait(epoll, &ev, 1, 0);
+		if (!ASSERT_EQ(nfds, 0, "epoll_wait 1"))
+			goto close_epoll;
+	}
+
 	/* Migrate TCP_ESTABLISHED and TCP_SYN_RECV requests
 	 * to the last listener based on eBPF.
 	 */
 	for (i = 0; i < MIGRATED_TO; i++) {
 		err = shutdown(test_case->servers[i], SHUT_RDWR);
 		if (!ASSERT_OK(err, "shutdown"))
-			return -1;
+			goto close_epoll;
 	}
 
 	/* No dance for TCP_NEW_SYN_RECV to migrate based on eBPF */
 	if (test_case->state == BPF_TCP_NEW_SYN_RECV)
 		return 0;
 
+	nfds = epoll_wait(epoll, &ev, 1, 0);
+	if (!ASSERT_EQ(nfds, 1, "epoll_wait 2")) {
+close_epoll:
+		if (epoll >= 0)
+			close(epoll);
+		return -1;
+	}
+
+	close(epoll);
+
 	/* Note that we use the second listener instead of the
 	 * first one here.
 	 *
