Skip to content

Commit db25c42

Browse files
dtatuleakuba-moo
authored andcommitted
net/mlx5e: RX, Fix XDP multi-buf frag counting for striding RQ
XDP multi-buf programs can modify the layout of the XDP buffer when the program calls bpf_xdp_pull_data() or bpf_xdp_adjust_tail(). The referenced commit in the fixes tag corrected the assumption in the mlx5 driver that the XDP buffer layout doesn't change during a program execution. However, this fix introduced another issue: the dropped fragments still need to be counted on the driver side to avoid page fragment reference counting issues. The issue was discovered by the drivers/net/xdp.py selftest, more specifically the test_xdp_native_tx_mb: - The mlx5 driver allocates a page_pool page and initializes it with a frag counter of 64 (pp_ref_count=64) and the internal frag counter to 0. - The test sends one packet with no payload. - On RX (mlx5e_skb_from_cqe_mpwrq_nonlinear()), mlx5 configures the XDP buffer with the packet data starting in the first fragment which is the page mentioned above. - The XDP program runs and calls bpf_xdp_pull_data() which moves the header into the linear part of the XDP buffer. As the packet doesn't contain more data, the program drops the tail fragment since it no longer contains any payload (pp_ref_count=63). - mlx5 device skips counting this fragment. Internal frag counter remains 0. - mlx5 releases all 64 fragments of the page but page pp_ref_count is 63 => negative reference counting error. Resulting splat during the test: WARNING: CPU: 0 PID: 188225 at ./include/net/page_pool/helpers.h:297 mlx5e_page_release_fragmented.isra.0+0xbd/0xe0 [mlx5_core] Modules linked in: [...] CPU: 0 UID: 0 PID: 188225 Comm: ip Not tainted 6.18.0-rc7_for_upstream_min_debug_2025_12_08_11_44 #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5e_page_release_fragmented.isra.0+0xbd/0xe0 [mlx5_core] [...] Call Trace: <TASK> mlx5e_free_rx_mpwqe+0x20a/0x250 [mlx5_core] mlx5e_dealloc_rx_mpwqe+0x37/0xb0 [mlx5_core] mlx5e_free_rx_descs+0x11a/0x170 [mlx5_core] mlx5e_close_rq+0x78/0xa0 [mlx5_core] mlx5e_close_queues+0x46/0x2a0 [mlx5_core] mlx5e_close_channel+0x24/0x90 [mlx5_core] mlx5e_close_channels+0x5d/0xf0 [mlx5_core] mlx5e_safe_switch_params+0x2ec/0x380 [mlx5_core] mlx5e_change_mtu+0x11d/0x490 [mlx5_core] mlx5e_change_nic_mtu+0x19/0x30 [mlx5_core] netif_set_mtu_ext+0xfc/0x240 do_setlink.isra.0+0x226/0x1100 rtnl_newlink+0x7a9/0xba0 rtnetlink_rcv_msg+0x220/0x3c0 netlink_rcv_skb+0x4b/0xf0 netlink_unicast+0x255/0x380 netlink_sendmsg+0x1f3/0x420 __sock_sendmsg+0x38/0x60 ____sys_sendmsg+0x1e8/0x240 ___sys_sendmsg+0x7c/0xb0 [...] __sys_sendmsg+0x5f/0xb0 do_syscall_64+0x55/0xc70 The problem applies for XDP_PASS as well which is handled in a different code path in the driver. This patch fixes the issue by doing page frag counting on all the original XDP buffer fragments for all relevant XDP actions (XDP_TX , XDP_REDIRECT and XDP_PASS). This is basically reverting to the original counting before the commit in the fixes tag. As frag_page is still pointing to the original tail, the nr_frags parameter to xdp_update_skb_frags_info() needs to be calculated in a different way to reflect the new nr_frags. Fixes: 87bcef1 ("net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ") Signed-off-by: Dragos Tatulea <[email protected]> Cc: Amery Hung <[email protected]> Reviewed-by: Nimrod Oren <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
1 parent 1633111 commit db25c42

1 file changed

Lines changed: 5 additions & 7 deletions

File tree

  • drivers/net/ethernet/mellanox/mlx5/core

drivers/net/ethernet/mellanox/mlx5/core/en_rx.c

Lines changed: 5 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1957,14 +1957,13 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
19571957

19581958
if (prog) {
19591959
u8 nr_frags_free, old_nr_frags = sinfo->nr_frags;
1960+
u8 new_nr_frags;
19601961
u32 len;
19611962

19621963
if (mlx5e_xdp_handle(rq, prog, mxbuf)) {
19631964
if (__test_and_clear_bit(MLX5E_RQ_FLAG_XDP_XMIT, rq->flags)) {
19641965
struct mlx5e_frag_page *pfp;
19651966

1966-
frag_page -= old_nr_frags - sinfo->nr_frags;
1967-
19681967
for (pfp = head_page; pfp < frag_page; pfp++)
19691968
pfp->frags++;
19701969

@@ -1975,13 +1974,12 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
19751974
return NULL; /* page/packet was consumed by XDP */
19761975
}
19771976

1978-
nr_frags_free = old_nr_frags - sinfo->nr_frags;
1979-
if (unlikely(nr_frags_free)) {
1980-
frag_page -= nr_frags_free;
1977+
new_nr_frags = sinfo->nr_frags;
1978+
nr_frags_free = old_nr_frags - new_nr_frags;
1979+
if (unlikely(nr_frags_free))
19811980
truesize -= (nr_frags_free - 1) * PAGE_SIZE +
19821981
ALIGN(pg_consumed_bytes,
19831982
BIT(rq->mpwqe.log_stride_sz));
1984-
}
19851983

19861984
len = mxbuf->xdp.data_end - mxbuf->xdp.data;
19871985

@@ -2003,7 +2001,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
20032001
struct mlx5e_frag_page *pagep;
20042002

20052003
/* sinfo->nr_frags is reset by build_skb, calculate again. */
2006-
xdp_update_skb_frags_info(skb, frag_page - head_page,
2004+
xdp_update_skb_frags_info(skb, new_nr_frags,
20072005
sinfo->xdp_frags_size,
20082006
truesize,
20092007
xdp_buff_get_skb_flags(&mxbuf->xdp));

0 commit comments

Comments
 (0)