mm/zsmalloc: reduce zs_free() latency on swap release path#757

Open
blktests-ci[bot] wants to merge 4 commits into linus-master_base from
series/1083830=>linus-master

blktests-ci Bot commented Apr 21, 2026

Pull request for series with
subject: mm/zsmalloc: reduce zs_free() latency on swap release path
version: 2
url: https://patchwork.kernel.org/project/linux-block/list/?series=1083830

blktests-ci Bot commented Apr 21, 2026

Upstream branch: b4e0758
series: https://patchwork.kernel.org/project/linux-block/list/?series=1083830
version: 2

blktests-ci Bot commented Apr 22, 2026

Upstream branch: 6596a02
series: https://patchwork.kernel.org/project/linux-block/list/?series=1083830
version: 2

@blktests-ci blktests-ci Bot force-pushed the series/1083830=>linus-master branch from 390415c to 856edea Compare April 22, 2026 20:23
@blktests-ci blktests-ci Bot force-pushed the linus-master_base branch from 3b54e52 to 6a0b974 Compare April 23, 2026 16:58

blktests-ci Bot commented Apr 23, 2026

Upstream branch: 507bd4b
series: https://patchwork.kernel.org/project/linux-block/list/?series=1083830
version: 2

@blktests-ci blktests-ci Bot force-pushed the series/1083830=>linus-master branch from 856edea to 9acd7af Compare April 23, 2026 17:01
@blktests-ci blktests-ci Bot force-pushed the linus-master_base branch from 6a0b974 to 59ca59b Compare April 24, 2026 00:56

blktests-ci Bot commented Apr 24, 2026

Upstream branch: dd6c438
series: https://patchwork.kernel.org/project/linux-block/list/?series=1083830
version: 2

@blktests-ci blktests-ci Bot force-pushed the series/1083830=>linus-master branch from 9acd7af to 0218f85 Compare April 24, 2026 00:58
@blktests-ci blktests-ci Bot force-pushed the linus-master_base branch from 59ca59b to 94f0438 Compare April 24, 2026 07:53

blktests-ci Bot commented Apr 24, 2026

Upstream branch: dd6c438
series: https://patchwork.kernel.org/project/linux-block/list/?series=1083830
version: 2

@blktests-ci blktests-ci Bot force-pushed the series/1083830=>linus-master branch from 0218f85 to e65a831 Compare April 24, 2026 07:53
@blktests-ci blktests-ci Bot force-pushed the linus-master_base branch from 94f0438 to 857ada9 Compare April 24, 2026 07:54
chen210 and others added 4 commits April 24, 2026 17:05
Currently in zs_free(), the class->lock is held until the zspage is
completely freed and the counters are updated. However, freeing pages back
to the buddy allocator requires acquiring the zone lock.

Under heavy memory pressure, zone lock contention can be severe. When this
happens, the CPU holding the class->lock will stall waiting for the zone
lock, thereby blocking all other CPUs attempting to acquire the same
class->lock.

Shrink the class->lock critical section to reduce lock contention. Move
the actual page freeing outside class->lock, so that waiting for the zone
lock no longer blocks other CPUs contending for the same class->lock.
This improves the concurrency of zs_free().

Testing on the RADXA O6 platform shows that with 12 CPUs concurrently
performing zs_free() operations, the execution time is reduced by 20%.

Signed-off-by: Xueyuan Chen <[email protected]>
Signed-off-by: Wenchao Hao <[email protected]>
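
The locking change described above can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the patch itself: `toy_class`, `toy_zspage`, and every `toy_*` function are hypothetical names, an `atomic_flag` spinlock stands in for class->lock, and `free()` stands in for returning pages to the buddy allocator. The `lock_held` field is instrumentation that is only meaningful in a single-threaded run. Which bookkeeping stays under the lock in the real patch is not shown on this page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the zsmalloc structures. */
struct toy_zspage {
	int inuse;	/* live objects in this zspage */
	void *pages;	/* stands in for pages owned by the buddy allocator */
};

struct toy_class {
	atomic_flag lock;		/* stands in for class->lock */
	bool lock_held;			/* instrumentation for a single-threaded demo */
	int nr_zspages;			/* counter updated under the lock */
	int slow_frees_under_lock;	/* slow releases that ran with the lock held */
};

static void toy_class_lock(struct toy_class *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
	c->lock_held = true;
}

static void toy_class_unlock(struct toy_class *c)
{
	c->lock_held = false;
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

static struct toy_zspage *toy_zspage_new(struct toy_class *c, int nr_objs)
{
	struct toy_zspage *zp = malloc(sizeof(*zp));

	if (!zp)
		return NULL;
	zp->pages = malloc(4096);
	if (!zp->pages) {
		free(zp);
		return NULL;
	}
	zp->inuse = nr_objs;
	toy_class_lock(c);
	c->nr_zspages++;
	toy_class_unlock(c);
	return zp;
}

/* The slow step; in the kernel this is where the zone lock would be taken. */
static void toy_release_pages(struct toy_class *c, struct toy_zspage *zp)
{
	if (c->lock_held)
		c->slow_frees_under_lock++;
	free(zp->pages);
	free(zp);
}

/* Free one object; returns true if the whole zspage was released. */
static bool toy_free_obj(struct toy_class *c, struct toy_zspage *zp)
{
	struct toy_zspage *empty = NULL;

	toy_class_lock(c);
	assert(zp->inuse > 0);
	if (--zp->inuse == 0) {
		c->nr_zspages--;	/* bookkeeping stays under the lock */
		empty = zp;		/* detached: no longer counted in the class */
	}
	toy_class_unlock(c);

	if (empty)
		toy_release_pages(c, empty);	/* slow part, lock already dropped */
	return empty != NULL;
}
```

The point of the sketch is the ordering in `toy_free_obj()`: the emptied zspage is detached while the lock is held, and the expensive release runs only after the unlock, so other CPUs can take the lock while the release is in progress.
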
zs_free() is expensive due to internal locking (pool->lock, class->lock)
and potential zspage freeing. On the process exit path, the slow
zs_free() blocks memory reclamation, delaying overall memory release.
This has been reported to significantly impact Android low-memory
killing, where slot_free() accounts for over 80% of the total swap
entry freeing cost.

Introduce zs_free_deferred() which queues handles into a fixed-size
per-pool array for later processing by a workqueue. This allows callers
to defer the expensive zs_free() and return quickly, so the process
exit path can release memory faster. The array capacity is derived from
a 128MB uncompressed data budget (128MB >> PAGE_SHIFT entries), which
scales naturally with PAGE_SIZE. When the array reaches half capacity,
the workqueue is scheduled to drain pending handles.
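
The sizing rule above is plain arithmetic and can be checked with a short sketch. The macro and function names here are hypothetical and not taken from the patch; only the 128MB budget, the shift by PAGE_SHIFT, and the half-capacity trigger come from the commit message.

```c
#include <stddef.h>

/* 128MB uncompressed-data budget, as stated in the commit message. */
#define TOY_DEFERRED_BUDGET_BYTES (128UL << 20)

/* Number of handle slots: one slot per page of the budget. */
static size_t toy_deferred_capacity(unsigned int page_shift)
{
	return TOY_DEFERRED_BUDGET_BYTES >> page_shift;
}

/* Queue depth at which the drain work would be scheduled. */
static size_t toy_deferred_drain_threshold(unsigned int page_shift)
{
	return toy_deferred_capacity(page_shift) / 2;
}
```

With 4KB pages (PAGE_SHIFT 12) this gives 32768 entries and a drain threshold of 16384. With 16KB pages it gives 8192 entries, and with 64KB pages 2048 entries, so the array shrinks as the page size grows while the uncompressed budget stays at 128MB.
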

zs_free_deferred() uses spin_trylock() to access the deferred queue.
If the lock is contended (e.g. drain in progress) or the queue is full,
it falls back to synchronous zs_free() to guarantee correctness.

Also introduce zs_free_deferred_flush() for use during pool teardown to
ensure all pending handles are freed.

Signed-off-by: Wenchao Hao <[email protected]>
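
The queueing behaviour described in this commit message can be sketched as follows. This is a userspace model under stated assumptions, not the zsmalloc code: all `toy_*` names are hypothetical, an `atomic_flag` stands in for the spinlock, a boolean stands in for scheduling the work item, and the capacity is fixed at 8 for illustration. How the real drain batches handles is not described on this page, so the copy-out in `toy_drain()` is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAPACITY 8	/* tiny for illustration only */

struct toy_pool {
	atomic_flag defer_lock;			/* protects handles[] and count */
	unsigned long handles[TOY_CAPACITY];
	size_t count;
	bool drain_scheduled;			/* stands in for queueing the work item */
	atomic_size_t freed;			/* handles actually freed */
	atomic_size_t fallbacks;		/* frees that took the synchronous path */
};

static bool toy_trylock(atomic_flag *l)
{
	return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Stand-in for the synchronous, expensive free. */
static void toy_zs_free(struct toy_pool *p, unsigned long handle)
{
	(void)handle;
	atomic_fetch_add(&p->freed, 1);
}

static void toy_free_sync_fallback(struct toy_pool *p, unsigned long handle)
{
	atomic_fetch_add(&p->fallbacks, 1);
	toy_zs_free(p, handle);
}

/* Queue the handle if possible; otherwise free it right away. */
static void toy_free_deferred(struct toy_pool *p, unsigned long handle)
{
	if (!toy_trylock(&p->defer_lock)) {
		toy_free_sync_fallback(p, handle);	/* lock contended */
		return;
	}
	if (p->count == TOY_CAPACITY) {
		toy_unlock(&p->defer_lock);
		toy_free_sync_fallback(p, handle);	/* queue full */
		return;
	}
	p->handles[p->count++] = handle;
	if (p->count >= TOY_CAPACITY / 2)
		p->drain_scheduled = true;		/* half full: request a drain */
	toy_unlock(&p->defer_lock);
}

/* What the work item, or a flush at pool teardown, would do. */
static void toy_drain(struct toy_pool *p)
{
	unsigned long batch[TOY_CAPACITY];
	size_t n, i;

	while (!toy_trylock(&p->defer_lock))
		;	/* the drain side may wait; the fast path never does */
	n = p->count;
	for (i = 0; i < n; i++)
		batch[i] = p->handles[i];
	p->count = 0;
	p->drain_scheduled = false;
	toy_unlock(&p->defer_lock);

	for (i = 0; i < n; i++)
		toy_zs_free(p, batch[i]);	/* expensive frees run unlocked */
}
```

In this model the caller of `toy_free_deferred()` never spins: it either queues the handle or falls back to the synchronous free, so no handle is lost when the queue is busy or full.
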
zram_slot_free_notify() is called on the process exit path when
unmapping swap entries. The slot_free() it calls internally invokes
zs_free(), which accounts for ~87% of slot_free() cost due to zsmalloc
internal locking (pool->lock, class->lock) and potential zspage freeing.
This blocks the process exit path, delaying overall memory release
during Android low-memory killing.

Split slot_free() into slot_free_extract() and the actual zs_free()
call. slot_free_extract() handles all slot metadata cleanup (clearing
flags, updating stats, zeroing handle/size) and returns the zsmalloc
handle that needs freeing. This separation has two benefits:

1. It makes the two responsibilities of slot_free() explicit: slot
   metadata management (must be done under slot lock) vs zsmalloc
   memory release (can be deferred).

2. It allows zram_slot_free_notify() to use zs_free_deferred() for
   the handle, deferring the expensive zs_free() to a workqueue so
   the exit path can release memory faster.
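
The split between metadata cleanup and memory release can be sketched as below. This is a simplified model, not the zram code: the `toy_*` structures, the statistics fields, and the callback type are hypothetical, and the two free functions only record what they were asked to do.

```c
#include <stddef.h>

/* Hypothetical per-slot metadata; not the zram table entry layout. */
struct toy_slot {
	unsigned long handle;	/* allocator handle, 0 if none */
	unsigned int size;	/* compressed size in bytes */
	unsigned int flags;
};

struct toy_stats {
	long compr_bytes;
	long pages_stored;
};

/*
 * Metadata cleanup only. Meant to run with the slot lock held.
 * Returns the handle that still needs freeing, or 0 if there is none.
 */
static unsigned long toy_slot_free_extract(struct toy_slot *s,
					   struct toy_stats *st)
{
	unsigned long handle = s->handle;

	if (handle) {
		st->compr_bytes -= s->size;
		st->pages_stored--;
	}
	s->flags = 0;
	s->size = 0;
	s->handle = 0;
	return handle;
}

typedef void (*toy_free_fn)(unsigned long handle);

static int toy_sync_frees;		/* frees done immediately */
static unsigned long toy_pending[16];	/* handles left for a later worker */
static size_t toy_pending_count;

static void toy_free_now(unsigned long handle)
{
	(void)handle;
	toy_sync_frees++;
}

static void toy_free_later(unsigned long handle)
{
	if (toy_pending_count < 16)
		toy_pending[toy_pending_count++] = handle;
	else
		toy_free_now(handle);	/* queue full: free synchronously */
}

/* Callers choose the release policy; the metadata step is shared. */
static void toy_slot_free_with(struct toy_slot *s, struct toy_stats *st,
			       toy_free_fn free_fn)
{
	unsigned long handle = toy_slot_free_extract(s, st);

	if (handle)
		free_fn(handle);
}
```

A notify-style caller would pass `toy_free_later`, while the other callers keep the synchronous behaviour by passing `toy_free_now`. In both cases the slot metadata is already consistent before the handle is released.
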

While at it, merge three separate clear_slot_flag() calls for
ZRAM_IDLE, ZRAM_INCOMPRESSIBLE, and ZRAM_PP_SLOT into a single
bitmask operation via clear_slot_flags_on_free(), reducing redundant
read-modify-write cycles on the same flags word.
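
The flag merge can be illustrated with a small sketch. The bit positions and `TOY_*` names are hypothetical and do not reflect the real zram flag layout; the sketch only shows that clearing three bits one at a time and clearing them with a single combined mask produce the same flags word.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define TOY_IDLE		(UINT32_C(1) << 0)
#define TOY_INCOMPRESSIBLE	(UINT32_C(1) << 1)
#define TOY_PP_SLOT		(UINT32_C(1) << 2)
#define TOY_OTHER		(UINT32_C(1) << 3)	/* a bit that must survive */

#define TOY_FREE_MASK	(TOY_IDLE | TOY_INCOMPRESSIBLE | TOY_PP_SLOT)

static void toy_clear_flag(uint32_t *flags, uint32_t bit)
{
	*flags &= ~bit;		/* one read-modify-write per call */
}

/* Before: three separate read-modify-write cycles on the same word. */
static void toy_clear_three_separately(uint32_t *flags)
{
	toy_clear_flag(flags, TOY_IDLE);
	toy_clear_flag(flags, TOY_INCOMPRESSIBLE);
	toy_clear_flag(flags, TOY_PP_SLOT);
}

/* After: one read-modify-write with the combined mask. */
static void toy_clear_flags_on_free(uint32_t *flags)
{
	*flags &= ~TOY_FREE_MASK;
}
```

Both functions leave unrelated bits such as `TOY_OTHER` untouched; the merged form simply touches the word once instead of three times.
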

All other slot_free() callers (write, discard, meta_free) continue
to use synchronous zs_free() through the unchanged slot_free()
wrapper.

Signed-off-by: Barry Song (Xiaomi) <[email protected]>
Signed-off-by: Wenchao Hao <[email protected]>

zswap_invalidate() is called on the same process exit path as
zram_slot_free_notify(). The zswap_entry_free() it calls internally
performs zs_free() which is expensive due to zsmalloc internal locking.
Unlike zram, which has a trylock fallback, zswap_invalidate() executes
unconditionally, making the latency impact potentially worse.

Like zram, the expensive zs_free() here blocks the process exit path,
delaying overall memory release. Additionally, zswap_entry_free()
performs extra work beyond zs_free(): list_lru_del() (takes its own
spinlock), obj_cgroup accounting, and kmem_cache_free for the entry
itself.

Use zs_free_deferred() in the zswap_invalidate() path to defer the
expensive zsmalloc handle freeing to a workqueue, allowing the exit
path to release memory faster. All other callers (zswap_load,
zswap_writeback_entry, zswap_store error paths) run in process context
and continue to use synchronous zs_free().

Signed-off-by: Wenchao Hao <[email protected]>

blktests-ci Bot commented Apr 24, 2026

Upstream branch: dd6c438
series: https://patchwork.kernel.org/project/linux-block/list/?series=1083830
version: 2

@blktests-ci blktests-ci Bot force-pushed the series/1083830=>linus-master branch from e65a831 to 6957651 Compare April 24, 2026 08:05