Commit 9390808
mm/huge_memory: fix early failure try_to_migrate() when split huge pmd for shared THP
Commit 60fbb14 ("mm/huge_memory: adjust try_to_migrate_one() and
split_huge_pmd_locked()") made try_to_migrate_one() return false
unconditionally after split_huge_pmd_locked(). This may make
try_to_migrate() fail early when TTU_SPLIT_HUGE_PMD is specified.
The reason is that the above commit adjusted try_to_migrate_one() to
return false unconditionally when a PMD-mapped THP entry is found and
TTU_SPLIT_HUGE_PMD is specified (for example, via unmap_folio()). This
aborts the rmap walk and fails try_to_migrate() early if the PMD-mapped
THP is mapped in multiple processes.
The user-visible impact of this bug:
* Under memory pressure, shrink_folio_list() may split a partially
mapped folio with split_folio_to_list() and then free the unmapped
pages without IO. If the split fails, the folio may not be reclaimed.
* On memory failure, memory_failure() calls try_to_split_thp_page() to
split the folio containing the bad page. If the split succeeds, the
PG_has_hwpoisoned bit is set only in the after-split folio containing
@split_at, limiting the amount of unusable memory. If the split fails,
the whole folio is unusable.
One way to reproduce:
Create an anonymous THP range and fork 512 children, so the THP is
shared-mapped in 513 processes. Then trigger a folio split via the
/sys/kernel/debug/split_huge_pages debugfs interface to split the THP
folio to order 0.
Without the above commit, the split to order 0 succeeds. With the
commit, the folio remains a large folio.
Currently there are two core users of TTU_SPLIT_HUGE_PMD:
* try_to_unmap_one()
* try_to_migrate_one()
try_to_unmap_one() restarts the rmap walk after splitting, so only
try_to_migrate_one() is affected.
We can't simply revert commit 60fbb14 ("mm/huge_memory: adjust
try_to_migrate_one() and split_huge_pmd_locked()"), since it also
removed some duplicated checks already covered by
page_vma_mapped_walk().
Fix this by restarting page_vma_mapped_walk() after
split_huge_pmd_locked(). We cannot simply return "true" instead, as
that would mishandle another case: when split_huge_pmd_locked() invokes
folio_try_share_anon_rmap_pmd(), the latter can fail and leave the
large folio mapped through PTEs, and those PTE mappings still need to
be processed by try_to_migrate_one(). Restarting the walk might result
in some unnecessary re-walking of the rmap but is relatively harmless.
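The fixed control flow can be sketched as follows (a simplified
pseudocode sketch of the relevant branch of try_to_migrate_one(), not
the exact patch; it assumes the page_vma_mapped_walk_restart() helper
behaves as it does in try_to_unmap_one()):

```
while (page_vma_mapped_walk(&pvmw)) {
        /* PMD-mapped THP found? */
        if (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) {
                split_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, true);
                /*
                 * Instead of returning false (which aborts the whole
                 * rmap walk), restart the walk so the now PTE-mapped
                 * folio -- or the PTEs left behind on split failure --
                 * gets processed.
                 */
                page_vma_mapped_walk_restart(&pvmw);
                continue;
        }
        /* ... install migration entries for the PTE mappings ... */
}
```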
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 60fbb14 ("mm/huge_memory: adjust try_to_migrate_one() and split_huge_pmd_locked()")
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Tested-by: Lance Yang <[email protected]>
Reviewed-by: Lance Yang <[email protected]>
Reviewed-by: Gavin Guo <[email protected]>
Acked-by: David Hildenbrand (arm) <[email protected]>
Reviewed-by: Lorenzo Stoakes (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 file changed: 9 additions & 3 deletions