Commit 577a1f4
mm/huge_memory: fix a folio_split() race condition with folio_try_get()
During a pagecache folio split, the values in the related xarray should
not be changed from the original folio at xarray split time until all
after-split folios are well formed and stored in the xarray. Current use
of xas_try_split() in __split_unmapped_folio() lets some after-split
folios show up at wrong indices in the xarray. When these misplaced
after-split folios are unfrozen, before correct folios are stored via
__xa_store(), and grabbed by folio_try_get(), they are returned to
userspace at wrong file indices, causing data corruption. More detailed
explanation is at the bottom.
The reproducer is at: https://github.com/dfinity/thp-madv-remove-test
It
1. creates a memfd,
2. forks,
3. in the child process, maps the file with large folios (via shmem code
path) and reads the mapped file continuously with 16 threads,
4. in the parent process, uses madvise(MADV_REMOVE) to punch poles in the
large folio.
Data corruption can be observed without the fix. Basically, data from a
wrong page->index is returned.
Fix it by using the original folio in xas_try_split() calls, so that
folio_try_get() can get the right after-split folios after the original
folio is unfrozen.
Uniform split, split_huge_page*(), is not affected, since it uses
xas_split_alloc() and xas_split() only once and stores the original folio
in the xarray. Change xas_split() used in uniform split branch to use the
original folio to avoid confusion.
Fixes below points to the commit introduces the code, but folio_split() is
used in a later commit 7460b47 ("mm/truncate: use folio_split() in
truncate operation").
More details:
For example, a folio f is split non-uniformly into f, f2, f3, f4 like
below:
+----------------+---------+----+----+
| f | f2 | f3 | f4 |
+----------------+---------+----+----+
but the xarray would look like below after __split_unmapped_folio() is
done:
+----------------+---------+----+----+
| f | f2 | f3 | f3 |
+----------------+---------+----+----+
After __split_unmapped_folio(), the code changes the xarray and unfreezes
after-split folios:
1. unfreezes f2, __xa_store(f2)
2. unfreezes f3, __xa_store(f3)
3. unfreezes f4, __xa_store(f4), which overwrites the second f3 to f4.
4. unfreezes f.
Meanwhile, a parallel filemap_get_entry() can read the second f3 from the
xarray and use folio_try_get() on it at step 2 when f3 is unfrozen. Then,
f3 is wrongly returned to user.
After the fix, the xarray looks like below after __split_unmapped_folio():
+----------------+---------+----+----+
| f | f | f | f |
+----------------+---------+----+----+
so that the race window no longer exists.
[[email protected]: move comment, per David]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0052773 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
Signed-off-by: Zi Yan <[email protected]>
Reported-by: Bas van Dijk <[email protected]>
Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
Tested-by: Lance Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Cc: Barry Song <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Dev Jain <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Nico Pache <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>1 parent 431b04f commit 577a1f4
1 file changed
Lines changed: 9 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3631 | 3631 | | |
3632 | 3632 | | |
3633 | 3633 | | |
| 3634 | + | |
3634 | 3635 | | |
3635 | 3636 | | |
3636 | 3637 | | |
| |||
3651 | 3652 | | |
3652 | 3653 | | |
3653 | 3654 | | |
| 3655 | + | |
| 3656 | + | |
| 3657 | + | |
| 3658 | + | |
3654 | 3659 | | |
3655 | | - | |
3656 | | - | |
3657 | | - | |
| 3660 | + | |
| 3661 | + | |
| 3662 | + | |
3658 | 3663 | | |
3659 | | - | |
| 3664 | + | |
3660 | 3665 | | |
3661 | 3666 | | |
3662 | 3667 | | |
| |||
0 commit comments