Skip to content

Commit 577a1f4

Browse files
x-y-zakpm00
authored andcommitted
mm/huge_memory: fix a folio_split() race condition with folio_try_get()
During a pagecache folio split, the values in the related xarray should not be changed from the original folio at xarray split time until all after-split folios are well formed and stored in the xarray. Current use of xas_try_split() in __split_unmapped_folio() lets some after-split folios show up at wrong indices in the xarray. When these misplaced after-split folios are unfrozen, before correct folios are stored via __xa_store(), and grabbed by folio_try_get(), they are returned to userspace at wrong file indices, causing data corruption. More detailed explanation is at the bottom. The reproducer is at: https://github.com/dfinity/thp-madv-remove-test It 1. creates a memfd, 2. forks, 3. in the child process, maps the file with large folios (via shmem code path) and reads the mapped file continuously with 16 threads, 4. in the parent process, uses madvise(MADV_REMOVE) to punch poles in the large folio. Data corruption can be observed without the fix. Basically, data from a wrong page->index is returned. Fix it by using the original folio in xas_try_split() calls, so that folio_try_get() can get the right after-split folios after the original folio is unfrozen. Uniform split, split_huge_page*(), is not affected, since it uses xas_split_alloc() and xas_split() only once and stores the original folio in the xarray. Change xas_split() used in uniform split branch to use the original folio to avoid confusion. Fixes below points to the commit introduces the code, but folio_split() is used in a later commit 7460b47 ("mm/truncate: use folio_split() in truncate operation"). More details: For example, a folio f is split non-uniformly into f, f2, f3, f4 like below: +----------------+---------+----+----+ | f | f2 | f3 | f4 | +----------------+---------+----+----+ but the xarray would look like below after __split_unmapped_folio() is done: +----------------+---------+----+----+ | f | f2 | f3 | f3 | +----------------+---------+----+----+ After __split_unmapped_folio(), the code changes the xarray and unfreezes after-split folios: 1. unfreezes f2, __xa_store(f2) 2. unfreezes f3, __xa_store(f3) 3. unfreezes f4, __xa_store(f4), which overwrites the second f3 to f4. 4. unfreezes f. Meanwhile, a parallel filemap_get_entry() can read the second f3 from the xarray and use folio_try_get() on it at step 2 when f3 is unfrozen. Then, f3 is wrongly returned to user. After the fix, the xarray looks like below after __split_unmapped_folio(): +----------------+---------+----+----+ | f | f | f | f | +----------------+---------+----+----+ so that the race window no longer exists. [[email protected]: move comment, per David] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 0052773 ("mm/huge_memory: add two new (not yet used) functions for folio_split()") Signed-off-by: Zi Yan <[email protected]> Reported-by: Bas van Dijk <[email protected]> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/ Tested-by: Lance Yang <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Cc: Barry Song <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dev Jain <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Liam Howlett <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Nico Pache <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
1 parent 431b04f commit 577a1f4

1 file changed

Lines changed: 9 additions & 4 deletions

File tree

mm/huge_memory.c

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -3631,6 +3631,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
36313631
const bool is_anon = folio_test_anon(folio);
36323632
int old_order = folio_order(folio);
36333633
int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1;
3634+
struct folio *old_folio = folio;
36343635
int split_order;
36353636

36363637
/*
@@ -3651,12 +3652,16 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
36513652
* uniform split has xas_split_alloc() called before
36523653
* irq is disabled to allocate enough memory, whereas
36533654
* non-uniform split can handle ENOMEM.
3655+
* Use the to-be-split folio, so that a parallel
3656+
* folio_try_get() waits on it until xarray is updated
3657+
* with after-split folios and the original one is
3658+
* unfrozen.
36543659
*/
3655-
if (split_type == SPLIT_TYPE_UNIFORM)
3656-
xas_split(xas, folio, old_order);
3657-
else {
3660+
if (split_type == SPLIT_TYPE_UNIFORM) {
3661+
xas_split(xas, old_folio, old_order);
3662+
} else {
36583663
xas_set_order(xas, folio->index, split_order);
3659-
xas_try_split(xas, folio, old_order);
3664+
xas_try_split(xas, old_folio, old_order);
36603665
if (xas_error(xas))
36613666
return xas_error(xas);
36623667
}

0 commit comments

Comments
 (0)