Skip to content

Commit b854e1c

Browse files
Long Licmaiolino
authored andcommitted
xfs: close crash window in attr dabtree inactivation
When inactivating an inode with node-format extended attributes, xfs_attr3_node_inactive() invalidates all child leaf/node blocks via xfs_trans_binval(), but intentionally does not remove the corresponding entries from their parent node blocks. The implicit assumption is that xfs_attr_inactive() will truncate the entire attr fork to zero extents afterwards, so log recovery will never reach the root node and follow those stale pointers. However, if a log shutdown occurs after the leaf/node block cancellations commit but before the attr bmap truncation commits, this assumption breaks. Recovery replays the attr bmap intact (the inode still has attr fork extents), but suppresses replay of all cancelled leaf/node blocks, maybe leaving them as stale data on disk. On the next mount, xlog_recover_process_iunlinks() retries inactivation and attempts to read the root node via the attr bmap. If the root node was not replayed, reading the unreplayed root block triggers a metadata verification failure immediately; if it was replayed, following its child pointers to unreplayed child blocks triggers the same failure: XFS (pmem0): Metadata corruption detected at xfs_da3_node_read_verify+0x53/0x220, xfs_da3_node block 0x78 XFS (pmem0): Unmount and run xfs_repair XFS (pmem0): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ XFS (pmem0): metadata I/O error in "xfs_da_read_buf+0x104/0x190" at daddr 0x78 len 8 error 117 Fix this in two places: In xfs_attr3_node_inactive(), after calling xfs_trans_binval() on a child block, immediately remove the entry that references it from the parent node in the same transaction. This eliminates the window where the parent holds a pointer to a cancelled block. Once all children are removed, the now-empty root node is converted to a leaf block within the same transaction. This node-to-leaf conversion is necessary for crash safety. If the system shutdown after the empty node is written to the log but before the second-phase bmap truncation commits, log recovery will attempt to verify the root block on disk. xfs_da3_node_verify() does not permit a node block with count == 0; such a block will fail verification and trigger a metadata corruption shutdown. on the other hand, leaf blocks are allowed to have this transient state. In xfs_attr_inactive(), split the attr fork truncation into two explicit phases. First, truncate all extents beyond the root block (the child extents whose parent references have already been removed above). Second, invalidate the root block and truncate the attr bmap to zero in a single transaction. The two operations in the second phase must be atomic: as long as the attr bmap has any non-zero length, recovery can follow it to the root block, so the root block invalidation must commit together with the bmap-to-zero truncation. Fixes: 1da177e ("Linux-2.6.12-rc2") Cc: [email protected] Signed-off-by: Long Li <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]>
1 parent e65bb55 commit b854e1c

1 file changed

Lines changed: 57 additions & 38 deletions

File tree

fs/xfs/xfs_attr_inactive.c

Lines changed: 57 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -140,7 +140,7 @@ xfs_attr3_node_inactive(
140140
xfs_daddr_t parent_blkno, child_blkno;
141141
struct xfs_buf *child_bp;
142142
struct xfs_da3_icnode_hdr ichdr;
143-
int error, i;
143+
int error;
144144

145145
/*
146146
* Since this code is recursive (gasp!) we must protect ourselves.
@@ -152,7 +152,7 @@ xfs_attr3_node_inactive(
152152
return -EFSCORRUPTED;
153153
}
154154

155-
xfs_da3_node_hdr_from_disk(dp->i_mount, &ichdr, bp->b_addr);
155+
xfs_da3_node_hdr_from_disk(mp, &ichdr, bp->b_addr);
156156
parent_blkno = xfs_buf_daddr(bp);
157157
if (!ichdr.count) {
158158
xfs_trans_brelse(*trans, bp);
@@ -167,7 +167,7 @@ xfs_attr3_node_inactive(
167167
* over the leaves removing all of them. If this is higher up
168168
* in the tree, recurse downward.
169169
*/
170-
for (i = 0; i < ichdr.count; i++) {
170+
while (ichdr.count > 0) {
171171
/*
172172
* Read the subsidiary block to see what we have to work with.
173173
* Don't do this in a transaction. This is a depth-first
@@ -218,29 +218,32 @@ xfs_attr3_node_inactive(
218218
xfs_trans_binval(*trans, child_bp);
219219
child_bp = NULL;
220220

221+
error = xfs_da3_node_read_mapped(*trans, dp,
222+
parent_blkno, &bp, XFS_ATTR_FORK);
223+
if (error)
224+
return error;
225+
221226
/*
222-
* If we're not done, re-read the parent to get the next
223-
* child block number.
227+
* Remove entry from parent node, prevents being indexed to.
224228
*/
225-
if (i + 1 < ichdr.count) {
226-
struct xfs_da3_icnode_hdr phdr;
229+
xfs_attr3_node_entry_remove(*trans, dp, bp, 0);
230+
231+
xfs_da3_node_hdr_from_disk(mp, &ichdr, bp->b_addr);
232+
bp = NULL;
227233

228-
error = xfs_da3_node_read_mapped(*trans, dp,
229-
parent_blkno, &bp, XFS_ATTR_FORK);
234+
if (ichdr.count > 0) {
235+
/*
236+
* If we're not done, get the next child block number.
237+
*/
238+
child_fsb = be32_to_cpu(ichdr.btree[0].before);
239+
240+
/*
241+
* Atomically commit the whole invalidate stuff.
242+
*/
243+
error = xfs_trans_roll_inode(trans, dp);
230244
if (error)
231245
return error;
232-
xfs_da3_node_hdr_from_disk(dp->i_mount, &phdr,
233-
bp->b_addr);
234-
child_fsb = be32_to_cpu(phdr.btree[i + 1].before);
235-
xfs_trans_brelse(*trans, bp);
236-
bp = NULL;
237246
}
238-
/*
239-
* Atomically commit the whole invalidate stuff.
240-
*/
241-
error = xfs_trans_roll_inode(trans, dp);
242-
if (error)
243-
return error;
244247
}
245248

246249
return 0;
@@ -257,10 +260,8 @@ xfs_attr3_root_inactive(
257260
struct xfs_trans **trans,
258261
struct xfs_inode *dp)
259262
{
260-
struct xfs_mount *mp = dp->i_mount;
261263
struct xfs_da_blkinfo *info;
262264
struct xfs_buf *bp;
263-
xfs_daddr_t blkno;
264265
int error;
265266

266267
/*
@@ -272,7 +273,6 @@ xfs_attr3_root_inactive(
272273
error = xfs_da3_node_read(*trans, dp, 0, &bp, XFS_ATTR_FORK);
273274
if (error)
274275
return error;
275-
blkno = xfs_buf_daddr(bp);
276276

277277
/*
278278
* Invalidate the tree, even if the "tree" is only a single leaf block.
@@ -283,10 +283,26 @@ xfs_attr3_root_inactive(
283283
case cpu_to_be16(XFS_DA_NODE_MAGIC):
284284
case cpu_to_be16(XFS_DA3_NODE_MAGIC):
285285
error = xfs_attr3_node_inactive(trans, dp, bp, 1);
286+
/*
287+
* Empty root node block are not allowed, convert it to leaf.
288+
*/
289+
if (!error)
290+
error = xfs_attr3_leaf_init(*trans, dp, 0);
291+
if (!error)
292+
error = xfs_trans_roll_inode(trans, dp);
286293
break;
287294
case cpu_to_be16(XFS_ATTR_LEAF_MAGIC):
288295
case cpu_to_be16(XFS_ATTR3_LEAF_MAGIC):
289296
error = xfs_attr3_leaf_inactive(trans, dp, bp);
297+
/*
298+
* Reinit the leaf before truncating extents so that a crash
299+
* mid-truncation leaves an empty leaf rather than one with
300+
* entries that may reference freed remote value blocks.
301+
*/
302+
if (!error)
303+
error = xfs_attr3_leaf_init(*trans, dp, 0);
304+
if (!error)
305+
error = xfs_trans_roll_inode(trans, dp);
290306
break;
291307
default:
292308
xfs_dirattr_mark_sick(dp, XFS_ATTR_FORK);
@@ -295,21 +311,6 @@ xfs_attr3_root_inactive(
295311
xfs_trans_brelse(*trans, bp);
296312
break;
297313
}
298-
if (error)
299-
return error;
300-
301-
/*
302-
* Invalidate the incore copy of the root block.
303-
*/
304-
error = xfs_trans_get_buf(*trans, mp->m_ddev_targp, blkno,
305-
XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0, &bp);
306-
if (error)
307-
return error;
308-
xfs_trans_binval(*trans, bp); /* remove from cache */
309-
/*
310-
* Commit the invalidate and start the next transaction.
311-
*/
312-
error = xfs_trans_roll_inode(trans, dp);
313314

314315
return error;
315316
}
@@ -328,6 +329,7 @@ xfs_attr_inactive(
328329
{
329330
struct xfs_trans *trans;
330331
struct xfs_mount *mp;
332+
struct xfs_buf *bp;
331333
int lock_mode = XFS_ILOCK_SHARED;
332334
int error = 0;
333335

@@ -363,10 +365,27 @@ xfs_attr_inactive(
363365
* removal below.
364366
*/
365367
if (dp->i_af.if_nextents > 0) {
368+
/*
369+
* Invalidate and truncate all blocks but leave the root block.
370+
*/
366371
error = xfs_attr3_root_inactive(&trans, dp);
367372
if (error)
368373
goto out_cancel;
369374

375+
error = xfs_itruncate_extents(&trans, dp, XFS_ATTR_FORK,
376+
XFS_FSB_TO_B(mp, mp->m_attr_geo->fsbcount));
377+
if (error)
378+
goto out_cancel;
379+
380+
/*
381+
* Invalidate and truncate the root block and ensure that the
382+
* operation is completed within a single transaction.
383+
*/
384+
error = xfs_da_get_buf(trans, dp, 0, &bp, XFS_ATTR_FORK);
385+
if (error)
386+
goto out_cancel;
387+
388+
xfs_trans_binval(trans, bp);
370389
error = xfs_itruncate_extents(&trans, dp, XFS_ATTR_FORK, 0);
371390
if (error)
372391
goto out_cancel;

0 commit comments

Comments
 (0)