XFS filesystem shutdown when deleting millions of files on RHEL

SOLUTION UNVERIFIED - Updated March 8, 2019 at 20:43

Environment

  • Red Hat Enterprise Linux 7

Issue

  • The XFS file system shuts down while deleting 50 to 60 million small files (5 to 10 KB each), logging the following messages:

kernel: XFS (dm-3): xlog_write: reservation summary:
  trans type  = INACTIVE (3)
  unit res    = 78564 bytes
  current res = -33148 bytes
  total reg   = 0 bytes (o/flow = 0 bytes)
  ophdrs      = 0 (ophdr space = 0 bytes)
  ophdr + reg = 0 bytes
  num regions = 0
kernel: XFS : xlog_write: reservation ran out. Need to up reservation
kernel: XFS : xfs_do_force_shutdown(0x2) called from line 2042 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa033edb8
kernel: XFS : xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
kernel: XFS : Log I/O Error Detected.  Shutting down filesystem
kernel: XFS : Please umount the filesystem and rectify the problem(s)
kernel: XFS : xfs_log_force: error 5 returned.
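
In the summary above, a negative "current res" value means the transaction wrote more to the log than it had reserved (here 33148 bytes more). As a minimal sketch, the helper below scans kernel log text for such lines and prints the size of each overrun. The function name xfs_res_overruns is hypothetical and the parsing assumes the message layout shown in this article.

```shell
# Print the overrun, in bytes, for every "current res" line that is negative.
# Reads kernel log text on stdin.
xfs_res_overruns() {
    awk '/current res *=/ {
        for (i = 1; i < NF; i++)
            if ($i == "=") {
                v = $(i + 1) + 0      # value that follows the "=" sign
                if (v < 0) print -v   # negative reservation means overrun
            }
    }'
}
```

It could be fed from the kernel log, for example with `dmesg | xfs_res_overruns`.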

Resolution

This issue has been resolved by RHSA-2018:1062. Upgrade to kernel-3.10.0-862.el7 (RHEL 7.5) or later to obtain the fix.
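
A quick way to compare a running kernel against that build is sketched below. It assumes a RHEL 7 style release string of the form 3.10.0-N... and compares only the build number N with 862. The function name kernel_has_fix is hypothetical.

```shell
# Return 0 if the release string is 3.10.0-862 or later, 1 if it is older,
# and 2 if the string is not a RHEL 7 style kernel release.
kernel_has_fix() {
    khf_rel=$1
    case $khf_rel in
        3.10.0-*) ;;
        *) return 2 ;;
    esac
    khf_build=${khf_rel#3.10.0-}    # e.g. "862.el7.x86_64"
    khf_build=${khf_build%%.*}      # e.g. "862"
    case $khf_build in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$khf_build" -ge 862 ]
}
```

Typical use would be `kernel_has_fix "$(uname -r)" && echo "kernel includes the fix"`.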

Workaround

If you are unable to upgrade to the latest kernel, some customers have reported success mounting the filesystem with the 'ikeep' mount option, which prevents XFS from freeing unused inode clusters.
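
As a sketch of how the workaround might be applied and verified: the device and mount point below are placeholders, and a fresh mount (rather than a remount) is assumed. The helper opts_have_ikeep is a hypothetical name; it only inspects a comma-separated option string.

```shell
# Placeholder device and mount point; adjust to the affected filesystem.
#   umount /data
#   mount -o ikeep /dev/mapper/vg_data /data

# Return 0 if the comma-separated mount option string contains "ikeep".
opts_have_ikeep() {
    case ",$1," in
        *,ikeep,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

The active options can be passed in from findmnt, for example `opts_have_ikeep "$(findmnt -no OPTIONS /data)" && echo "ikeep is active"`.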

Root Cause

The fix is a patch series that refactors part of the transaction overrun reporting code in XFS to provide more useful error reports (patches 1-4), and refactors some of the log reservations associated with inode transactions to provide more accurate worst-case reservations. This helps avoid transaction reservation overruns during particular workloads.

The following commits have been backported:

  xfs: separate shutdown from ticket reservation print helper
  xfs: refactor xlog_cil_insert_items() to facilitate transaction dump
  xfs: dump transaction usage details on log reservation overrun
  xfs: print transaction log reservation on overrun
  xfs: include inobt buffers in ifree tx log reservation
  xfs: fix up agi unlinked list reservations
  xfs: truncate transaction does not modify the inobt
  xfs: include an allocfree res for inobt modifications
  xfs: refactor inode chunk alloc/free tx reservation
  xfs: eliminate duplicate icreate tx reservation functions

Diagnostic Steps

A metadump was taken of a filesystem exhibiting this issue for analysis.

# xfs_metadump -gow /dev/mapper/vg_data - | bzip2 > reservation_failure.xfsmetadump.bz2

The image was restored to a local volume group for testing.
kdump was configured and the system set to panic on log reservation failures.

# sysctl fs.xfs.error_level=11 fs.xfs.panic_mask=2
fs.xfs.error_level = 11
fs.xfs.panic_mask = 2
# bzcat ~/xfs-metadump.img.gz | xfs_mdrestore - /dev/mapper/vg_metadump
# mount /dev/mapper/vg_metadump /mnt/metadump
# find /mnt/metadump -type f -delete
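
The fs.xfs.panic_mask value set above is a bit mask. Judging from the xfs_alert_tag(mp, XFS_PTAG_LOGRES, ...) call in the source quoted in this article and the "Transforming an alert into a BUG" message in the vmcore log, bit 0x2 selects the log reservation tag. That mapping is inferred here rather than quoted from the kernel headers. A minimal sketch for checking whether a given mask would turn a reservation overrun into a panic (the names below are hypothetical):

```shell
# Assumption: bit 0x2 of fs.xfs.panic_mask corresponds to XFS_PTAG_LOGRES.
XFS_PTAG_LOGRES_BIT=2

# Return 0 if the given mask value has the log reservation bit set.
panic_mask_has_logres() {
    [ $(( $1 & XFS_PTAG_LOGRES_BIT )) -ne 0 ]
}
```

On a live system the current value can be passed in with `panic_mask_has_logres "$(sysctl -n fs.xfs.panic_mask)"`.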

Initial analysis of the vmcore

XFS (dm-2): xlog_write: reservation summary:
XFS (dm-2):   trans type  = INACTIVE (3)
XFS (dm-2):   unit res    = 47832 bytes
XFS (dm-2):   current res = -128 bytes
XFS (dm-2):   total reg   = 0 bytes (o/flow = 0 bytes)
XFS (dm-2):   ophdrs      = 0 (ophdr space = 0 bytes)
XFS (dm-2):   ophdr + reg = 0 bytes
XFS (dm-2):   num regions = 0
XFS (dm-2): Transforming an alert into a BUG.
XFS (dm-2): xlog_write: reservation ran out. Need to up reservation
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:97!
invalid opcode: 0000 [#1] SMP
Modules linked in: intel_powerclamp ... xfs ...
CPU: 0 PID: 29036 Comm: find Not tainted 3.10.0-514.el7.x86_64 #1
Hardware name: Dell Inc. PowerEdge R530/0CN7X8, BIOS 2.4.2 01/09/2017
task: ffff88085717af10 ti: ffff8806a1f50000 task.ti: ffff8806a1f50000
RIP: 0010:[<ffffffffa033c40c>]  [<ffffffffa033c40c>] xfs_alert_tag+0xec/0x100 [xfs]
RSP: 0018:ffff8806a1f53c08  EFLAGS: 00010246
RAX: 0000000000000043 RBX: ffff880858970000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88085c60f838 RDI: ffff88085c60f838
RBP: ffff8806a1f53c78 R08: 0000000000000086 R09: 00000000000005d4
R10: 00000000000003ff R11: 0000000000000001 R12: ffff88066531cfd0
R13: ffff880858970000 R14: ffff880471dba108 R15: ffff880259574d68
FS:  00007f3fe617a800(0000) GS:ffff88085c600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000840018 CR3: 0000000581fe9000 CR4: 00000000003407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffffffffa0372da8 ffffffffa0372da8 ffff8806a1f53c20 ffff880600000018
 ffff8806a1f53c88 ffff8806a1f53c40 00000000668a0908 00000000668a0908
 ffff8802902afcb8 ffff8801bc792f80 0000000000000000 0000000000000000
Call Trace:
 [<ffffffffa0347f8f>] xlog_print_tic_res+0x14f/0x170 [xfs]
 [<ffffffffa034a629>] xfs_log_commit_cil+0x499/0x4d0 [xfs]
 [<ffffffffa034422d>] __xfs_trans_commit+0x12d/0x260 [xfs]
 [<ffffffffa0344610>] xfs_trans_commit+0x10/0x20 [xfs]
 [<ffffffffa0339bd4>] xfs_inactive_ifree+0x1d4/0x240 [xfs]
 [<ffffffffa0339ccd>] xfs_inactive+0x8d/0x130 [xfs]
 [<ffffffffa033f346>] xfs_fs_evict_inode+0xa6/0xe0 [xfs]
 [<ffffffff8121a0e7>] evict+0xa7/0x170
 [<ffffffff8121a985>] iput+0xf5/0x180
 [<ffffffff8120f31e>] do_unlinkat+0x1ae/0x2b0
 [<ffffffff812037f4>] ? SYSC_newfstatat+0x34/0x60
 [<ffffffff812102eb>] SyS_unlinkat+0x1b/0x40
 [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b

crash> whatis xlog_print_tic_res
void xlog_print_tic_res(struct xfs_mount *, struct xlog_ticket *);
crash> dis -r ffffffffa0347f8f
0xffffffffa0347e40 <xlog_print_tic_res>:    nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa0347e45 <xlog_print_tic_res+0x5>:    push   %rbp
0xffffffffa0347e46 <xlog_print_tic_res+0x6>:    mov    %rsp,%rbp
0xffffffffa0347e49 <xlog_print_tic_res+0x9>:    push   %r13
0xffffffffa0347e4b <xlog_print_tic_res+0xb>:    mov    %rdi,%r13
0xffffffffa0347e4e <xlog_print_tic_res+0xe>:    push   %r12
0xffffffffa0347e50 <xlog_print_tic_res+0x10>:   mov    %rsi,%r12
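... r12 and r13 are not modified later, so at the time of the BUG r13 still holds the first argument (mp, from %rdi) and r12 the second (ticket, from %rsi). R12 = ffff88066531cfd0 is therefore the address of the xlog_ticket.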

crash> struct xlog_ticket ffff88066531cfd0
struct xlog_ticket {
  t_queue = {
    next = 0xffff88066531cfd0, 
    prev = 0xffff88066531cfd0
  }, 
  t_task = 0xffff88085717af10, 
  t_tid = 0x8fb68988, 
  t_ref = {
    counter = 0x1
  }, 
  t_curr_res = 0xffffff80,  (-128) 
  t_unit_res = 0xbad8,      (47832)
  t_ocnt = 0x2, 
  t_cnt = 0x1, 
  t_clientid = 0x69, 
  t_flags = 0x3, 
  t_trans_type = 0x3, 
  t_res_num = 0x0, 
  t_res_num_ophdrs = 0x0, 
  t_res_arr_sum = 0x0, 
  t_res_o_flow = 0x0, 
  t_res_arr = {{
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }, {
      r_len = 0x0, 
      r_type = 0x0
    }}
}
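
crash prints t_curr_res and t_unit_res as unsigned hexadecimal, so the signed values noted in parentheses above have to be derived by hand. The following sketch shows one way to do that conversion for 32-bit fields, and how far past its unit reservation the ticket went. The function name to_s32 is hypothetical.

```shell
# Interpret a 32-bit value (hex or decimal) as a signed integer.
to_s32() {
    s32_v=$(( $1 ))
    if [ "$s32_v" -ge 2147483648 ]; then
        s32_v=$(( s32_v - 4294967296 ))   # subtract 2^32 for negative values
    fi
    echo "$s32_v"
}

curr=$(to_s32 0xffffff80)   # t_curr_res
unit=$(to_s32 0xbad8)       # t_unit_res
echo "t_curr_res=$curr t_unit_res=$unit consumed=$(( unit - curr ))"
```

With the values from this ticket, the consumed figure is the unit reservation plus the 128-byte overrun.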
  • Related source code:

https://access.redhat.com/labs/psb/versions/kernel-3.10.0-229.el7/fs/xfs/xfs_log.c

void
xlog_print_tic_res(
        struct xfs_mount        *mp,
        struct xlog_ticket        *ticket)
{
        uint i;
        uint ophdr_spc = ticket->t_res_num_ophdrs * (uint)sizeof(xlog_op_header_t);

        /* match with XLOG_REG_TYPE_* in xfs_log.h */
        static char *res_type_str[XLOG_REG_TYPE_MAX] = {
            "bformat",
            "bchunk",
            "efi_format",
            "efd_format",
            "iformat",
            "icore",
            "iext",
            "ibroot",
            "ilocal",
            "iattr_ext",
            "iattr_broot",
            "iattr_local",
            "qformat",
            "dquot",
            "quotaoff",
            "LR header",
            "unmount",
            "commit",
            "trans header"
        };
        static char *trans_type_str[XFS_TRANS_TYPE_MAX] = {
            "SETATTR_NOT_SIZE",
            "SETATTR_SIZE",
            "INACTIVE",
            "CREATE",
            "CREATE_TRUNC",
            "TRUNCATE_FILE",
            "REMOVE",
            "LINK",
            "RENAME",
            "MKDIR",
            "RMDIR",
            "SYMLINK",
            "SET_DMATTRS",
            "GROWFS",
            "STRAT_WRITE",
            "DIOSTRAT",
            "WRITE_SYNC",
            "WRITEID",
            "ADDAFORK",
            "ATTRINVAL",
            "ATRUNCATE",
            "ATTR_SET",
            "ATTR_RM",
            "ATTR_FLAG",
            "CLEAR_AGI_BUCKET",
            "QM_SBCHANGE",
            "DUMMY1",
            "DUMMY2",
            "QM_QUOTAOFF",
            "QM_DQALLOC",
            "QM_SETQLIM",
            "QM_DQCLUSTER",
            "QM_QINOCREATE",
            "QM_QUOTAOFF_END",
            "SB_UNIT",
            "FSYNC_TS",
            "GROWFSRT_ALLOC",
            "GROWFSRT_ZERO",
            "GROWFSRT_FREE",
            "SWAPEXT"
        };

        xfs_warn(mp,
                "xlog_write: reservation summary:\n"
                "  trans type  = %s (%u)\n"
                "  unit res    = %d bytes\n"
                "  current res = %d bytes\n"
                "  total reg   = %u bytes (o/flow = %u bytes)\n"
                "  ophdrs      = %u (ophdr space = %u bytes)\n"
                "  ophdr + reg = %u bytes\n"
                "  num regions = %u\n",
                ((ticket->t_trans_type <= 0 ||
                  ticket->t_trans_type > XFS_TRANS_TYPE_MAX) ?
                  "bad-trans-type" : trans_type_str[ticket->t_trans_type-1]),
                ticket->t_trans_type,
                ticket->t_unit_res,
                ticket->t_curr_res,
                ticket->t_res_arr_sum, ticket->t_res_o_flow,
                ticket->t_res_num_ophdrs, ophdr_spc,
                ticket->t_res_arr_sum +
                ticket->t_res_o_flow + ophdr_spc,
                ticket->t_res_num);

        for (i = 0; i < ticket->t_res_num; i++) {
                uint r_type = ticket->t_res_arr[i].r_type;
                xfs_warn(mp, "region[%u]: %s - %u bytes", i,
                            ((r_type <= 0 || r_type > XLOG_REG_TYPE_MAX) ?
                            "bad-rtype" : res_type_str[r_type-1]),
                            ticket->t_res_arr[i].r_len);
        }

        xfs_alert_tag(mp, XFS_PTAG_LOGRES,
                "xlog_write: reservation ran out. Need to up reservation");
        xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
}