Debugging BTRFS progs

Tuesday, April 25, 2017

Having tried all the official BTRFS tools and gone as far as I could with a boot from a Live Fedora25 USB stick, I went ahead and reinstalled Fedora 25 on the old Shuttle boot disk. Why I had to reinstall, I don't know, but apparently the bootlader that was installed on that disk is not "universal", and Muse won't boot from it.

After reinstallation, rebuilt btrfs-progs-4.10.1-1.fc25.x86_64.rpm from RPM sources. The first run gets me:

enabling repair mode
Checking filesystem on /dev/sda4
UUID: 26a0c84c-d2ac-4da8-b880-684f2ea48a22
checking extents
ERROR: add_tree_backref failed: File exists
extent_io.c:143: insert_state: BUG_ON `end < start` triggered, value 1
btrfs check(+0x305a6)[0x556c478505a6]
btrfs check(+0x307fd)[0x556c478507fd]
btrfs check(set_extent_bits+0x256)[0x556c47850cc6]
btrfs check(+0x6329b)[0x556c4788329b]
btrfs check(cmd_check+0xb43)[0x556c47885143]
btrfs check(main+0x80)[0x556c47837c80]
/lib64/libc.so.6(__libc_start_main+0xf1)[0x7f4ed220a401]
btrfs check(_start+0x2a)[0x556c47837d8a]
Aborted (core dumped)

Same as before. Running under gdb without the --repair option gives:

root 258 inode 285735 errors 2001, no inode item, link count wrong
	unresolved ref dir 362 index 44 namelen 11 name fedora.json filetype 1 errors 4, no inode ref
root 258 inode 285736 errors 2001, no inode item, link count wrong
	unresolved ref dir 472 index 42 namelen 10 name gnome.json filetype 1 errors 4, no inode ref
root 258 inode 285737 errors 2001, no inode item, link count wrong
	unresolved ref dir 481 index 41 namelen 18 name extensions-web.xml filetype 1 errors 4, no inode ref
root 258 inode 285738 errors 2001, no inode item, link count wrong
	unresolved ref dir 389 index 44 namelen 9 name odrs.json filetype 1 errors 4, no inode ref
root 258 inode 285759 errors 2001, no inode item, link count wrong
	unresolved ref dir 281 index 648 namelen 13 name notifications filetype 1 errors 4, no inode ref
root 258 inode 285768 errors 2001, no inode item, link count wrong
	unresolved ref dir 257 index 366 namelen 11 name .Xauthority filetype 1 errors 4, no inode ref
root 258 inode 287091 errors 2001, no inode item, link count wrong
	unresolved ref dir 521 index 38 namelen 30 name .saves-7375-muse.dinechin.org~ filetype 1 errors 4, no inode ref
root 258 inode 287119 errors 2001, no inode item, link count wrong
	unresolved ref dir 257 index 368 namelen 8 name .lesshst filetype 1 errors 4, no inode ref
root 258 inode 289021 errors 2001, no inode item, link count wrong
	unresolved ref dir 266 index 76 namelen 13 name gstreamer-1.0 filetype 2 errors 4, no inode ref
root 258 inode 289997 errors 2001, no inode item, link count wrong
	unresolved ref dir 3058 index 16 namelen 10 name cscope.out filetype 1 errors 4, no inode ref
ERROR: errors found in fs roots
found 1124075982849 bytes used, error(s) found
total csum bytes: 1094304808
total tree bytes: 2285043712
total fs tree bytes: 797573120
total extent tree bytes: 258555904
btree space waste bytes: 294399420
file data blocks allocated: 72350215241728
 referenced 1037659140096

With the --repair option, I get the following stack trace information:

gdb) r --repair /dev/sda4
Starting program: /usr/sbin/btrfsck --repair /dev/sda4
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
enabling repair mode
Checking filesystem on /dev/sda4
UUID: 26a0c84c-d2ac-4da8-b880-684f2ea48a22
checking extents
ERROR: add_tree_backref failed: File exists
extent_io.c:143: insert_state: BUG_ON `end < start` triggered, value 1
btrfs check(+0x30596)[0x555555584596]
btrfs check(+0x307ed)[0x5555555847ed]
btrfs check(set_extent_bits+0x256)[0x555555584cb6]
btrfs check(+0x6328b)[0x5555555b728b]
btrfs check(cmd_check+0xb43)[0x5555555b9133]
btrfs check(main+0x80)[0x55555556bc70]
/lib64/libc.so.6(__libc_start_main+0xf1)[0x7ffff6f92401]
btrfs check(_start+0x2a)[0x55555556bd7a]

Program received signal SIGABRT, Aborted. __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58 58 } Missing separate debuginfos, use: dnf debuginfo-install libgcc-6.3.1-1.fc25.x86_64 (gdb) up #1 0x00007ffff6fa952a in __GI_abort () at abort.c:89 89 raise (SIGABRT); (gdb) #2 0x00005555555845aa in bugon_trace ( assertion=assertion@entry=0x5555555d5593 "end < start", func=func@entry=0x5555555d56f8 <__func__.9791> "insert_state", line=line@entry=143, val=val@entry=1, filename=0x5555555d5577 "extent_io.c") at kerncompat.h:115 115 abort(); (gdb) #3 0x00005555555847ed in bugon_trace (val=1, line=143, func=, filename=0x5555555d5577 "extent_io.c", assertion=0x5555555d5593 "end < start") at extent_io.c:152 152 } (gdb) #4 insert_state (tree=tree@entry=0x7fffffffdba0, state=state@entry=0x55557eb12010, start=start@entry=52428800, end=end@entry=52428799, bits=bits@entry=1) at extent_io.c:143 143 BUG_ON(end < start); (gdb) #5 0x0000555555584cb6 in set_extent_bits (tree=0x7fffffffdba0, start=52428800, end=52428799, bits=bits@entry=1) at extent_io.c:310 310 err = insert_state(tree, prealloc, start, end, bits); (gdb) #6 0x0000555555584d9a in set_extent_dirty (tree=, start=, end=) at extent_io.c:414 414 return set_extent_bits(tree, start, end, EXTENT_DIRTY); (gdb) #7 0x00005555555b728b in check_extent_refs (extent_cache=0x7fffffffda60, root=0x55555587d570) at cmds-check.c:9255 9255 set_extent_dirty(root->fs_info->excluded_extents, (gdb) #8 check_chunks_and_extents (root=root@entry=0x55555587d570) at cmds-check.c:9945 9945 ret = check_extent_refs(root, &extent_cache); (gdb) #9 0x00005555555b9133 in cmd_check (argc=, argv=) at cmds-check.c:12961 12961 ret = check_chunks_and_extents(root); (gdb) #10 0x000055555556bc70 in main (argc=3, argv=0x7fffffffe328) at btrfs.c:246 246 ret = cmd->fn(argc, argv); (gdb)

Second run under debugger gives me a slightly different result:

Program received signal SIGSEGV, Segmentation fault.
list_del (entry=0x555555db0420) at /usr/src/debug/btrfs-progs-v4.10.1/kernel-lib/list.h:125
125		__list_del(entry->prev, entry->next);
(gdb) bt
#0  list_del (entry=0x555555db0420) at /usr/src/debug/btrfs-progs-v4.10.1/kernel-lib/list.h:125
#1  free_all_extent_backrefs (rec=0x555555db0350) at cmds-check.c:5386
#2  maybe_free_extent_rec (extent_cache=0x7fffffffd990, rec=0x555555db0350) at cmds-check.c:5417
#3  0x00005555555b308f in check_block (flags=, buf=0x55557b87cdf0, extent_cache=0x7fffffffd990, root=0x55555587d570) at cmds-check.c:5851
#4  run_next_block (root=root@entry=0x55555587d570, bits=bits@entry=0x555555884170, last=last@entry=0x7fffffffd738, pending=pending@entry=0x7fffffffd9b0, seen=seen@entry=0x7fffffffd9a0, reada=reada@entry=0x7fffffffd9c0, nodes=0x7fffffffd9d0, extent_cache=0x7fffffffd990, chunk_cache=0x7fffffffd980, dev_cache=0x7fffffffd970, block_group_cache=0x7fffffffda80, dev_extent_cache=0x7fffffffdaa0, ri=0x55555587d820, bits_nr=1024) at cmds-check.c:7690
#5  0x00005555555b4b6d in deal_root_from_list (list=list@entry=0x7fffffffda00, root=root@entry=0x55555587d570, bits=bits@entry=0x555555884170, pending=pending@entry=0x7fffffffd9b0, seen=seen@entry=0x7fffffffd9a0, reada=reada@entry=0x7fffffffd9c0, nodes=0x7fffffffd9d0, extent_cache=0x7fffffffd990, chunk_cache=0x7fffffffd980, dev_cache=0x7fffffffd970, block_group_cache=0x7fffffffda80, dev_extent_cache=0x7fffffffdaa0, bits_nr=1024) at cmds-check.c:9751
#6  0x00005555555b5805 in check_chunks_and_extents (root=root@entry=0x55555587d570) at cmds-check.c:9918
#7  0x00005555555b9133 in cmd_check (argc=, argv=) at cmds-check.c:12961
#8  0x000055555556bc70 in main (argc=3, argv=0x7fffffffe258) at btrfs.c:246

Not sure why this run is different. The etnry look readable:

$2 = {next = 0x0, prev = 0x555555fb0350}

I don't think it would fail just because next is null. Well, actually, according to the source code it does:

static inline void __list_del(struct list_head * prev, struct list_head * next)
{
	next->prev = prev;
	prev->next = next;
}

So we should never have null as next. Maybe time to build with CONFIG_DEBUG_LIST???

The problem comes from this code in cmds-check.c in function check_extent_refs around L9260:

		while(cache) {
			rec = container_of(cache, struct extent_record, cache);
			set_extent_dirty(root->fs_info->excluded_extents,
					 rec->start,
					 rec->start + rec->max_size - 1);
			cache = next_cache_extent(cache);
		}

In my case, rec->max_size is 0. The content of the corresponding record shows an inconsistency between size and max_size:

(gdb) p *rec
$10 = {
  backrefs = {
    next = 0x555556490100, 
    prev = 0x555556490100
  }, 
  dups = {
    next = 0x555556490040, 
    prev = 0x555556490040
  }, 
  list = {
    next = 0x555556490050, 
    prev = 0x555556490050
  }, 
  cache = {
    rb_node = {
      __rb_parent_color = 93824996670721, 
      rb_right = 0x0, 
      rb_left = 0x0
    }, 
    objectid = 0, 
    start = 52428800, 
    size = 1
  }, 
  parent_key = {
    objectid = 0, 
    type = 0 '000', 
    offset = 0
  }, 
  start = 52428800, 
  max_size = 0, 
  nr = 1, 
  refs = 0, 
  extent_item_refs = 0, 
  generation = 0, 
  parent_generation = 0, 
  info_objectid = 0, 
  num_duplicates = 0, 
  info_level = 0 '000', 
  flag_block_full_backref = 2, 
  found_rec = 0, 
  content_checked = 0, 
  owner_ref_checked = 0, 
  is_root = 0, 
  metadata = 1, 
  bad_full_backref = 0, 
  crossing_stripes = 0, 
  wrong_chunk_type = 0
}

The problem also occurs on current master branch.

Added the Red Hat Bugzilla 1435567"> the recorder to btrfs tools.

Recorder entries before the crash look like this:

5542557 [73.057880:0x45db8b] MAX_SIZE: Add 229/231 max_size 16384 for extent cache 0x7fffffffdb30
5542558 [73.057882:0x45db8b] MAX_SIZE: Add 230/231 max_size 16384 for extent cache 0x7fffffffdb30
5542559 [82.656875:0x45de02] MAX_SIZE: Add root to pending max_size 16384 for extent cache 0x7fffffffdb30
5542560 [82.660978:0x45de02] MAX_SIZE: Add root to pending max_size 16384 for extent cache 0x7fffffffdb30
5542561 [82.661416:0x45ead4] CHECK: Check extent root 0x77c180 cache 0x7fffffffdb30
5542562 [82.661417:0x45ec30] REPAIR: Search cache 0x138ec80 start 52428800 size 1
5542563 [82.661417:0x45ec58] EXTENTS: Container extent record 0x138ec50 for cache 0x138ec80 start 52428800 max_size 0

Added instrumentation to specifically record if max_size is ever set to 0. It trips, but this time I have the list SIGSEGV again.

78655 [1.792241:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139eb80 max_size 16384 tmpl 0x7fffffffd120

78657 [1.792242:0x451cb8] MAX_SIZE_ZERO: Set max size 0 for rec 0x139ec50 from tmpl 0x7fffffffcf80 78660 [1.792244:0x451fe0] MAX_SIZE_ZERO: Add extent rec 0x139ed50 max_size 16384 tmpl 0x7fffffffd120