Thursday, June 1, 2017
Trying to address the issues I ran into yesterday.
Repairing Shuttle
After a lot of Googling, fiddling around and rebooting, I figured out that the correct way to restore the EFI entry for Shuttle was as follows:
efibootmgr -c -w -L Fedora -d /dev/sdb -p 1 -l '\EFI\fedora\shim.efi'
The problem that blocked me yesterday was stupid. The web sites show this kind of command line, but not correctly escaped. They show:
efibootmgr -c -w -L Fedora -d /dev/sdb -p 1 -l \EFI\fedora\shim.efi
This enters a boot string that won't work, so the machine ended up booting into Windows. Reported the problem as Red Hat Bugzilla 1457890. The irony of the story is that the first version of this page was correctly escaped for Blogmax, so what showed up on the web site was nonsensical.
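To see why the unquoted form fails, here is a minimal sketch, assuming a POSIX shell such as bash. It prints the loader path exactly as the shell would hand it to a command in each case. `show_arg` is a hypothetical helper for illustration, not part of efibootmgr.

```shell
#!/bin/sh
# show_arg: hypothetical helper that prints its first argument verbatim,
# i.e. what a command such as efibootmgr would actually receive.
show_arg() {
    printf '%s\n' "$1"
}

# Unquoted: the shell treats each backslash as an escape and removes it.
unquoted=$(show_arg \EFI\fedora\shim.efi)

# Single-quoted: the backslashes reach the command untouched.
quoted=$(show_arg '\EFI\fedora\shim.efi')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

The unquoted variant loses every backslash, so the firmware is given a loader path with no directory separators at all.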
Analyzing Muse logs
Trying to understand the problems I ran into yesterday when I ran spicy.
I have two big batches corresponding to the two occasions I had to reboot Muse. The first one looks like this:
May 31 16:29:05 muse-dinechin-lan kernel: BUG: Bad page state in process src:src pfn:3f0309
May 31 16:29:05 muse-dinechin-lan kernel: page:ffffcbabcfc0c240 count:0 mapcount:0 mapping: (null) index:0x1
May 31 16:29:05 muse-dinechin-lan kernel: flags: 0x17ffffc0000014(referenced|dirty)
May 31 16:29:05 muse-dinechin-lan kernel: raw: 0017ffffc0000014 0000000000000000 0000000000000001 00000000ffffffff
May 31 16:29:05 muse-dinechin-lan kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
May 31 16:29:05 muse-dinechin-lan kernel: page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
May 31 16:29:05 muse-dinechin-lan kernel: bad because of flags: 0x14(referenced|dirty)
The second one looks like this:
May 31 17:15:09 muse-dinechin-lan kernel: BUG: Bad page state in process Xorg pfn:137890
May 31 17:15:09 muse-dinechin-lan kernel: page:fffff9b9c4de2400 count:0 mapcount:0 mapping: (null) index:0x1
May 31 17:15:23 muse-dinechin-lan kernel: flags: 0x17ffffc0000004(referenced)
May 31 17:15:23 muse-dinechin-lan kernel: raw: 0017ffffc0000004 0000000000000000 0000000000000001 00000000ffffffff
May 31 17:15:23 muse-dinechin-lan kernel: raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
May 31 17:15:24 muse-dinechin-lan kernel: page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
May 31 17:15:24 muse-dinechin-lan kernel: bad because of flags: 0x4(referenced)
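The two "flags:" values differ only in their low bits, which is what the kernel names in parentheses. A small sketch, assuming a POSIX shell, decodes just those two bits. The bit values are inferred from these log lines themselves (0x4 is reported as referenced, 0x14 as referenced|dirty, so 0x10 is dirty). `decode_low_flags` is a hypothetical helper, and all other bits are deliberately ignored.

```shell
#!/bin/sh
# decode_low_flags: hypothetical helper naming the two low page-flag bits
# that appear in these logs. Bit values are inferred from the log output
# (0x4 -> referenced, 0x10 -> dirty); they may differ on other kernels.
decode_low_flags() {
    flags=$(( $1 ))
    names=""
    if [ $(( flags & 0x04 )) -ne 0 ]; then
        names="referenced"
    fi
    if [ $(( flags & 0x10 )) -ne 0 ]; then
        names="${names:+$names|}dirty"
    fi
    printf '%s\n' "${names:-none}"
}

decode_low_flags 0x17ffffc0000014   # first batch (process src:src)
decode_low_flags 0x17ffffc0000004   # second batch (process Xorg)
```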
Apparently, what triggered the first one is an out-of-memory event:
May 31 16:28:32 muse-dinechin-lan kernel: avahi-daemon invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
Probably too many VMs running. But that's strange: I have 12G of VM in a 16G machine with 8G of swap (ah no, actually 16G: I had not reconfigured it properly after cloning from Shuttle, but that should still be OK). That should be enough. The OOM killer kills a VM, and things go south after that:
May 31 16:28:34 muse-dinechin-lan kernel: Out of memory: Kill process 4768 (qemu-system-x86) score 667 or sacrifice child
The second out-of-memory is invoked by none other than good old spicy:
May 31 17:19:45 muse-dinechin-lan kernel: spicy invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
And indeed, spicy seems to be gobbling memory like crazy (I put a few other processes running on the machine for comparison):
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 2872]  1026  2872  6350689  3950284   11774      27  1940505             0 spicy
[ 1645]  1026  1645   111580      585     134       4     4797             0 Xorg
[ 2777]  1026  2777   181229        0     132       4     2017             0 gnome-terminal-
[ 2783]  1026  2783    30918        1      14       3      476             0 bash
[ 2810]  1026  2810    12844        0      27       3      145             0 ssh-agent
[ 2860]  1026  2860     7919       48      20       3       78             0 xosview
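The total_vm, rss and swapents columns of the OOM killer dump are counts of pages, not bytes. The sketch below converts the spicy row to MiB, assuming 4 KiB pages (the usual size on x86-64; `getconf PAGESIZE` reports the real value). `pages_to_mib` and `PAGE_KIB` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# PAGE_KIB=4 is an assumption (4 KiB pages); verify with `getconf PAGESIZE`.
PAGE_KIB=4

# pages_to_mib: hypothetical helper converting a page count to whole MiB
# (integer division, so the result is rounded down).
pages_to_mib() {
    echo $(( $1 * PAGE_KIB / 1024 ))
}

# Values taken from the spicy row of the OOM dump.
printf 'spicy total_vm: %s MiB\n' "$(pages_to_mib 6350689)"
printf 'spicy rss:      %s MiB\n' "$(pages_to_mib 3950284)"
printf 'spicy swapents: %s MiB\n' "$(pages_to_mib 1940505)"
```

Under that assumption, spicy alone had roughly 15 GiB resident and another 7.4 GiB or so swapped out, which on a 16G machine is more than enough to explain the OOM killer firing.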
By the way, inspecting the journal shows that the outside temperature is starting to tax my systems:
May 31 10:30:08 muse-dinechin-lan kernel: CPU1: Core temperature above threshold, cpu clock throttled (total events = 3644)
May 31 10:30:08 muse-dinechin-lan kernel: CPU5: Core temperature above threshold, cpu clock throttled (total events = 3644)
May 31 10:30:08 muse-dinechin-lan kernel: CPU4: Package temperature above threshold, cpu clock throttled (total events = 3670)
May 31 10:30:08 muse-dinechin-lan kernel: CPU0: Package temperature above threshold, cpu clock throttled (total events = 3670)
May 31 10:30:08 muse-dinechin-lan kernel: CPU7: Package temperature above threshold, cpu clock throttled (total events = 3670)
May 31 10:30:08 muse-dinechin-lan kernel: CPU2: Package temperature above threshold, cpu clock throttled (total events = 3674)
May 31 10:30:08 muse-dinechin-lan kernel: CPU6: Package temperature above threshold, cpu clock throttled (total events = 3674)
May 31 10:30:09 muse-dinechin-lan kernel: CPU3: Package temperature above threshold, cpu clock throttled (total events = 3670)
May 31 10:30:09 muse-dinechin-lan kernel: CPU5: Package temperature above threshold, cpu clock throttled (total events = 3674)
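To get a quick per-CPU tally of these messages, something like the following sketch could be used. It assumes a POSIX shell with grep, sed, sort, uniq and awk, and log lines in the format shown above. `count_throttle` is a hypothetical helper that reads log lines on standard input.

```shell
#!/bin/sh
# count_throttle: hypothetical helper. Reads kernel log lines on stdin and
# prints "CPUn count" for each CPU that logged a throttling message.
count_throttle() {
    grep 'temperature above threshold' |
        sed -n 's/.*kernel: \(CPU[0-9]*\):.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Small sample taken from the log excerpt above.
count_throttle <<'EOF'
May 31 10:30:08 muse-dinechin-lan kernel: CPU1: Core temperature above threshold, cpu clock throttled (total events = 3644)
May 31 10:30:08 muse-dinechin-lan kernel: CPU5: Core temperature above threshold, cpu clock throttled (total events = 3644)
May 31 10:30:09 muse-dinechin-lan kernel: CPU5: Package temperature above threshold, cpu clock throttled (total events = 3674)
EOF
```

On a live system the same function can be fed from the kernel journal, for example with `journalctl -k | count_throttle`.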