Wednesday, April 5, 2017

Infrastructure: looking up

My infrastructure is finally beginning to seem usable for automated builds. Yesterday evening, I pushed several updates to my personal projects and saw the builds trigger both on GitLab and on the local Jenkins instance.

There are still a few failures here and there, but overall, I'm starting to feel more confident that I should be able to migrate more complex projects like Tao3D or Spice to it.

Setup of VMs on Shuttle

The VM images were copied to Shuttle; now it's time to import them. For now, I am manually setting the VM UUID and the MAC address. It is also necessary to adjust some host devices manually, e.g. the network device name. I think that the next step would be to clone using virt-manager and see how well that works. Since I plan to move the "fedora3D" VM to qemu:///system (see below), that might be a good test.
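If a copied VM needs a fresh MAC address rather than the original one, it can be generated instead of typed by hand. This is a minimal sketch, not what I actually ran: the function name random_kvm_mac is made up for illustration, and it assumes the 52:54:00 prefix that libvirt uses for QEMU/KVM guests.

```shell
# Print a random MAC address in the 52:54:00 range used for QEMU/KVM guests.
# random_kvm_mac is a hypothetical helper name.
random_kvm_mac() {
    # Read 3 random bytes and format them as colon-separated hex pairs
    suffix=$(od -An -N3 -tx1 /dev/urandom | tr -s ' \n' ':' | sed 's/^://; s/:$//')
    printf '52:54:00:%s\n' "$suffix"
}

random_kvm_mac
```

The result can be pasted into the mac address field of the imported VM. A fresh UUID can be produced the same way with uuidgen, if it is installed.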

Gerd suggested that it is possible to run a qemu:///system VM with 3D acceleration enabled. This is a bit tricky, as it requires giving special permissions to the qemu user and putting SELinux in permissive mode.

VMs seem to have transferred successfully to Shuttle. Now is the real test: will Shuttle stay up even with VMs running and active? Also, Windows 10 seems very slow to boot. I remember seeing Windows 10 guests boot ultra-rapidly on laptops, so I wonder why it's taking ages here. Maybe some Windows update going on behind the scenes?

Hmm, Shuttle is now spending more than half its time in SYS according to xosview. Not dead, but not very responsive either. The existing processes over X11 are responsive, e.g. xosview or my Emacs session. But running ps in a shell window hangs. Something fishy... Similarly, dmesg gets stuck too. I can't even start a new shell within my existing Emacs session.

Some lock in the kernel being held or something like that? The mouse on the local screen does not respond either, but if it was captured by a guest and I can't get it released, it is not too surprising. There is a significant amount of disk I/O going on (about 9M/s). Response to ping appears normal.

Looks like a dead system to me. Will give it a while to finish whatever it's doing, but if it's not done by 11:15, I'll kill it and restart VMs one by one to see which one is causing trouble (I suspect the macOS VM, which shows a black screen).

By contrast, Muse keeps happily chugging along Jenkins build jobs. The CPU goes up and down normally.

Rebooted Shuttle. One of the possible factors for the somewhat increased stability was that I had upgraded the BIOS. But as a result, it now boots into Windows by default. I had booted into Fedora using the so-called BBS Popup (hitting F7 when booting). Problem is that the new BIOS has somewhat changed its configuration parameters, and by default no longer seems able to recognize my disk array as anything but a floppy disk. So it won't boot over UEFI. With the BBS Popup, it works, though. The problem is that I can't seem to find a configuration that would correctly boot my external disk as UEFI by default. I tried several things. In all cases, I end up booting either on Windows or on the external disk but not in UEFI mode, so it tells me I need to insert a valid boot disk in the floppy!!! Aaaargh.

Being tired of that crap, I rebooted in Fedora using the BBS menu. But then when I launched my Win10 VM, the qemu:///system session disconnected in virt-manager, libvirtd started hogging 100% CPU, and I had some nasty stack traces in dmesg. I should have captured them. Trying again.

The second run was much better. I adjusted the VM images to indicate the new host name. My new DNS resolution works wonders in that case, updating DNS entries. Unfortunately, the TTL plays a bad role. For instance, I moved macOS-muse to macOS-shuttle, but it still gave its name as macOS-muse at first boot. Any host that had looked up the name before I updated it now has it in cache, and gets confused. Also, it evicted the old macOS-muse entry, so I have to reboot that other guest for the name to come back.

Jenkins and MINGW

Making my various projects build correctly on MINGW is a real pain. For example, the file windows.h defines a number of types with common names in upper case, e.g. CONTEXT or ERROR. If your project also uses names like these, bad luck.

I have no real issue deoptimizing the Windows builds by making inline functions out of line if that makes them pass. I won't spend hours on this kind of fix. How the Windows platform could have any traction with developers is a true mystery to me.

The latest thing I discovered is that environment variables are necessarily upper-case under Windows. So my Jenkins builds, which were using environment variable names such as $target, fail on MinGW, where the actual variable is transformed into $TARGET. Well, if setting it forces it into uppercase, why doesn't reading it force it into uppercase as well? Stupid stupid build environment.
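One way to make build scripts tolerate this would be to read variables through a small helper that falls back to the upper-case name. This is only a sketch under assumptions: getvar is a hypothetical helper, not something Jenkins or MSYS2 provides, and it only accepts plain identifier names.

```shell
# Print the value of variable $1, falling back to its upper-case spelling.
# getvar is a hypothetical helper name, used here for illustration only.
getvar() {
    name=$1
    # Refuse anything that is not a plain identifier, since eval is used below
    case $name in
        ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) return 1 ;;
    esac
    upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
    # Expands to ${name:-${UPPER:-}}: the exact name wins, else the upper-case one
    eval "value=\${$name:-\${$upper:-}}"
    printf '%s\n' "$value"
}
```

A build step would then use "$(getvar target)" instead of "$target", and get the same value whether the environment kept the lower-case name or turned it into TARGET.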

Next error is that install does not exist, so the install targets in the makefiles fail.
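A possible workaround is to have the install step fall back to mkdir and cp when no install command is found. The sketch below assumes a shell-based install step; install_file is a hypothetical helper name, and the fallback does not set file permissions the way install -m would.

```shell
# Copy file $1 into directory $2, creating the directory if needed.
# install_file is a hypothetical helper name, used here for illustration only.
install_file() {
    src=$1
    destdir=$2
    if command -v install >/dev/null 2>&1; then
        # Regular case: install creates the directory and sets the mode
        install -d "$destdir" && install -m 644 "$src" "$destdir/"
    else
        # Fallback when install is missing: plain mkdir and cp
        mkdir -p "$destdir" && cp "$src" "$destdir/"
    fi
}
```

The install targets in the makefiles could then call this helper instead of invoking install directly.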

iTunes gets worse with every release

It's slower and slower, with fewer and fewer useful features, and more and more "grab the money and run" features. It's sad, really. Right now, I've been waiting for over a minute for it to go out of the spinning cursor. I just want to delete 5 files, zut et flûte à la fin!

Fedora on Muse

I tried moving the Fedora 25 image on Muse to /var/lib/libvirt/images in order to boot it as qemu:///system. And as usual, I had disk I/O errors.

mv: error reading 'ubuntu14.04.qcow2': Input/output error

mv: error reading 'ubuntu16.04-32.qcow2': Input/output error

This is on the new 3TB disk on Muse. I'm starting to worry about that disk.

In dmesg, I see messages such as:

[Apr 5 12:48] BTRFS warning (device sda4): csum failed ino 2234 off 21172404224 csum 3678210820 expected csum 2566472073
[  +0.115344] BTRFS warning (device sda4): csum failed ino 2234 off 21172404224 csum 3678210820 expected csum 2566472073
[  +0.034275] BTRFS warning (device sda4): csum failed ino 2234 off 21172404224 csum 3678210820 expected csum 2566472073
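To see which files are affected, the device and inode numbers can be pulled out of these warnings. This is a minimal sketch that assumes the message format shown above; csum_inodes is a hypothetical helper name.

```shell
# Read dmesg-style text on stdin and print the unique "device inode" pairs
# found in BTRFS checksum failure warnings.
# csum_inodes is a hypothetical helper name, used here for illustration only.
csum_inodes() {
    sed -n 's/.*BTRFS warning (device \([^)]*\)): csum failed ino \([0-9][0-9]*\) .*/\1 \2/p' |
        sort -u
}

# Typical use: dmesg | csum_inodes
# An inode can then be mapped back to a path with, for example:
#   btrfs inspect-internal inode-resolve 2234 <mountpoint>
```

For the three warnings above, this gives a single pair, sda4 and inode 2234, since all three messages refer to the same file.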

Hmmm. Running an extended SMART self-test. The disk is new. If it's bad, I'll get it replaced.

Windows 10 setup

Installing MSYS2 in my Windows 10 guest on Shuttle and Muse. Uh oh, Shuttle hung again? Hmmm.

MSYS2 uses pacman as its package manager. git is not available by default, so now I have to learn yet another package manager. Tried pacman install git. How naive I was! It's pacman -S git, for "sync". But I'm sure this is cool.

This stuff just does not work too well. The first run of anything like make is a real poem:

I had just done a pacman -Sy, which updated the system. Rebooting seems a bit extreme, but restarting MSYS might help. I was just trying to have an easy way to get git. But maybe I was not optimistic enough, maybe Visual Studio has it?

Need to rename the PC. I like the "Find anything" search bar in Windows 10. Here, I ask "PC name" and it finds something relevant. #ThisIsNotMicrosoftBob. I don't like that renaming the PC requires a reboot. In 2017. And then the reboot takes forever and a half. "Getting Windows ready" (for what? a new name????). Then it helpfully reboots. And then, once again, "Getting ready".

Installing new Fedora25 for 3D

Since I had so much trouble copying my Fedora 25 3D image, I thought I'd try something else, namely installing a new VM from scratch. I have done that before, this should work.

Unable to complete install: 'internal error: process exited while connecting to monitor: sterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/fedora25-shuttle.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/data/systems/ISO/Fedora-Workstation-Live-x86_64-25/Fedora-Workstation-Live-x86_64-25-1.3.iso,format=raw,if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:8f:26:36,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-1-fedora25-shuttle/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=ch'

The output of dmesg is ever helpful:

[Apr 5 16:07] radeon 0000:01:00.0: evergreen_surface_check_2d:282 texture pitch 1920 invalid must be aligned with 512
[ +0.000004] radeon 0000:01:00.0: evergreen_cs_track_validate_texture:831 texture invalid 0x1dfc3bc1 0x40000437 0x060a0000 0x00000000 0x80000000 0x800304da
[ +0.000033] [drm:radeon_cs_ioctl [radeon]] *ERROR* Invalid command stream !
[ +1.387372] fuse init (API version 7.26)
[Apr 5 16:08] nf_conntrack: default automatic helper assignment has been turned off for security reasons and CT-based firewall rule not found. Use the iptables CT target to attach helpers instea

That does not seem directly related to my problem. Trying again.

Now I have my old foe:

Unable to complete install: 'internal error: qemu unexpectedly closed the monitor: 2017-04-05T14:15:02.475566Z qemu-system-x86_64: -drive file=/data/systems/ISO/Fedora-Workstation-Live-x86_64-25/Fedora-Workstation-Live-x86_64-25-1.3.iso,format=raw,if=none,id=drive-ide0-0-0,readonly=on: Could not open '/data/systems/ISO/Fedora-Workstation-Live-x86_64-25/Fedora-Workstation-Live-x86_64-25-1.3.iso': Permission denied'

Applying the usual fix:

setsebool -P virt_use_nfs 1

Now looking like an install might actually start. The VM boots, setup in progress. No idea if I'll go anywhere with an AMD card (that's what is now in Shuttle).

Domain dinechin.org

Transfer still "in progress" on the Gandi site.

Converting Fedora25 guest to 3D

To activate 3D, I followed the instructions on the Spice page:

  • You need to add a virtio-gpu video device to your virtual machine instead of QXL.
    <video>
      <model type='virtio' heads='1'>
        <acceleration accel3d='yes'/>
      </model>
    </video>

  • Then you need to enable OpenGL on your SPICE graphics node:
    <graphics type='spice' autoport='no'>
      <gl enable='yes'/>
    </graphics>
You don’t need any port/address as they won’t be usable with GL. Notice that this means you have to make sure that the <graphics> section does no longer <listen> to non-local address, and to change autoport='yes' to autoport='no'.

But for a qemu:///system device, that's not sufficient either. You have to make sure that user qemu can access your /dev/dri rendering interface. That's owned by root:video, so adding user qemu to the video group should be good enough.

usermod -a -G video qemu

Still no cigar:

Error starting domain: internal error: qemu unexpectedly closed the monitor: ev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=0,disable-ticketing,gl=on,seamless-migration=on -device virtio-vga,id=video0,virgl=on,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x9 -msg timestamp=on
2017-04-05T17:02:43.613482Z qemu-system-x86_64: egl: no drm render node available

Have to love the conciseness of Linux error messages. It's rather the opposite of the Windows ("an illegal operation was performed") or macOS ("Unknown error 27") schools of error reporting.
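Given the "no drm render node available" message, a first check is whether the host exposes any render node at all, and who owns it. The sketch below assumes render nodes appear as /dev/dri/renderD* (for example renderD128); render_nodes is a hypothetical helper name.

```shell
# List the DRM render nodes found in a directory (default: /dev/dri).
# render_nodes is a hypothetical helper name, used here for illustration only.
render_nodes() {
    dir=${1:-/dev/dri}
    for node in "$dir"/renderD*; do
        # An unmatched glob is left as a literal pattern, so check it exists
        if [ -e "$node" ]; then
            printf '%s\n' "$node"
        fi
    done
    return 0
}

# Show owner, group and mode of each render node on this host, if any
for node in $(render_nodes); do
    ls -l "$node"
done
```

If a node is listed, comparing its group with the output of id qemu shows whether adding qemu to the video group was actually enough, or whether the node belongs to a different group.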