Thursday, April 6, 2017
Today, I'm trying to get a 3D-accelerated guest working under qemu:///system.
Welcome message
This morning, my Mac greets me with nice welcome messages:
Am I the only one who thinks that 30G of free space should be enough to collect some mail? It's not just the Finder goofing up either:
Filesystem   Size   Used  Avail Capacity   iused       ifree %iused  Mounted on
/dev/disk1  363Gi  330Gi   33Gi    91%   4977289  4289989990    0%   /
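To check that these numbers are at least self-consistent, here is a small Python sketch (mine, not something I ran that day) that parses the df line above and recomputes the capacity. It assumes the nine-column macOS `df -h` layout and only understands the `Gi` suffix that appears here.

```python
def _gib(value: str) -> float:
    """Convert a `df -h` size such as '363Gi' into a number of GiB."""
    # ASSUMPTION: only the 'Gi' suffix is handled, since that is all this line uses.
    if not value.endswith("Gi"):
        raise ValueError(f"expected a 'Gi' suffix, got {value!r}")
    return float(value[:-2])


def parse_df_h_line(line: str) -> dict:
    """Split one data line of macOS `df -h` output into named fields.

    Assumes the nine columns shown above and a mount point without spaces.
    """
    fs, size, used, avail, cap, iused, ifree, _pct_iused, mount = line.split()
    return {
        "filesystem": fs,
        "size_gib": _gib(size),
        "used_gib": _gib(used),
        "avail_gib": _gib(avail),
        "capacity_pct": int(cap.rstrip("%")),
        "inodes_used": int(iused),
        "inodes_free": int(ifree),
        "mount": mount,
    }


info = parse_df_h_line("/dev/disk1 363Gi 330Gi 33Gi 91% 4977289 4289989990 0% /")
print(info["avail_gib"])                                 # 33.0
print(round(100 * info["used_gib"] / info["size_gib"]))  # 91
```

Since df already rounds sizes to whole GiB, the recomputed percentage is only approximate, but 330 out of 363 does land on the 91% that df reports.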
Interestingly, if I click on "Manage", what I get is frightening:
Look at the "System" section. What are these 139GB of disk used by the "System"? Seriously?
What is also really weird is that the Finder tells me my home directory is 266.88GB. I don't think so, no. I think it's time to run fsck_hfs -l /dev/disk1 to check what's going on here.
root@ptitpuce emsdk> fsck_hfs -l /dev/disk1
** /dev/rdisk1 (NO WRITE)
** Root file system
   Executing fsck_hfs (version hfs-366.30.3).
** Performing live verification.
** Checking Journaled HFS Plus volume.
   The volume name is PtitPuce
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
   Volume bitmap needs minor repair for orphaned blocks
** Checking volume information.
   Invalid volume free block count (It should be 11144689 instead of 10865794)
** The volume PtitPuce was found corrupt and needs to be repaired.
OK. That explains a lot. Inferior filesystems! (I blogged elsewhere that I was utterly unconvinced by APFS, but HFS is not perfect either.)
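How much space does the bad free block count represent? fsck_hfs does not print the allocation block size, so the sketch below assumes 4 KiB blocks (a common HFS+ default, but an assumption on my part). Under that assumption, the volume was under-reporting its free space by 278,895 blocks, roughly 1.06 GiB.

```python
from typing import Tuple

# ASSUMPTION: 4 KiB allocation blocks; the fsck_hfs output does not show the size.
BLOCK_SIZE = 4096


def misreported_free_space(expected_free: int, recorded_free: int,
                           block_size: int = BLOCK_SIZE) -> Tuple[int, float]:
    """Return (blocks, GiB) by which the recorded free block count is too low."""
    blocks = expected_free - recorded_free
    return blocks, blocks * block_size / 2**30


# Numbers from "It should be 11144689 instead of 10865794".
blocks, gib = misreported_free_space(11144689, 10865794)
print(blocks)            # 278895
print(f"{gib:.2f} GiB")  # 1.06 GiB
```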
My Mail monitoring script was running, and it ran into trouble around that time:
07:32:11: 242
07:32:42: 242
07:33:14: 242
lsof: status error on 29859: No such file or directory
lsof 4.89
latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
usage: [-?abhlnNoOPRtUvV] [+|-c c] [+|-d s] [+D D] [+|-f[cgG]] [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+|-M] [-o [o]] [-p s] [+|-r [t]] [-s [p:s]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [--] [names]
Use the ``-h'' option to get more help information.
07:33:46: 0
07:34:18: 242
07:34:50: 242
07:35:21: 242
07:35:54: 242
07:36:26: 242
07:36:59: 242
lsof: status error on 29995: No such file or directory
[same lsof usage message as above]
07:37:31: 0
lsof: status error on 29995: No such file or directory
[same lsof usage message as above]
07:38:08: 0
lsof: status error on 29995: No such file or directory
[same lsof usage message as above]
07:38:39: 0
lsof: status error on 29995: No such file or directory
[same lsof usage message as above]
07:39:11: 0
07:39:42: 242
07:40:14: 242
07:41:31: 242
07:42:02: 242
242 open files is not that much. It's a lot more than when I had just opened the Mail application (about 140), so there's a leak. But that was probably not enough to impair the system's behavior.
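The monitoring script itself isn't shown here. As a hypothetical sketch, this is one way to pull the open-file counts out of its log and flag a leak relative to the roughly 140 files Mail starts with. In the log, the 0 readings coincide with the lsof errors (the monitored PID no longer existed), so the sketch skips zeros. The `margin` threshold is an arbitrary choice of mine.

```python
import re
from typing import List, Tuple

# A sample line looks like "07:32:11: 242" (time, then open-file count).
SAMPLE = re.compile(r"^(\d{2}:\d{2}:\d{2}): (\d+)$")


def parse_samples(log: str) -> List[Tuple[str, int]]:
    """Extract (time, count) pairs, skipping lsof's error and usage lines."""
    samples = []
    for line in log.splitlines():
        m = SAMPLE.match(line.strip())
        if m:
            samples.append((m.group(1), int(m.group(2))))
    return samples


def leak_suspected(samples: List[Tuple[str, int]],
                   baseline: int = 140, margin: int = 50) -> bool:
    """Hypothetical heuristic: any non-zero count above baseline + margin.

    Zero counts are skipped: in the log they coincide with lsof failing
    because the monitored PID no longer existed.
    """
    return any(count > baseline + margin for _, count in samples if count)


log = """07:32:11: 242
lsof: status error on 29859: No such file or directory
07:33:46: 0
07:34:18: 242"""
samples = parse_samples(log)
print(samples)
print(leak_suspected(samples))  # True
```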
Time to reboot the machine again, I guess.
Rebooted into the recovery image, ran fsck, repaired.
Enabling 3D in QEMU guest
Recap of yesterday: to activate 3D, I followed the instructions on the SPICE page:
- You need to add a virtio-gpu video device to your virtual machine instead of QXL:
<video>
  <model type='virtio' heads='1'>
    <acceleration accel3d='yes'/>
  </model>
</video>
- Then you need to enable OpenGL on your SPICE graphics node:
<graphics type='spice' autoport='no'>
  <gl enable='yes'/>
</graphics>
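Since both settings live in the domain XML, one way to verify them is to dump the definition (for example with `virsh dumpxml`) and check it. A minimal Python sketch, with a function name of my own, that looks for a virtio video model with `accel3d='yes'` and a SPICE `<graphics>` node containing `<gl enable='yes'/>`:

```python
import xml.etree.ElementTree as ET


def has_spice_gl_3d(domain_xml: str) -> bool:
    """True if the domain has a virtio video model with accel3d='yes'
    and a SPICE graphics node with <gl enable='yes'/>."""
    root = ET.fromstring(domain_xml)
    # Restrict to <video><model>, so a network <interface><model type='virtio'/> is ignored.
    video_ok = any(
        model.get("type") == "virtio"
        and model.find("acceleration") is not None
        and model.find("acceleration").get("accel3d") == "yes"
        for model in root.findall(".//video/model")
    )
    gl_ok = any(
        gfx.find("gl") is not None and gfx.find("gl").get("enable") == "yes"
        for gfx in root.findall(".//graphics[@type='spice']")
    )
    return video_ok and gl_ok


# Minimal hypothetical fragment, not a complete domain definition.
sample = """<domain type='kvm'>
  <devices>
    <video>
      <model type='virtio' heads='1'>
        <acceleration accel3d='yes'/>
      </model>
    </video>
    <graphics type='spice' autoport='no'>
      <gl enable='yes'/>
    </graphics>
  </devices>
</domain>"""
print(has_spice_gl_3d(sample))  # True
```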
But for a qemu:///system domain, that's not sufficient. You also have to make sure that the qemu user can access the /dev/dri render node. It is owned by root:video, so adding the qemu user to the video group should be good enough:
usermod -a -G video qemu
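Group membership is only part of the story, so it helps to check what the classic permission bits actually allow. The sketch below (hypothetical helper names) applies the owner/group/other rules to a device node for a given user. It deliberately ignores POSIX ACLs, SELinux labels and libvirt's device cgroup, so a "True" here does not guarantee QEMU can open the node.

```python
import os
import pwd
import stat
from typing import Set


def classic_rw_allowed(mode: int, node_uid: int, node_gid: int,
                       uid: int, gids: Set[int]) -> bool:
    """Apply the classic owner/group/other rules for read+write access.

    Ignores POSIX ACLs, SELinux and libvirt's device cgroup.
    """
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == node_uid:
        r, w = stat.S_IRUSR, stat.S_IWUSR
    elif node_gid in gids:
        r, w = stat.S_IRGRP, stat.S_IWGRP
    else:
        r, w = stat.S_IROTH, stat.S_IWOTH
    return bool(mode & r) and bool(mode & w)


def check_node(node: str, user: str) -> bool:
    """Hypothetical helper: run classic_rw_allowed against a real node and user."""
    st = os.stat(node)
    pw = pwd.getpwnam(user)
    gids = set(os.getgrouplist(user, pw.pw_gid))  # primary + supplementary groups
    return classic_rw_allowed(st.st_mode, st.st_uid, st.st_gid, pw.pw_uid, gids)


# Made-up IDs: node owned by root:39 with mode crw-rw----, user uid 107.
CHAR_DEV_660 = stat.S_IFCHR | 0o660
print(classic_rw_allowed(CHAR_DEV_660, 0, 39, 107, {107}))      # False
print(classic_rw_allowed(CHAR_DEV_660, 0, 39, 107, {107, 39}))  # True
```

On the host, `check_node("/dev/dri/renderD128", "qemu")` would run the same check against the real node; the numeric IDs above are invented for the example.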
It isn't. QEMU still fails to open the EGL render node:
Error starting domain: internal error: process exited while connecting to monitor: 2017-04-06T09:19:07.265013Z qemu-system-x86_64: egl: no drm render node available
2017-04-06T09:19:07.265115Z qemu-system-x86_64: Failed to initialize EGL render node for SPICE GL
The original problem is described in Red Hat Bugzilla 1337290. I'm reading the information in Red Hat Bugzilla 1337333, which I reached via Red Hat Bugzilla 1364075, which is referenced in Red Hat Bugzilla 1337290. The error message above was obtained after a host reboot.
Following the suggestion in this comment, I added the following to /etc/libvirt/qemu.conf:
cgroup_device_acl = [
    "/dev/null", "/dev/full", "/dev/zero",
    "/dev/random", "/dev/urandom",
    "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
    "/dev/rtc", "/dev/hpet", "/dev/vfio/vfio",
    "/dev/dri/renderD128"
]
Now the guest boots. So it looks like Fedora 25 is not quite there yet: you have to make the manual adjustments above.
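To make sure such an edit is actually active (the stock qemu.conf ships this setting commented out), here is a small sketch that extracts the uncommented `cgroup_device_acl` list and checks for the render node. The function name is mine, and it assumes the list uses the simple bracketed, double-quoted syntax shown above.

```python
import ast
import re
from typing import List, Optional


def cgroup_device_acl(conf_text: str) -> Optional[List[str]]:
    """Return the active (uncommented) cgroup_device_acl list, or None if unset."""
    # Drop comment lines first: the stock qemu.conf has this setting commented out.
    active = "\n".join(
        line for line in conf_text.splitlines() if not line.lstrip().startswith("#")
    )
    m = re.search(r"^\s*cgroup_device_acl\s*=\s*(\[.*?\])", active, re.S | re.M)
    # ASSUMPTION: the list uses Python-compatible syntax (brackets, double-quoted strings).
    return ast.literal_eval(m.group(1)) if m else None


conf = '''#cgroup_device_acl = [ "/dev/null" ]
cgroup_device_acl = [
    "/dev/null", "/dev/full", "/dev/zero",
    "/dev/dri/renderD128"
]
'''
acl = cgroup_device_acl(conf)
print("/dev/dri/renderD128" in acl)  # True
```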
The guest now sees the accelerated 3D interface:
[ddd@f25-shuttle ~]$ dmesg | grep drm
[ 22.531176] [drm] Initialized
[ 22.938684] [drm] pci: virtio-vga detected at 0000:00:02.0
[ 22.939198] [drm] virgl 3d acceleration enabled
[ 22.939972] [drm] virtio vbuffers: 288 bufs, 192B each, 54kB total.
[ 22.940246] [drm] number of scanouts: 1
[ 22.940255] [drm] number of cap sets: 1
[ 22.947790] [drm] cap set 0: id 1, max-version 1, max-size 308
[ 22.969754] virtio_gpu virtio0: fb0: virtiodrmfb frame buffer device
[ 22.975394] [drm] Initialized virtio_gpu 0.0.1 0 on minor 0
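Rather than eyeballing dmesg, the virgl line can be checked programmatically. A small sketch (hypothetical helper names) that extracts the `[drm]` messages and looks for the exact `virgl 3d acceleration enabled` text seen above; other kernel versions may word it differently.

```python
import re
from typing import List, Tuple

# A dmesg line tagged [drm], e.g. "[ 22.939198] [drm] virgl 3d acceleration enabled".
DRM_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\[drm\]\s+(.*)$")


def drm_messages(dmesg_text: str) -> List[Tuple[float, str]]:
    """Return (timestamp, message) for each '[drm]' line; other lines are skipped."""
    out = []
    for line in dmesg_text.splitlines():
        m = DRM_LINE.match(line.strip())
        if m:
            out.append((float(m.group(1)), m.group(2)))
    return out


def virgl_enabled(dmesg_text: str) -> bool:
    """True if the kernel logged the virgl message seen in this guest's dmesg."""
    return any(msg == "virgl 3d acceleration enabled" for _, msg in drm_messages(dmesg_text))


sample = """[ 22.938684] [drm] pci: virtio-vga detected at 0000:00:02.0
[ 22.939198] [drm] virgl 3d acceleration enabled
[ 22.969754] virtio_gpu virtio0: fb0: virtiodrmfb frame buffer device"""
print(virgl_enabled(sample))  # True
```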
But when I run glxgears -info, I get a black window and the following messages:
libGL error: MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to retrieve device information
Running synchronized to the vertical refresh. The framerate should be approximately the same as the monitor refresh rate.
GL_RENDERER = Gallium 0.4 on virgl
GL_VERSION = 3.0 Mesa 13.0.4
GL_VENDOR = Red Hat
VisualID 440, 0x1b8
If I use GNOME on Xorg instead of Wayland, I get a Terminal window that is not even legible:
I guess this is Red Hat Bugzilla 1426549. I'll build a new Mesa.
Being able to ssh directly into my guest is so cool. I love my new DNS setup.
Rebuilding Mesa
Configuration for Mesa from my tips:
./configure --build=i686-redhat-linux-gnu --host=i686-redhat-linux-gnu \
    --program-prefix= --disable-dependency-tracking \
    --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin \
    --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include \
    --libdir=/usr/lib --libexecdir=/usr/libexec --localstatedir=/var \
    --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info \
    --disable-asm --enable-selinux --enable-osmesa \
    --with-dri-driverdir=/usr/lib/dri --enable-egl --disable-gles1 --enable-gles2 \
    --disable-xvmc --with-egl-platforms=x11,drm --enable-shared-glapi --enable-gbm \
    --disable-opencl --enable-glx-tls --enable-texture-float=yes \
    --enable-gallium-llvm --with-llvm-shared-libs --enable-dri --enable-xa \
    --with-gallium-drivers=svga,radeonsi,swrast,r600,r300,nouveau,virgl \
    --with-dri-drivers=nouveau,radeon,r200,i915,i965
The option --with-llvm-shared-libs seems to be gone now.
CXX common/common_libamd_common_la-ac_llvm_helper.lo
g++: error: /usr/lib/rpm/redhat/redhat-hardened-cc1: No such file or directory
Makefile:1058: recipe for target 'common/common_libamd_common_la-ac_llvm_helper.lo' failed
This specific error is fixed by installing redhat-rpm-config:
dnf install redhat-rpm-config
Hmmm, that did not work. I still have the issue after installing the new Mesa. I apparently installed it incorrectly:
GL_RENDERER = Gallium 0.4 on virgl
GL_VERSION = 3.0 Mesa 13.0.4
GL_VENDOR = Red Hat
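The GL_VERSION line is the quickest way to see which Mesa is actually loaded. A sketch (function name is mine) that pulls the Mesa version out of `glxgears -info` output as a tuple of integers, so it can be compared against the release I expect:

```python
import re
from typing import Optional, Tuple


def mesa_version(glx_output: str) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch) from a 'GL_VERSION = ... Mesa X.Y[.Z]' line, or None."""
    m = re.search(r"GL_VERSION\s*=.*\bMesa (\d+)\.(\d+)(?:\.(\d+))?", glx_output)
    if m is None:
        return None
    # A missing patch level is treated as 0.
    return tuple(int(part) for part in m.groups(default="0"))


out = "GL_RENDERER = Gallium 0.4 on virgl\nGL_VERSION = 3.0 Mesa 13.0.4\nGL_VENDOR = Red Hat"
print(mesa_version(out))             # (13, 0, 4)
print(mesa_version(out) >= (17, 0))  # False: still the distribution's Mesa
```

Comparing tuples keeps the check numeric, so 13.0.4 correctly sorts below 17.0 where a plain string comparison of "13" and "17" would only work by accident.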
The version should be something like 17 now. Looking at what I did over a month ago (already!), the configuration line was different:
./configure --prefix=/usr --enable-libglvnd --enable-selinux --enable-gallium-osmesa \
    --with-dri-driverdir=/usr/lib64/dri --enable-gl --disable-gles1 --enable-gles2 \
    --disable-xvmc --with-egl-platforms=drm,x11,surfaceless,wayland \
    --enable-shared-glapi --enable-gbm --enable-glx-tls --enable-texture-float=yes \
    --enable-gallium-llvm --enable-llvm-shared-libs --enable-dri \
    --with-gallium-drivers=i915,nouveau,r300,svga,swrast,virgl \
    --with-dri-drivers=swrast,nouveau
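Comparing two configure lines by eye is error-prone. A sketch that turns each command line into a set of options and lists what was removed and added. The two strings below are shortened excerpts of the two configure lines in this post, not the full commands.

```python
import shlex


def configure_options(cmdline: str) -> set:
    """Return the set of '--option[=value]' tokens in a configure command line."""
    return {tok for tok in shlex.split(cmdline) if tok.startswith("--")}


# Shortened excerpts of the older and newer configure lines.
old = ("./configure --prefix=/usr --libdir=/usr/lib --with-dri-driverdir=/usr/lib/dri "
       "--enable-gallium-llvm --with-llvm-shared-libs")
new = ("./configure --prefix=/usr --enable-libglvnd --with-dri-driverdir=/usr/lib64/dri "
       "--enable-gallium-llvm --enable-llvm-shared-libs")

removed = sorted(configure_options(old) - configure_options(new))
added = sorted(configure_options(new) - configure_options(old))
print("removed:", removed)
print("added:", added)
```

Whole tokens are compared, so an option whose value changed (like the DRI driver directory) shows up on both sides, which is exactly what makes the /usr/lib versus /usr/lib64 move stand out.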
One important difference is where the libraries are installed (/usr/lib64/dri instead of /usr/lib/dri). Updating my tips again. But installing Mesa built with this command line results in a black screen. I can still reach a text console with Control-Alt-F3 (I think it's F3).
Trying to log in to enable ssh access, so that I can reach the guest even when I'm not logged in under GNOME. Oops, Shuttle hung. Darn.
After rebooting the host and booting the guest, I now see some graphics during boot. So there is something dark and nasty lurking in the shadows, ready to pounce on the unsuspecting VM user. But then, the last entry I see is "Started GNOME Display Manager." and nothing else.
Activating the ssh daemon on the Shuttle guest.
Going back to the version that was active when I had this working, c0e9e61c9a1, while remembering that this was on Big, which has a different GPU (an Nvidia GeForce GTX 760).
No better. GNOME crashes:
[ 4.439155] gnome-session-f[1156]: segfault at 0 ip 00007f17b5ced4e9 sp 00007ffe543caf10 error 4 in libgtk-3.so.0.2200.10[7f17b5a0e000+6f9000]
And of course, I had forgotten to snapshot the VM before doing that. Attempting to fix it with:
dnf reinstall $(dnf search xorg | awk '{ print $1; }' | grep xorg)
That worked. Cool!
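For the record, the `awk '{ print $1; }' | grep xorg` part just keeps the first column of each `dnf search` line when it contains "xorg". A Python equivalent (hypothetical function name), run on a made-up sample in the shape of `dnf search` output; the package names and summaries are illustrative only.

```python
from typing import List


def package_names(search_output: str, needle: str) -> List[str]:
    """Equivalent of `awk '{ print $1; }' | grep NEEDLE`, with NEEDLE taken literally."""
    names = []
    for line in search_output.splitlines():
        fields = line.split()  # awk's default whitespace splitting
        if fields and needle in fields[0]:
            names.append(fields[0])
    return names


# Made-up sample in the shape of `dnf search xorg` output.
sample = """\
=============== N/S Matched: xorg ===============
xorg-x11-server-Xorg.x86_64 : Xorg X server
xorg-x11-drv-libinput.x86_64 : Xorg X11 libinput input driver
"""
names = package_names(sample, "xorg")
print(" ".join(names))
```

The names keep their `.x86_64` suffix, which dnf accepts as a name.arch package spec, and the banner line is dropped because its first field is just equals signs.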
However, as soon as I re-enable acceleration in the guest, I get something rather spectacular in terms of output:
Trying again with:
dnf reinstall $(dnf search mesa | awk '{ print $1; }' | grep mesa)
Still some garbage on the output. A different kind of garbage, rather less pleasing to the eye.
Trying the same setup on Muse, which has a GTX 580, an older Nvidia card. Here, we get a different kind of art:
It looks like the problem there is something else, like a wrong stride when copying the buffer out. I can log in by "guessing" what each part is, based on things like mouse movements.
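I don't actually know that a stride mismatch is what happens here; it's a guess from the look of the output. To illustrate what such a bug produces: if rows are read at the wrong pitch, each row starts a bit further off than the previous one, so the picture shears and row padding shows up as garbage. A toy sketch with hypothetical helpers, using -1 for padding:

```python
from typing import List


def make_padded_image(width: int, height: int, stride: int, pad: int = -1) -> List[int]:
    """Flat buffer where pixel (x, y) = 10*y + x and each row is padded to `stride`."""
    buf: List[int] = []
    for y in range(height):
        buf.extend(10 * y + x for x in range(width))
        buf.extend([pad] * (stride - width))
    return buf


def copy_out(src: List[int], width: int, height: int, src_stride: int) -> List[List[int]]:
    """Copy pixels into packed rows, assuming each source row starts every src_stride elements."""
    return [src[y * src_stride: y * src_stride + width] for y in range(height)]


src = make_padded_image(width=4, height=3, stride=6)
print(copy_out(src, 4, 3, src_stride=6))  # correct pitch: rows come out intact
print(copy_out(src, 4, 3, src_stride=4))  # wrong pitch: rows drift, padding leaks in
```

With the correct pitch the rows are `[0, 1, 2, 3]`, `[10, 11, 12, 13]`, `[20, 21, 22, 23]`; with the wrong one the second row already begins with two padding values, and the error grows with every row.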
Uh oh, now Muse locked up. So maybe the last lockup on Shuttle was a software issue and not a hardware one, which is rather reassuring in a sense, given how much time I spent trying to stabilize that system.
Giving up for tonight.