Enabling 3D acceleration in a QEMU system session

Thursday, April 6, 2017

Today, I'm attempting to make a 3D-accelerated guest under qemu:///system.

Welcome message

This morning, my Mac greets me with nice welcome messages:

Am I the only one thinking that 30G of free space should be sufficient to collect some mails? It's not just the Finder goofing up either:

Filesystem   Size   Used  Avail Capacity   iused      ifree %iused  Mounted on
/dev/disk1  363Gi  330Gi   33Gi    91%   4977289 4289989990    0%   /

Interesting, if I click on "Manage", what I get is frightening:

Look at the "System" section. What are these 139GB of disk used by the "System"? Seriously?

What is really weird too is that the Finder tells me my home directory is 266.88GB. I don't think so, no. I think it's time to run fsck_hfs -l /dev/disk1 to check what's going on here.

root@ptitpuce emsdk> fsck_hfs -l /dev/disk1
** /dev/rdisk1 (NO WRITE)
** Root file system
   Executing fsck_hfs (version hfs-366.30.3).
** Performing live verification.
** Checking Journaled HFS Plus volume.
   The volume name is PtitPuce
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
   Volume bitmap needs minor repair for orphaned blocks
** Checking volume information.
   Invalid volume free block count
   (It should be 11144689 instead of 10865794)
** The volume PtitPuce was found corrupt and needs to be repaired.

OK. That explains a lot. Inferior filesystems (I blogged elsewhere that I was utterly not convinced by APFS, but HFS is not perfect either).

My Mail monitoring script was running, and it ran into trouble around that time:

07:32:11: 242
07:32:42: 242
07:33:14: 242
lsof: status error on 29859: No such file or directory
lsof 4.89
 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
 latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
 latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
 usage: [-?abhlnNoOPRtUvV] [+|-c c] [+|-d s] [+D D] [+|-f[cgG]] [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+|-M] [-o [o]] [-p s] [+|-r [t]] [-s [p:s]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [--] [names]
Use the ``-h'' option to get more help information.
07:33:46: 0
07:34:18: 242
07:34:50: 242
07:35:21: 242
07:35:54: 242
07:36:26: 242
07:36:59: 242
lsof: status error on 29995: No such file or directory
lsof 4.89
 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
 latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
 latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
 usage: [-?abhlnNoOPRtUvV] [+|-c c] [+|-d s] [+D D] [+|-f[cgG]] [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+|-M] [-o [o]] [-p s] [+|-r [t]] [-s [p:s]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [--] [names]
Use the ``-h'' option to get more help information.
07:37:31: 0
lsof: status error on 29995: No such file or directory
lsof 4.89
 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
 latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
 latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
 usage: [-?abhlnNoOPRtUvV] [+|-c c] [+|-d s] [+D D] [+|-f[cgG]] [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+|-M] [-o [o]] [-p s] [+|-r [t]] [-s [p:s]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [--] [names]
Use the ``-h'' option to get more help information.
07:38:08: 0
lsof: status error on 29995: No such file or directory
lsof 4.89
 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
 latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
 latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
 usage: [-?abhlnNoOPRtUvV] [+|-c c] [+|-d s] [+D D] [+|-f[cgG]] [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+|-M] [-o [o]] [-p s] [+|-r [t]] [-s [p:s]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [--] [names]
Use the ``-h'' option to get more help information.
07:38:39: 0
lsof: status error on 29995: No such file or directory
lsof 4.89
 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
 latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
 latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
 usage: [-?abhlnNoOPRtUvV] [+|-c c] [+|-d s] [+D D] [+|-f[cgG]] [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+|-M] [-o [o]] [-p s] [+|-r [t]] [-s [p:s]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [--] [names]
Use the ``-h'' option to get more help information.
07:39:11: 0
07:39:42: 242
07:40:14: 242
07:41:31: 242
07:42:02: 242

242 open files is not that much. It's a lot more than when I had just opened the Mail application (about 140), so there's a leak. But that was probably not enough to impair the system's behavior.
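The monitoring script itself is not shown here. A minimal sketch of what such a loop might look like, assuming it counts the lines lsof prints for the Mail process roughly every 30 seconds: the process name, the interval and all function names are guesses based on the log format above, not the original script.

```shell
#!/bin/sh
# Hypothetical sketch of a monitor that logs "HH:MM:SS: <count>" lines.
# Process name and interval are assumptions read off the log above.

# Print the number of lines lsof reports for one PID (header line included).
count_open_files() {
    # $(( ... )) strips the padding some wc implementations add.
    echo $(( $(lsof -p "$1" | wc -l) ))
}

# Format one log entry from a timestamp and a count.
log_entry() {
    printf '%s: %s\n' "$1" "$2"
}

# Poll forever. Only the oldest matching PID is used, so that lsof
# always receives a single value for -p.
monitor() {
    while :; do
        pid=$(pgrep -o -x "$1")
        log_entry "$(date +%H:%M:%S)" "$(count_open_files "$pid")"
        sleep "${2:-30}"
    done
}
```

It would be started with something like monitor Mail 30. When lsof fails, its error goes to the terminal and the count comes out as 0, which at least resembles the pattern in the log.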

Time to reboot the machine again, I guess.

Rebooted, ran the recovery image, fsck, repaired.

Enabling 3D in QEMU guest

Recap of yesterday. To activate 3D, I followed the instructions on the Spice page:

  • You need to add a virtio-gpu video device to your virtual machine instead of QXL.
    <video>
      <model type='virtio' heads='1'>
        <acceleration accel3d='yes'/>
      </model>
    </video>

  • Then you need to enable OpenGL on your SPICE graphics node:
    <graphics type='spice' autoport='no'>
      <gl enable='yes'/>
    </graphics>
You don’t need any port/address, as they won’t be usable with GL. Note that this means you have to make sure that the <graphics> section no longer has a <listen> on a non-local address, and that you change autoport='yes' to autoport='no'.
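To double-check that both settings actually made it into the domain definition, a plain-text check over the dumped XML can help. This is only a sketch: has_spice_gl is a made-up name, it assumes the attributes are single-quoted as in the snippets above, and it does not parse the XML.

```shell
#!/bin/sh
# Succeeds when a libvirt domain XML file contains both settings
# described above. Text matching only, for illustration.
has_spice_gl() {
    grep -q "accel3d='yes'" "$1" && grep -q "<gl enable='yes'/>" "$1"
}
```

Typical use would be virsh -c qemu:///system dumpxml GUEST > /tmp/guest.xml followed by has_spice_gl /tmp/guest.xml && echo ok, where GUEST stands for the name of your domain.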

But for a qemu:///system guest, that's not sufficient. You have to make sure that user qemu can access your /dev/dri rendering interface. That's owned by root:video, so adding user qemu to the video group should be good enough.

usermod -a -G video qemu

It is not sufficient. I still have an issue opening the EGL render node.

Error starting domain: internal error: process exited while connecting to monitor: 2017-04-06T09:19:07.265013Z qemu-system-x86_64: egl: no drm render node available

2017-04-06T09:19:07.265115Z qemu-system-x86_64: Failed to initialize EGL render node for SPICE GL
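The error says QEMU found no usable render node. A small sketch for checking the two preconditions mentioned so far: that a render node exists on the host at all, and that the qemu user is in the group that owns it. Both function names are made up for this example.

```shell
#!/bin/sh
# Succeeds if user $1 is a member of group $2.
in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Print every DRM render node present on this host, one per line.
# Prints nothing when there is none.
render_nodes() {
    for node in /dev/dri/renderD*; do
        [ -e "$node" ] && echo "$node"
    done
    return 0
}
```

For instance, in_group qemu video && render_nodes should list /dev/dri/renderD128 on a host set up as described here. If it does and the guest still fails to start, as in my case, the problem is elsewhere.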

The original problem is described in Red Hat Bugzilla 1337290. I'm reading the information in Red Hat Bugzilla 1337333, which I reached from Red Hat Bugzilla 1364075, itself referenced in Red Hat Bugzilla 1337290. The message above is after a host reboot.

Trying to follow the suggestion of this comment and adding the following to /etc/libvirt/qemu.conf:

cgroup_device_acl = [
    "/dev/null", "/dev/full", "/dev/zero",
    "/dev/random", "/dev/urandom",
    "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
    "/dev/rtc","/dev/hpet", "/dev/vfio/vfio",
    "/dev/dri/renderD128"
]

Now the guest boots. So it looks like Fedora 25 is not quite there yet out of the box: you have to make the manual adjustments above.
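A quick sanity check that the render node is listed on an uncommented line of the configuration file can be sketched as follows. acl_allows is a hypothetical helper and it only matches text, it does not parse the file. As far as I know, libvirtd reads qemu.conf at startup, so the daemon presumably has to be restarted (or the host rebooted) before the change takes effect.

```shell
#!/bin/sh
# Succeeds if file $1 mentions device $2, in double quotes, on a line
# that is not commented out.
acl_allows() {
    grep -v '^[[:space:]]*#' "$1" | grep -qF "\"$2\""
}
```

Usage would look like acl_allows /etc/libvirt/qemu.conf /dev/dri/renderD128 && systemctl restart libvirtd.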

The guest now sees the accelerated 3D interface:

[ddd@f25-shuttle ~]$ dmesg | grep drm
[   22.531176] [drm] Initialized
[   22.938684] [drm] pci: virtio-vga detected at 0000:00:02.0
[   22.939198] [drm] virgl 3d acceleration enabled
[   22.939972] [drm] virtio vbuffers: 288 bufs, 192B each, 54kB total.
[   22.940246] [drm] number of scanouts: 1
[   22.940255] [drm] number of cap sets: 1
[   22.947790] [drm] cap set 0: id 1, max-version 1, max-size 308
[   22.969754] virtio_gpu virtio0: fb0: virtiodrmfb frame buffer device
[   22.975394] [drm] Initialized virtio_gpu 0.0.1 0 on minor 0
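The line that matters in this output is the virgl one. A tiny sketch that turns it into a yes/no answer, for instance to run over ssh after each guest reboot (virgl_enabled is a made-up name, and the matched string is simply copied from the log above):

```shell
#!/bin/sh
# Reads kernel log text on stdin. Succeeds if the virtio-gpu driver
# reported that virgl 3D acceleration is enabled.
virgl_enabled() {
    grep -q 'virgl 3d acceleration enabled'
}
```

In the guest, dmesg | virgl_enabled && echo "virgl is on" would do.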

But when I run glxgears -info, I get a black window and the following messages:

libGL error: MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to retrieve device information
MESA-LOADER: failed to retrieve device information
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
GL_RENDERER   = Gallium 0.4 on virgl
GL_VERSION    = 3.0 Mesa 13.0.4
GL_VENDOR     = Red Hat
VisualID 440, 0x1b8

If instead of Wayland, I use GNOME on Xorg, I get a Terminal window that is not even legible:

I guess this is Red Hat Bugzilla 1426549. Will build a new Mesa.

Being able to ssh directly into my guest is so cool. I love my new DNS setup.

Rebuilding Mesa

Configuration for Mesa from my tips:

./configure --build=i686-redhat-linux-gnu --host=i686-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-asm --enable-selinux --enable-osmesa --with-dri-driverdir=/usr/lib/dri --enable-egl --disable-gles1 --enable-gles2 --disable-xvmc --with-egl-platforms=x11,drm --enable-shared-glapi --enable-gbm --disable-opencl --enable-glx-tls --enable-texture-float=yes --enable-gallium-llvm --with-llvm-shared-libs --enable-dri --enable-xa --with-gallium-drivers=svga,radeonsi,swrast,r600,r300,nouveau,virgl --with-dri-drivers=nouveau,radeon,r200,i915,i965

The option --with-llvm-shared-libs seems to be gone now.

  CXX      common/common_libamd_common_la-ac_llvm_helper.lo

g++: error: /usr/lib/rpm/redhat/redhat-hardened-cc1: No such file or directory
Makefile:1058: recipe for target 'common/common_libamd_common_la-ac_llvm_helper.lo' failed

This specific error is fixed by installing redhat-rpm-config:

dnf install redhat-rpm-config

Hmmm, that did not work. I still have the issue after installing the new Mesa. I apparently installed it incorrectly:

GL_RENDERER   = Gallium 0.4 on virgl
GL_VERSION    = 3.0 Mesa 13.0.4
GL_VENDOR     = Red Hat
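To tell quickly whether a rebuilt Mesa was actually picked up, the version number can be pulled out of that kind of output. A minimal sketch, with made-up function names, assuming the version always follows the word "Mesa" as it does above:

```shell
#!/bin/sh
# Extract the Mesa version from text on stdin, e.g. a line such as
# "GL_VERSION    = 3.0 Mesa 13.0.4". Prints the first match only.
mesa_version() {
    sed -n 's/.*Mesa \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeeds if the Mesa major version found on stdin is at least $1.
mesa_at_least() {
    ver=$(mesa_version)
    [ -n "$ver" ] && [ "${ver%%.*}" -ge "$1" ]
}
```

Since glxgears keeps running until its window is closed, feeding it from glxinfo is probably more convenient, assuming glxinfo is installed: glxinfo | grep "OpenGL version" | mesa_at_least 17 || echo "still the old Mesa".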

Version should be something like 17 now. Looking at what I did over one month ago (already). It looks like the configuration line is different:

./configure --prefix=/usr --enable-libglvnd --enable-selinux --enable-gallium-osmesa --with-dri-driverdir=/usr/lib64/dri --enable-gl --disable-gles1 --enable-gles2 --disable-xvmc --with-egl-platforms=drm,x11,surfaceless,wayland --enable-shared-glapi --enable-gbm --enable-glx-tls --enable-texture-float=yes --enable-gallium-llvm --enable-llvm-shared-libs --enable-dri --with-gallium-drivers=i915,nouveau,r300,svga,swrast,virgl --with-dri-drivers=swrast,nouveau

One important difference is where the libraries are installed. Updating my tips again. But installing Mesa with this command line results in a black screen. I can still access the console with Control-Alt-F3 (I think it is F3).

Trying to log in to enable ssh access even when I'm not logged in under Gnome. Ooops, Shuttle hung. Darn.

After rebooting the host, when I boot the guest, I now have some graphics during boot. So there is something dark and nasty lurking in the shadows, ready to pounce on the unsuspecting VM user. But then, the last entry I see is "Started GNOME Display Manager." and nothing else.

Activating ssh daemon on the Shuttle guest.

Returning to the version that was active when I had this working, c0e9e61c9a1. But remembering that it was on Big, which has a different GPU (Nvidia GeForce GTX-760).

Not better. Gnome crashes:

[    4.439155] gnome-session-f[1156]: segfault at 0 ip 00007f17b5ced4e9 sp 00007ffe543caf10 error 4 in libgtk-3.so.0.2200.10[7f17b5a0e000+6f9000]

And of course, I had forgotten to snapshot the VM before doing that. Attempting to fix with

dnf reinstall $(dnf search xorg | awk '{ print $1; }' | grep xorg)

That worked. Cool!

However, as soon as I reestablish acceleration in the guest, I get something rather spectacular in terms of output:

Trying again with

dnf reinstall $(dnf search mesa | awk '{ print $1; }' | grep mesa)

Still some garbage on the output. A different kind of garbage, rather less pleasing to the eye.

Trying the same setup on Muse, where it's a GTX-580, an older Nvidia card. Here, we have a different kind of art:

It looks like the problem there is something else, like a wrong stride when copying the buffer out. I can log in by "guessing" what each part is based on things like mouse movements, etc.

Uh oh, now Muse locked up. So maybe the last lock up on Shuttle was a software issue and not hardware, which in a sense is rather reassuring, given how much time I spent trying to stabilize that system.

Giving up for tonight.