Happy Birthday Tanguy

Thursday, March 30, 2017

Today is Tanguy's birthday. I doubt he will ever read my blog, but hey!

Of Mice and Men

I'm not executing my plan at all. I'm always derailed by some nasty stuff.

Change of plans: Big switches back to Windows

After switching disks and copying gigabytes of VM files around, I got Big back up and running. However, I decided to switch it back to Windows. There are several reasons:

  1. This is a personal machine, with an HTC Vive connected to it. My kids (and I) would love to have this HTC Vive available, which means returning the machine to Windows. Given how hard it is to set up the HTC Vive even under Windows, I doubt I can run it under KVM for now. (Future challenge?)
  2. I had set up several Boxes virtual machines on this system, some of them with stuff that is now a bit hard to find, e.g. MinGW for Windows XP. It's faster for me to get all these VMs back than to re-create them under KVM.
  3. It's a good comparison point for VM performance & VM features (Virtual Boxes vs. KVM/QEMU)
  4. It's a good comparison point for remote desktop performance and features (Microsoft Remote Desktop vs. Spice vs. VNC).
  5. I will have one full-time Windows host, which was missing in my setup so far.

DNS server goes away from Synology

The Synology server has some issues with I/O overload. In that case, the DNS server no longer responds in a timely manner, which messes up everything in my setup.

In the meantime, I have been copying VM images to Big at the astounding rate of 1M/s. This is over a 1Gb Ethernet network, with 4 disks in parallel. Even an ssh to the machine is slow. According to iotop, about 40Mb/s is being read continuously, and IOPS look off the chart, but I can't find where it comes from. Another forced reboot is lurking.

After rebooting, copies happen at over 25Mb/s. Again, there is really something wrong with the Synology, it degrades over time.

Putting DNS and DHCP server on a Raspberry Pi

These are time-critical services. I can't afford DNS timeouts that throw off my whole automation infrastructure just because the Synology is doing some file copy!

Dedicating a Raspberry Pi to be a DNS server, following instructions found here.

Automating the transfer of DHCP requests to DNS following these instructions (in French, sorry).

Automating the update of DNS entries from the Raspberry Pi's /etc/hosts using this little Perl script, with some modifications to deal with updates from the DHCP server.
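To illustrate the idea of deriving DNS entries from /etc/hosts, here is a minimal Python sketch. It is not the Perl script mentioned above: the function name hosts_to_records, the record layout, and the example.org domain are all assumptions made for illustration.

```python
import ipaddress


def hosts_to_records(hosts_text, domain):
    """Turn /etc/hosts-style text into BIND A/AAAA record lines.

    Hypothetical helper: the output layout is an assumption, not the
    format produced by the Perl script mentioned in the post.
    """
    records = []
    suffix = "." + domain
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        address, *names = line.split()
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue                            # not an IP address, skip the line
        if ip.is_loopback:
            continue                            # localhost has its own zone file
        rtype = "A" if ip.version == 4 else "AAAA"
        for name in names:
            if name.endswith(suffix):
                name = name[: -len(suffix)]     # keep names relative to the zone
            if "." in name:
                continue                        # belongs to another domain
            record = f"{name}\tIN\t{rtype}\t{ip}"
            if record not in records:
                records.append(record)
    return records
```

Calling hosts_to_records(open("/etc/hosts").read(), "example.org") would give lines that could be pasted into a zone file. A real script would also have to bump the zone serial number, which this sketch leaves out.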

Wow! It's amazing, now my network feels snappy again. No more of these second-long delays that sometimes occurred when the Synology was the DNS server and it was soooo busy doing something else. Plus the time it takes to reboot the Pi is really short, and I have full control over what's going on in it.
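"Snappy" can be put into numbers. The following is a rough sketch, assuming Python is available on a client machine, that times one name resolution per host. The function name time_lookups is made up for this example, and a local cache (nscd, systemd-resolved) may hide the real server latency.

```python
import socket
import time


def time_lookups(names, resolve=socket.getaddrinfo):
    """Return {name: seconds} for one resolution of each name.

    A failed lookup is recorded as None instead of raising.
    The resolver is a parameter so it can be replaced in tests.
    """
    timings = {}
    for name in names:
        start = time.perf_counter()
        try:
            resolve(name, None)
            timings[name] = time.perf_counter() - start
        except OSError:            # socket.gaierror is a subclass of OSError
            timings[name] = None
    return timings
```

Running it a few times against the host names on the network, before and after a DNS change, gives a crude but comparable picture of lookup delays.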

Saved the DNS configuration files in Work/notes/dns-config.tgz in case the Raspberry card goes up in flames.

Updating after a change to /etc/hosts:

hosts2dns -update
service isc-dhcp-server restart
service bind9 restart

Identifying bad cables with a router

One thing I noticed was that the bond between the Synology and the router was not showing 1Gb Ethernet for both links. One of the cables was some low-quality stuff that only did 100Mb/s. I used it for the Raspberry Pi, which can't go higher anyway, and I used the same trick to test the quality of my other cables. Switched another cable out.
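The same check can be done from a Linux host instead of the router's interface. This is a sketch of an alternative, not what I did above: it reads the negotiated speed that the kernel exposes in /sys/class/net/<interface>/speed (in Mb/s). The function names and the 1000 Mb/s threshold are assumptions.

```python
from pathlib import Path


def link_speeds(sysfs_root="/sys/class/net"):
    """Return {interface: negotiated speed in Mb/s, or None if unknown}."""
    speeds = {}
    for iface in sorted(Path(sysfs_root).iterdir()):
        try:
            value = int((iface / "speed").read_text().strip())
        except (OSError, ValueError):
            value = None           # no speed file, or link is down
        if value is not None and value < 0:
            value = None           # the kernel reports -1 when unknown
        speeds[iface.name] = value
    return speeds


def slow_links(speeds, expected=1000):
    """Names of interfaces that negotiated less than the expected speed."""
    return sorted(
        name for name, speed in speeds.items()
        if speed is not None and speed < expected
    )
```

An interface listed by slow_links(link_speeds()) on a gigabit port is a hint that the cable, or the device at the other end, only negotiated 100Mb/s.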

Remote access (VNC)

Remote access was broken on Shuttle too after I did a dnf update. Same symptoms I had yesterday: I can do a remote-viewer vnc://localhost:5900 and it works OK, but doing it from another machine fails.
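The symptom (works on localhost, fails from elsewhere) can be narrowed down with a plain TCP connection test, independent of any VNC client. A minimal sketch, with can_connect being a name chosen for this example:

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                # refused, unreachable, timeout, bad name
        return False


# Example use (host names are placeholders):
#   for host in ("127.0.0.1", "::1", "shuttle"):
#       print(host, can_connect(host, 5900))
```

If the port answers on one address family but not the other, the server is probably listening on only one of them, which is what lsof later shows.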

I thought dnf history undo would save me, but alas:

No package kernel-0:4.8.15-300.fc25.x86_64 available.

Error: An operation cannot be undone


On muse, the undo fails too, but for a different package:

No package virt-manager-0:1.4.1-1.fc25.noarch available.

Error: An operation cannot be undone


So I need to figure out what the problem is by myself. Filed Red Hat Bugzilla 1437619.

It took me several hours of fiddling, but I found a workaround, which is to force vino-server to listen on a specific port. This came from this Ubuntu bug report, see also Red Hat Bugzilla 703009 (marked as "Won't fix" in Fedora 25).

So this raises the question: why did it work before, and why did it stop working now? Two hypotheses:

  1. My older DNS server was somehow giving Screen Sharing an IPv6 address or a way to switch to IPv6. Frankly, I don't see how, since I never placed any AAAA record in it.
  2. There is a regression in the configuration of GNOME. This can't be vino-server itself, since it was not updated on the machines.

Hmmm. There is a comment in Red Hat Bugzilla 703009 that makes me wonder if the problem could have something to do with the resolution of localhost.

I was able to work around the problem by doing the following - I commented out the ipv6 localhost line in /etc/hosts:


#::1 MYHOST.OURDOMAIN.COM MYHOST localhost6.localdomain6 localhost6

then,

pkill vino-server

... and now it works.

$ lsof -i -P | grep -i "listen"
vino-serv 3232 needrealname 18u IPv6 36739542 0t0 TCP *:5900 (LISTEN)
vino-serv 3232 needrealname 19r IPv4 36739543 0t0 TCP *:5900 (LISTEN)

Looking at the configuration for localhost in /etc/bind/db.local, I see the following:

@       IN      SOA     localhost. root.localhost. (
                              2         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                         604800 )       ; Negative Cache TTL
;

@       IN      NS      localhost.
@       IN      A       127.0.0.1
@       IN      AAAA    ::1

So it is indeed creating an IPv6 localhost. Apparently, this is sufficient to throw off vino-server. If I remove this entry, I can access my server again:

vino-serv 1845    ddd   12u  IPv6  33997      0t0  TCP *:5900 (LISTEN)
vino-serv 1845    ddd   13u  IPv4  33998      0t0  TCP *:5900 (LISTEN)
vino-serv 1845    ddd   15u  IPv4  35300      0t0  TCP shuttle.dinechin.org:5900->192.168.77.22:53614 (ESTABLISHED)
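A quick way to see which address families a name resolves to on a given machine is to ask the system resolver directly. This is a sketch under assumptions: resolved_families is a name invented here, and socket.getaddrinfo reflects the whole lookup chain (typically /etc/hosts first, then DNS), not only the BIND zone.

```python
import socket


def resolved_families(host, port=5900, resolve=socket.getaddrinfo):
    """Return {"IPv4": [...], "IPv6": [...]} with the addresses found for host.

    Families with no address are absent from the result.
    """
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    found = {}
    for family, _type, _proto, _canon, sockaddr in resolve(
        host, port, type=socket.SOCK_STREAM
    ):
        label = labels.get(family)
        if label:
            found.setdefault(label, set()).add(sockaddr[0])
    return {label: sorted(addrs) for label, addrs in found.items()}
```

Running resolved_families("localhost") before and after removing the AAAA entry would show whether an IPv6 address for localhost is still being handed out.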

Problem solved. It only took 5 hours.