Bisecting Mesa

Thursday, March 2, 2017

Tao3D ran all night.

I want to identify the bad commit. I know 13.0.3 is bad, I know master is good. So it’s a simple case of git bisect, right?

Things are never so simple. Apparently, git bisect gets stuck because a merge base is marked as “bad”. Looking things up on the web, the reason is that git bisect can only find the first bad commit, not the first good one. So I ran it backwards, where “good” means “bad” and “bad” means “good”.
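The reason swapping the labels works: bisection only needs a predicate that flips once along the history, and git bisect assumes the older endpoint is on the "good" side of that flip. When the older endpoint is the broken one, calling "fixed" "bad" restores that assumption. Recent versions of git also let you name the two states directly with `git bisect start --term-old=<term> --term-new=<term>`. The Python sketch below simulates the search on a hypothetical linear history (real Mesa history has merges, which git bisect handles and this sketch does not); the commit ids and the `is_fixed` predicate are made up for illustration.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the oldest commit for which is_fixed() is True.

    Assumes `commits` is a linear history, oldest first, where commits[0]
    is broken, commits[-1] is fixed, and the fix never regresses in between.
    In reversed-terms git bisect, "fixed" is marked bad and "broken" good.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] fixed
    if is_fixed(commits[lo]) or not is_fixed(commits[hi]):
        raise ValueError("need a broken oldest commit and a fixed newest commit")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid  # reversed bisect: mark this commit "bad"
        else:
            lo = mid  # reversed bisect: mark this commit "good"
    return commits[hi]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(20)]  # hypothetical commit ids, oldest first
    fix_index = 13  # hypothetical: the fix landed in c13
    print(first_fixed(history, lambda c: int(c[1:]) >= fix_index))  # prints c13
```

Each step halves the range, so 20 commits take about five test runs, which matters when each run means rebuilding Mesa and leaving Tao3D running for a while.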

The result is surprising: it looks like this commit fixes things:

commit dc2d9b8da14b97376cca9c87e2408763b11bdaa7
Author: Marc-André Lureau
Date:   Thu Feb 9 18:41:11 2017 +0400

    tgsi-dump: dump label if instruction has one

    The instruction has an associated label when Instruction.Label == 1,
    as can be seen in ureg_emit_label() or tgsi_build_full_instruction().
    This fixes dump generating extra :0 labels on conditionals, and virgl
    parsing more than the expected tokens and eventually reaching "Illegal
    command buffer" (when parsing more than a safety margin of 10 we
    currently have).

    Signed-off-by: Marc-André Lureau
    Cc: "13.0 17.0"
    Signed-off-by: Dave Airlie

Marc-André had pointed me to Red Hat Bugzilla 1417932, but the symptoms seemed somewhat different: I had garbage on screen instead of black windows, plus some MESA-LOADER messages not mentioned in the original bug report.

Marked Red Hat Bugzilla 1426549 as a duplicate of Red Hat Bugzilla 1417932.

Reinstalling Muse

Reinstalled Fedora 25 Workstation on Muse. I see two possible explanations for the BTRFS disaster:

  1. Either BTRFS screwed up badly, it’s a software fault, and I should not use BTRFS again,
  2. Or BTRFS reported a bad disk (which is somewhat consistent with the sector reallocation messages I saw in the kernel), it’s a hardware fault, and I should keep using BTRFS to detect if other sectors go bad.

I chose option 1, but I really don’t have enough information to make an informed decision, beyond what I know about the disk.

Spice memory allocation

Decided to take a look at Spice memory allocation. Right now, there are a number of places where we statically allocate 10000 entries (NUM_SURFACES, really). As a result, we need approximately 250K for the DCC private areas, when in most cases we use a single surface. That seems like a waste.

I had started this on Muse before the disk crashed, and that was part of the data that I recovered.
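To make the trade-off concrete, here is a language-neutral sketch in Python (Spice itself is C, and this is not its code) contrasting a table sized for NUM_SURFACES with one that allocates per-surface private data only on first use. The 25-byte entry size is a hypothetical figure implied by the ~250K / 10000 estimate; `LazySurfaceTable`, `static_cost` and `lazy_cost` are illustrative names, not Spice APIs.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")

NUM_SURFACES = 10000  # upper bound on surface ids, as in the text
ENTRY_BYTES = 25      # hypothetical per-entry size implied by ~250K / 10000


class LazySurfaceTable(Generic[T]):
    """Per-surface private data, created only for surfaces actually used.

    Surface ids stay bounded by NUM_SURFACES, but storage grows on demand
    instead of being reserved up front for every possible surface.
    """

    def __init__(self, make_entry: Callable[[], T]) -> None:
        self._make_entry = make_entry
        self._entries: Dict[int, T] = {}

    def get(self, surface_id: int) -> T:
        if not 0 <= surface_id < NUM_SURFACES:
            raise IndexError(f"surface id {surface_id} out of range")
        if surface_id not in self._entries:
            self._entries[surface_id] = self._make_entry()  # first use allocates
        return self._entries[surface_id]

    def release(self, surface_id: int) -> None:
        self._entries.pop(surface_id, None)  # freeing an unused id is a no-op

    def __len__(self) -> int:
        return len(self._entries)


def static_cost() -> int:
    """Bytes reserved by a fixed array, whether or not surfaces exist."""
    return NUM_SURFACES * ENTRY_BYTES


def lazy_cost(table: "LazySurfaceTable") -> int:
    """Approximate bytes for allocated entries (ignores container overhead)."""
    return len(table) * ENTRY_BYTES


if __name__ == "__main__":
    table = LazySurfaceTable(dict)
    table.get(0)  # the common case: only the primary surface is touched
    print(static_cost(), lazy_cost(table))  # prints 250000 25
```

In C, the same idea could be an array of pointers filled on demand or a small growable array; note that an array of 10000 pointers is itself about 80K on a 64-bit machine, so the growable form saves more when only one surface is used.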

Linux backups

Setting up backups for my Linux machines now 😉 I’d like to do that with BTRFS snapshot replication to the server, if at all possible. OK, I know, I’m taking some chances here 😉 The latest Synology update finally brings the Snapshot Replication application to the DSM412+. So I think a simple rsync script to the DSM, with Snapshot Replication configured on the server, is good enough.

This also allows me to tune the backup of VMs (large files, infrequent backups) separately from the backup of the directories where I edit source code (small files, frequent backups).
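A minimal sketch of such a script, assuming rsync over SSH to the NAS and one job per backup class. The host name, the local and remote paths (including the Synology-style `/volume1` prefix) and the intervals are all hypothetical; `interval_hours` is only metadata here, and something like cron or a systemd timer would still have to invoke each job. The rsync flags used (`-a`, `--delete`, `--inplace`, `--dry-run`) are standard options.

```python
import subprocess
from dataclasses import dataclass, field
from typing import List

NAS_HOST = "backup@nas.local"  # hypothetical SSH login on the Synology box


@dataclass
class BackupJob:
    name: str
    sources: List[str]      # local directories to copy (trailing / = copy contents)
    dest: str               # target path on the NAS share covered by Snapshot Replication
    interval_hours: int     # how often the scheduler should run this job
    extra_flags: List[str] = field(default_factory=list)

    def rsync_command(self, dry_run: bool = False) -> List[str]:
        # -a preserves permissions/times/links; --delete mirrors removals
        cmd = ["rsync", "-a", "--delete"] + self.extra_flags
        if dry_run:
            cmd.append("--dry-run")  # show what would be transferred, change nothing
        return cmd + self.sources + [f"{NAS_HOST}:{self.dest}"]


JOBS = [
    # Large VM images, backed up daily; --inplace writes changes into the
    # existing destination file instead of building a full new copy.
    BackupJob("vms", ["/var/lib/libvirt/images/"], "/volume1/backup/vms/", 24, ["--inplace"]),
    # Small, frequently edited source trees, backed up hourly.
    BackupJob("src", ["/home/me/work/"], "/volume1/backup/src/", 1),
]


def run(job: BackupJob, dry_run: bool = True) -> None:
    """Run one job; defaults to a dry run so a typo in a path cannot delete data."""
    subprocess.run(job.rsync_command(dry_run), check=True)
```

Keeping the jobs separate means the VM images are not re-scanned every hour, while the snapshots on the server provide the history that plain rsync mirroring lacks.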