Against the wind

Thursday, June 8, 2017

Well, some days are not as good as others.

macOS Spice

Yet another round for the macOS-enablement patch series. This time, it was panned harder than the previous ones. I sense that this is getting on people's nerves. That will teach me to take into account every single comment.

I can understand that it's irritating for people to have to ack the same patch multiple times. But if one person acks, and other make comments that seem valid, I prefer to respin.

But what is annoying me here is that one reason for respinning several times was to add a configure option to make sure that, by default, alignment-widening casts with SPICE_ALIGNED_CAST are not causing additional cost for testing alignment. Fine. I added the --enable-alignment-checksconfigure option, which is like a huge hammer to swat a fly.

But now, if the comments are that there should be no runtime cost even with --enable-alignment-checks, I give up. I want the instrumentation to tell me where we do stupidly misaligned things. Maybe this is the force of habit with PA-RISC or Itanium or older ARMs or basically anything but x86, where things had to be aligned or you'd start to pay dearly, at least performance-wise.

Worse, the code in master keeps introducing misaligned casts. So each resping grows a tiny bit larger.

Tracing options

Clearly, the discussion on tracing options yesterday was not going the way I wanted. There was no discussion of the categories we want to have in the code, no discussion of how to document tracing options. It was rather a collection of suggestion on how to use tools that really did not fit the bill at all for me. Oh well.

So I spent most of the day working on cleaning up debug traces, so that I could build a more convincing case. I did most of the boring work of converting existing debug output messages, splitting them into categories that seemed to make sense to me.

In the end, I had a version of Spicy that allows me to do a number of new things that I want to do frequently to collect information.

Statistics

Why do we drop so many frames?

% spicy -t drop_stats=100 -h turbo -p 5901
[drop_stats]id=0 #in-frames=100 #late=100 (100.00%, avg 1639.24 ms) #drop-on-playback=8 (8.00%)
[drop_stats]id=0 #in-frames=200 #late=200 (100.00%, avg 1603.13 ms) #drop-on-playback=8 (4.00%)
[drop_stats]id=0 #in-frames=300 #late=300 (100.00%, avg 1607.24 ms) #drop-on-playback=20 (6.67%)
[drop_stats]id=0 #in-frames=400 #late=400 (100.00%, avg 1597.48 ms) #drop-on-playback=20 (5.00%)
[drop_stats]id=0 #in-frames=500 #late=500 (100.00%, avg 1591.58 ms) #drop-on-playback=20 (4.00%)
[drop_stats]id=0 #in-frames=600 #late=600 (100.00%, avg 1587.52 ms) #drop-on-playback=20 (3.33%)
[drop_stats]id=0 #in-frames=700 #late=700 (100.00%, avg 1584.60 ms) #drop-on-playback=20 (2.86%)

This illustrates why the tweaks are an integral part of the mechanism, not just something entirely different that I shoehorn in.

When I lifted this instrumentation from destroy_display_stream, I noticed that the code there gives me a negative num_out_frames. It looks like the computation is invalid, the code is stale, or nobody really bothered checking it recently. So I changed it in the above listing.

Time-stamped logs

Follow up to previous question: does what is being reported make sense if I compare that to actual timing? (I’d say no…)

% spicy -t drop_stats=100:trace_timestamp -h turbo -p 5901

583120.963409:[drop_stats]id=0 #in-frames=100 #late=100 (100.00%, avg 1643.21 ms) #drop-on-playback=10 (10.00%)
1.425560:[drop_stats]id=0 #in-frames=200 #late=200 (100.00%, avg 1605.57 ms) #drop-on-playback=10 (5.00%)
2.853841:[drop_stats]id=0 #in-frames=300 #late=300 (100.00%, avg 1592.95 ms) #drop-on-playback=10 (3.33%)
4.270006:[drop_stats]id=0 #in-frames=400 #late=400 (100.00%, avg 1587.17 ms) #drop-on-playback=10 (2.50%)
5.679986:[drop_stats]id=0 #in-frames=500 #late=500 (100.00%, avg 1583.45 ms) #drop-on-playback=10 (2.00%)

The fact that the first time is absolute is not a bug, it’s in case you need the time of day when this happened, although this could probably be shown in a more friendly way.

What I don’t understand is how we can have 1.6 seconds average delay when the program ran for only 1.42 second… So I guess something is wrong in the way we exchange the initial multimedia times. I added some traces for related to mmtime, but I did not exploit them yet. To be investigated.

Wire-level info

What is actually received during the early protocol exchange? That’s mostly for my curiosity, to see how chatty the protocol is, the size of packets it tends to exchange, etc. This example and the next are tools that would immediately have shown me where the macOS version of spicy was goofing up talking to the server.

% spicy -t wire_read -h turbo -p 5901
[wire_read]Read 16 bytes in 0x7f8b150d85a8
[wire_read]00000000:  52 45 44 51 02 00 00 00 02 00 00 00 BA 00 00 00

[wire_read]Read 186 bytes in 0x7f8b141c6650
[wire_read]00000000:  00 00 00 00 30 81 9F 30 0D 06 09 2A 86 48 86 F7
[wire_read]00000010:  0D 01 01 01 05 00 03 81 8D 00 30 81 89 02 81 81
[wire_read]00000020:  00 D2 A9 FE 7B F0 36 B7 33 3C C1 F0 5E 10 50 F3
[wire_read]00000030:  11 D3 3E 31 4A 95 AC AC 7B 0C 20 CB 57 B2 C8 1D
[wire_read]00000040:  10 F1 E2 4A 1A F2 8F 2A 03 70 25 BB 63 C7 D6 51
[wire_read]00000050:  D4 DA 64 3C 95 0C B6 9E A7 4E B3 3F 65 12 83 86
[wire_read]00000060:  F4 35 F3 AC 3D C3 9A 6C C9 EE 34 D9 67 F0 8A 2D
[wire_read]00000070:  53 4D 73 46 06 CA 80 04 CF 7F 72 4A 8D 74 81 78
[wire_read]00000080:  E8 28 39 51 7C 9E 3B C2 19 D6 BD EC B4 F7 0A 68
[wire_read]00000090:  D3 AF D7 1F E6 93 E1 4C 23 F5 83 68 E1 7D 31 9A
[wire_read]000000A0:  53 02 03 01 00 01 01 00 00 00 01 00 00 00 B2 00
[wire_read]000000B0:  00 00 0B 00 00 00 09 00 00 00
[wire_read]Read 4 bytes in 0x109478b44
[wire_read]00000000:  00 00 00 00
** Message: main channel: opened
[wire_read]Read 6 bytes in 0x7f8b1274efb0
[wire_read]00000000:  67 00 20 00 00 00
[wire_read]Read 32 bytes in 0x7f8b1274dff0
[wire_read]00000000:  31 AE 1E 5E 01 00 00 00 03 00 00 00 02 00 00 00
[wire_read]00000010:  01 00 00 00 0A 00 00 00 9B AF BE 0B 00 00 00 01

Symbolic dumps

Hex dumps look nice, but practically limited. A symbolic view of the initial exchange (not all cases are covered in marshal.py / demarshal.py yet, so some fields are likely missing).

% spicy -t marshall_read -h turbo -p 5901

[marshall_read]  SpiceMsgMainInit.session_id=1745897490
[marshall_read]  SpiceMsgMainInit.display_channels_hint=1
[marshall_read]  SpiceMsgMainInit.supported_mouse_modes=3
[marshall_read]  SpiceMsgMainInit.current_mouse_mode=2
[marshall_read]  SpiceMsgMainInit.agent_connected=1
[marshall_read]  SpiceMsgMainInit.agent_tokens=10
[marshall_read]  SpiceMsgMainInit.multi_media_time=226863047
[marshall_read]  SpiceMsgMainInit.ram_hint=16777216
[marshall_read]  SpiceMsgMainName.name_len=10
[marshall_read]  SpiceMsgPing.id=1
[marshall_read]  SpiceMsgPing.timestamp=226863447500
[marshall_read]  SpiceMsgPing.id=2
[marshall_read]  SpiceMsgPing.timestamp=226863447502
[marshall_read]  SpiceMsgPing.id=3
[marshall_read]  SpiceMsgPing.timestamp=226863447503
[marshall_read]  SpiceMsgChannels.num_of_channels=8
[marshall_read]  SpiceChannelId.type=2
[marshall_read]  SpiceChannelId.id=0
[marshall_read]  SpiceChannelId.type=4
[marshall_read]  SpiceChannelId.id=0
[marshall_read]  SpiceChannelId.type=9
[marshall_read]  SpiceChannelId.id=1
[marshall_read]  SpiceChannelId.type=9
[marshall_read]  SpiceChannelId.id=0
[marshall_read]  SpiceChannelId.type=6
[marshall_read]  SpiceChannelId.id=0
[marshall_read]  SpiceChannelId.type=5
[marshall_read]  SpiceChannelId.id=0
[marshall_read]  SpiceChannelId.type=2
[marshall_read]  SpiceChannelId.id=1
[marshall_read]  SpiceChannelId.type=3
[marshall_read]  SpiceChannelId.id=0

I believe this example illustrates well the kind of instrumentation that is useful and convenient to have there all the time, but that we don’t want to send to a debug log if not explicitly requested. I see no reason to pollute even a structured log with that stuff, unless I actually need it. One line per field with a common structure on purpose, to make it easy to grep / awk the data you want.

Real-time info

The new tracing options let me show things that are to chatty for normal use, like mouse moves. I think that something similar to this may be useful in understanding the recent caps-lock bug.

% spicy -t mouse_move:trace_timestamp -h turbo -p 5901

0.000035:[mouse_move]inputs-3:0: pos=505,299 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
0.024286:[mouse_move]inputs-3:0: pos=482,257 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
0.024299:[mouse_move]inputs-3:0: pos=482,257 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0 lck=0x0
0.032647:[mouse_move]inputs-3:0: pos=439,182 delta=0,0 btn=0 dpy=0 cnt=2 mod=0 lck=0
0.032658:[mouse_move]inputs-3:0: pos=439,182 delta=0,0 btn=0 dpy=0 cnt=2 mod=0x0 lck=0x0
0.068790:[mouse_move]inputs-3:0: pos=401,129 delta=0,0 btn=0 dpy=0 cnt=3 mod=0 lck=0
0.068804:[mouse_move]inputs-3:0: pos=401,129 delta=0,0 btn=0 dpy=0 cnt=3 mod=0x0 lck=0x0
0.097703:[mouse_move]inputs-3:0: pos=392,115 delta=0,0 btn=0 dpy=0 cnt=0 mod=0 lck=0
0.097716:[mouse_move]inputs-3:0: pos=392,115 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
0.118830:[mouse_move]inputs-3:0: pos=379,95 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
0.118842:[mouse_move]inputs-3:0: pos=379,95 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0

Compatibility

There are tracing options to use the standard glib log in addition to stdout. Trace name are visible as Spice[x] domains).

Note that after exploring the glib logging code, I am relatively concerned about the performance if there are very large numbers of domains, since they seem to be doing linear searches to find the domain each time you log something.

Back to the tracing options, you can of course set trace_stdout=0 if you only want the glib log. Permanently in SPICE_TRACES in your .bashrc if that’s your preference.

% G_MESSAGES_DEBUG=all spicy -t mouse_move:trace_stdlog -h turbo -p 5901

[mouse_move]inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0 lck=0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:346:spice_inputs_position: inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0 lck=0
[mouse_move]inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:188:mouse_position: inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
[mouse_move]inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:346:spice_inputs_position: inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
[mouse_move]inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0 lck=0x0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:188:mouse_position: inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0 lck=0x0
[mouse_move]inputs-3:0: pos=308,139 delta=0,0 btn=0 dpy=0 cnt=2 mod=0 lck=0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:346:spice_inputs_position: inputs-3:0: pos=308,139 delta=0,0 btn=0 dpy=0 cnt=2 mod=0 lck=0
[mouse_move]inputs-3:0: pos=308,139 delta=0,0 btn=0 dpy=0 cnt=2 mod=0x0 lck=0x0

This is also a good way to compare how "chatty" the old glib-based log is relative to the shorter default format for spice_trace.

Self-documented

To get the list of the current traces, you can use the -t help or -t list option.

The current list is basically the result of looking at the various CHANNEL_DEBUG statements and splitting them into subcategories, plus a few additional instrumentations I had put in place over time but, until now, had to cherry pick every time I wanted it.

List of available traces
            canvas: Canvas-related information
      capabilities: Channel capabilities
           channel: Channel debug (CHANNEL_DEBUG macro)
   channel_connect: Channel connection / disconnection
    channel_cursor: Channel cursor information
    channel_notify: Channel notifications
      channel_read: Reading data from channels
      channel_wait: Channel waits
     channel_write: Writing data to channels
         clipboard: Clipboard-related information
             debug: General debug (SPICE_DEBUG macro)
           display: Display-related information
        drop_stats: Show drop statistics every N input frame
              init: Initialization
                gl: OpenGL-related information
     marshall_read: De-marshalling data for reading
    marshall_write: Marshalling data for writing
         migration: Migration events
            mmtime: Multimedia time measurements
          monitors: Monitor-related information
             mouse: Mouse-related information, except moves
        mouse_move: Individual mouse movements (chatty)
              port: Channel port information
              rect: Rectangles debug
         smartcard: Smartcard data
             sound: Sound playback
        sound_data: Sound data hex dump
        sound_mode: Sound mode hex dump
               ssl: SSL verification
              sasl: SASL verification
           surface: Surface-related information
    trace_function: Show function in trace output
    trace_location: Show code location in trace output
        trace_name: Show trace names in trace output
         trace_set: Show which traces are being set
      trace_stdlog: Send traces to glib logging facility
      trace_stdout: Send traces to standard output
   trace_timestamp: Show time stamp in trace output
               usb: USB redirection
               tls: TLS negociations
            webdav: WebDAV (directory redirection)
       webdav_data: WebDAV hex dump
         wire_read: Hex dump of data read on the wire
        wire_write: Hex dump of data read on the wire

Performance

One of the things that concerns me about the current infrastructure is that it's relatively costly. Basically, Spice logs everything, taking the time to vsprintf-format every single message. Then, after carefully spending the time to figure out the proper sequence of characters, we send it to glib, which just as carefully discards it in most cases.

The cost is not entirely negligible. Below is a small benchmark I did comparing the various solutions. All numbers in seconds for 1M iterations, showing a counter, best of 3 runs on a non-loaded machine.

	`spice_trace()`	`spice_log()`
Disabled trace	0.0021	1.26
Trace to console	2.11	8.07
Grep, filtering 20 entries	0.36	4.37
Sending to file in /tmp	0.31 (29M file)	4.31 (79M file)

The bottom line is that this approach beats the crap out of the existing code, by a factor of 500+ for non-used trace code (I also tried g_debug, it’s somewhere in the middle, closer to spice_log).