Thursday, June 8, 2017
Well, some days are not as good as others.
macOS Spice
Yet another round for the macOS-enablement patch series. This time, it was panned harder than the previous ones. I sense that this is getting on people's nerves. That will teach me to take into account every single comment.
I can understand that it's irritating for people to have to ack the same patch multiple times. But if one person acks and others make comments that seem valid, I prefer to respin.
But what is annoying me here is that one reason for respinning several times was to add a configure option to make sure that, by default, alignment-widening casts with SPICE_ALIGNED_CAST do not add any cost for testing alignment. Fine. I added the --enable-alignment-checks configure option, which is like a huge hammer to swat a fly.
But now, if the comments are that there should be no runtime cost even with --enable-alignment-checks, I give up. I want the instrumentation to tell me where we do stupidly misaligned things. Maybe this is the force of habit with PA-RISC or Itanium or older ARMs or basically anything but x86, where things had to be aligned or you'd start to pay dearly, at least performance-wise.
Worse, the code in master keeps introducing misaligned casts. So each respin grows a tiny bit larger.
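Roughly, the idea is something like this. SPICE_ALIGNED_CAST and --enable-alignment-checks are the real names from the series, but the hypothetical ALIGNED_CAST below is only my illustration of the approach (check in instrumented builds, plain cast otherwise), not the actual patch:

```c
/* Illustrative sketch, not the actual spice-common macro: cast a pointer to a
 * more strictly aligned type, with an optional run-time check that only
 * exists when the build enables it. */
#include <stdint.h>
#include <stdio.h>

#ifdef ENABLE_ALIGNMENT_CHECKS
static inline void check_alignment(const void *ptr, size_t align,
                                   const char *file, int line)
{
    if ((uintptr_t)ptr % align != 0)
        fprintf(stderr, "%s:%d: misaligned cast of %p (need %zu bytes)\n",
                file, line, ptr, align);
}
#define ALIGNED_CAST(type, ptr) \
    (check_alignment((ptr), __alignof__(type), __FILE__, __LINE__), (type *)(ptr))
#else
/* Default build: a plain cast, no additional run-time cost at all. */
#define ALIGNED_CAST(type, ptr) ((type *)(ptr))
#endif
```

A use would look like `uint32_t *words = ALIGNED_CAST(uint32_t, byte_buffer);`, with the checking variant telling me exactly where the stupidly misaligned casts come from.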
Tracing options
Clearly, the discussion on tracing options yesterday did not go the way I wanted. There was no discussion of the categories we want to have in the code, and no discussion of how to document tracing options. It was rather a collection of suggestions on how to use tools that really did not fit the bill for me. Oh well.
So I spent most of the day working on cleaning up debug traces, so that I could build a more convincing case. I did most of the boring work of converting existing debug output messages, splitting them into categories that seemed to make sense to me.
In the end, I had a version of Spicy that allows me to do a number of new things that I want to do frequently to collect information.
Statistics
Why do we drop so many frames?
% spicy -t drop_stats=100 -h turbo -p 5901
[drop_stats]id=0 #in-frames=100 #late=100 (100.00%, avg 1639.24 ms) #drop-on-playback=8 (8.00%)
[drop_stats]id=0 #in-frames=200 #late=200 (100.00%, avg 1603.13 ms) #drop-on-playback=8 (4.00%)
[drop_stats]id=0 #in-frames=300 #late=300 (100.00%, avg 1607.24 ms) #drop-on-playback=20 (6.67%)
[drop_stats]id=0 #in-frames=400 #late=400 (100.00%, avg 1597.48 ms) #drop-on-playback=20 (5.00%)
[drop_stats]id=0 #in-frames=500 #late=500 (100.00%, avg 1591.58 ms) #drop-on-playback=20 (4.00%)
[drop_stats]id=0 #in-frames=600 #late=600 (100.00%, avg 1587.52 ms) #drop-on-playback=20 (3.33%)
[drop_stats]id=0 #in-frames=700 #late=700 (100.00%, avg 1584.60 ms) #drop-on-playback=20 (2.86%)
This illustrates why the tweaks are an integral part of the mechanism, not just something entirely different that I shoehorn in.
When I lifted this instrumentation from destroy_display_stream, I noticed that the code there gives me a negative num_out_frames. It looks like the computation is invalid, the code is stale, or nobody really bothered checking it recently. So I changed it in the above listing.
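For reference, the computation behind those lines is simple; here is a sketch with hypothetical counters (the structure and field names are my simplification for illustration, not the actual spice-gtk stream fields):

```c
#include <stdio.h>

/* Hypothetical per-stream counters for the drop_stats trace */
typedef struct {
    int      id;
    unsigned in_frames;      /* frames received from the server           */
    unsigned late_frames;    /* frames that arrived after their due time  */
    double   total_late_ms;  /* cumulated lateness, to compute the average */
    unsigned dropped_frames; /* frames dropped at playback time           */
} stream_stats;

/* Report every 'period' input frames, e.g. -t drop_stats=100 */
static void report_drop_stats(const stream_stats *s, unsigned period)
{
    if (period == 0 || s->in_frames % period != 0)
        return;
    printf("[drop_stats]id=%d #in-frames=%u #late=%u (%.2f%%, avg %.2f ms) "
           "#drop-on-playback=%u (%.2f%%)\n",
           s->id, s->in_frames, s->late_frames,
           100.0 * s->late_frames / s->in_frames,
           s->late_frames ? s->total_late_ms / s->late_frames : 0.0,
           s->dropped_frames,
           100.0 * s->dropped_frames / s->in_frames);
}
```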
Time-stamped logs
Follow up to previous question: does what is being reported make sense if I compare that to actual timing? (I’d say no…)
% spicy -t drop_stats=100:trace_timestamp -h turbo -p 5901
583120.963409:[drop_stats]id=0 #in-frames=100 #late=100 (100.00%, avg 1643.21 ms) #drop-on-playback=10 (10.00%)
1.425560:[drop_stats]id=0 #in-frames=200 #late=200 (100.00%, avg 1605.57 ms) #drop-on-playback=10 (5.00%)
2.853841:[drop_stats]id=0 #in-frames=300 #late=300 (100.00%, avg 1592.95 ms) #drop-on-playback=10 (3.33%)
4.270006:[drop_stats]id=0 #in-frames=400 #late=400 (100.00%, avg 1587.17 ms) #drop-on-playback=10 (2.50%)
5.679986:[drop_stats]id=0 #in-frames=500 #late=500 (100.00%, avg 1583.45 ms) #drop-on-playback=10 (2.00%)
The fact that the first time is absolute is not a bug, it’s in case you need the time of day when this happened, although this could probably be shown in a more friendly way.
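In other words, the behavior is something like the sketch below: record and print an absolute clock value for the first trace, then print the elapsed time relative to that first one (this is my reconstruction of what the output suggests, and the clock used by the real code may differ):

```c
#include <stdio.h>
#include <time.h>

/* Sketch of the time-stamping behavior: absolute value on the first trace,
 * elapsed time since that first trace afterwards. Illustration only. */
static double trace_timestamp(void)
{
    static double first = -1.0;
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    double now = ts.tv_sec + ts.tv_nsec * 1e-9;
    if (first < 0.0) {
        first = now;
        return now;          /* first trace: absolute, e.g. 583120.963409 */
    }
    return now - first;      /* later traces: relative, e.g. 1.425560 */
}
```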
What I don't understand is how we can have a 1.6-second average delay when the program only ran for 1.42 seconds… So I guess something is wrong in the way we exchange the initial multimedia times. I added some traces related to mmtime, but I have not exploited them yet. To be investigated.
Wire-level info
What is actually received during the early protocol exchange? That's mostly to satisfy my curiosity: how chatty the protocol is, the size of packets it tends to exchange, and so on. This example and the next show the kind of tools that would immediately have told me where the macOS version of spicy was goofing up when talking to the server.
% spicy -t wire_read -h turbo -p 5901
[wire_read]Read 16 bytes in 0x7f8b150d85a8
[wire_read]00000000: 52 45 44 51 02 00 00 00 02 00 00 00 BA 00 00 00
[wire_read]Read 186 bytes in 0x7f8b141c6650
[wire_read]00000000: 00 00 00 00 30 81 9F 30 0D 06 09 2A 86 48 86 F7
[wire_read]00000010: 0D 01 01 01 05 00 03 81 8D 00 30 81 89 02 81 81
[wire_read]00000020: 00 D2 A9 FE 7B F0 36 B7 33 3C C1 F0 5E 10 50 F3
[wire_read]00000030: 11 D3 3E 31 4A 95 AC AC 7B 0C 20 CB 57 B2 C8 1D
[wire_read]00000040: 10 F1 E2 4A 1A F2 8F 2A 03 70 25 BB 63 C7 D6 51
[wire_read]00000050: D4 DA 64 3C 95 0C B6 9E A7 4E B3 3F 65 12 83 86
[wire_read]00000060: F4 35 F3 AC 3D C3 9A 6C C9 EE 34 D9 67 F0 8A 2D
[wire_read]00000070: 53 4D 73 46 06 CA 80 04 CF 7F 72 4A 8D 74 81 78
[wire_read]00000080: E8 28 39 51 7C 9E 3B C2 19 D6 BD EC B4 F7 0A 68
[wire_read]00000090: D3 AF D7 1F E6 93 E1 4C 23 F5 83 68 E1 7D 31 9A
[wire_read]000000A0: 53 02 03 01 00 01 01 00 00 00 01 00 00 00 B2 00
[wire_read]000000B0: 00 00 0B 00 00 00 09 00 00 00
[wire_read]Read 4 bytes in 0x109478b44
[wire_read]00000000: 00 00 00 00
** Message: main channel: opened
[wire_read]Read 6 bytes in 0x7f8b1274efb0
[wire_read]00000000: 67 00 20 00 00 00
[wire_read]Read 32 bytes in 0x7f8b1274dff0
[wire_read]00000000: 31 AE 1E 5E 01 00 00 00 03 00 00 00 02 00 00 00
[wire_read]00000010: 01 00 00 00 0A 00 00 00 9B AF BE 0B 00 00 00 01
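The dump format itself is nothing fancy; a sketch of the kind of helper that produces it could look like this (illustration only, not the actual spice-gtk code):

```c
#include <stdio.h>
#include <stddef.h>

/* Print a "[trace]offset: bytes" hex dump, 16 bytes per line */
static void trace_hex_dump(const char *trace, const unsigned char *bytes,
                           size_t size)
{
    printf("[%s]Read %zu bytes in %p\n", trace, size, (void *)bytes);
    for (size_t offset = 0; offset < size; offset += 16) {
        printf("[%s]%08zX:", trace, offset);
        for (size_t i = offset; i < offset + 16 && i < size; i++)
            printf(" %02X", bytes[i]);
        printf("\n");
    }
}
```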
Symbolic dumps
Hex dumps look nice, but they are of limited practical use. Here is a symbolic view of the initial exchange (not all cases are covered in marshal.py / demarshal.py yet, so some fields are likely missing).
% spicy -t marshall_read -h turbo -p 5901
[marshall_read] SpiceMsgMainInit.session_id=1745897490
[marshall_read] SpiceMsgMainInit.display_channels_hint=1
[marshall_read] SpiceMsgMainInit.supported_mouse_modes=3
[marshall_read] SpiceMsgMainInit.current_mouse_mode=2
[marshall_read] SpiceMsgMainInit.agent_connected=1
[marshall_read] SpiceMsgMainInit.agent_tokens=10
[marshall_read] SpiceMsgMainInit.multi_media_time=226863047
[marshall_read] SpiceMsgMainInit.ram_hint=16777216
[marshall_read] SpiceMsgMainName.name_len=10
[marshall_read] SpiceMsgPing.id=1
[marshall_read] SpiceMsgPing.timestamp=226863447500
[marshall_read] SpiceMsgPing.id=2
[marshall_read] SpiceMsgPing.timestamp=226863447502
[marshall_read] SpiceMsgPing.id=3
[marshall_read] SpiceMsgPing.timestamp=226863447503
[marshall_read] SpiceMsgChannels.num_of_channels=8
[marshall_read] SpiceChannelId.type=2
[marshall_read] SpiceChannelId.id=0
[marshall_read] SpiceChannelId.type=4
[marshall_read] SpiceChannelId.id=0
[marshall_read] SpiceChannelId.type=9
[marshall_read] SpiceChannelId.id=1
[marshall_read] SpiceChannelId.type=9
[marshall_read] SpiceChannelId.id=0
[marshall_read] SpiceChannelId.type=6
[marshall_read] SpiceChannelId.id=0
[marshall_read] SpiceChannelId.type=5
[marshall_read] SpiceChannelId.id=0
[marshall_read] SpiceChannelId.type=2
[marshall_read] SpiceChannelId.id=1
[marshall_read] SpiceChannelId.type=3
[marshall_read] SpiceChannelId.id=0
I believe this example illustrates well the kind of instrumentation that is useful and convenient to have around all the time, but that we don't want sent to a debug log unless explicitly requested. I see no reason to pollute even a structured log with that stuff unless I actually need it. The one-line-per-field layout with a common structure is deliberate: it makes it easy to grep or awk for the data you want.
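That grep-friendly shape is easy to preserve as long as every field goes through the same helper. A sketch of the idea follows; the TRACE_FIELD macro is my illustration (not the generated marshal.py code), and the SpiceMsgMainInit definition is trimmed to a few fields with simplified types:

```c
#include <stdio.h>

/* One trace line per field: "[trace] Type.field=value" */
#define TRACE_FIELD(trace, type, msg, field, fmt) \
    printf("[%s] %s." #field "=" fmt "\n", trace, #type, (msg)->field)

/* Trimmed, simplified stand-in for the real message structure */
typedef struct {
    unsigned session_id;
    unsigned display_channels_hint;
    unsigned multi_media_time;
} SpiceMsgMainInit;

static void trace_main_init(const SpiceMsgMainInit *msg)
{
    TRACE_FIELD("marshall_read", SpiceMsgMainInit, msg, session_id, "%u");
    TRACE_FIELD("marshall_read", SpiceMsgMainInit, msg, display_channels_hint, "%u");
    TRACE_FIELD("marshall_read", SpiceMsgMainInit, msg, multi_media_time, "%u");
}
```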
Real-time info
The new tracing options let me show things that are too chatty for normal use, like mouse moves. I think that something similar to this may be useful in understanding the recent caps-lock bug.
% spicy -t mouse_move:trace_timestamp -h turbo -p 5901
0.000035:[mouse_move]inputs-3:0: pos=505,299 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
0.024286:[mouse_move]inputs-3:0: pos=482,257 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
0.024299:[mouse_move]inputs-3:0: pos=482,257 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0 lck=0x0
0.032647:[mouse_move]inputs-3:0: pos=439,182 delta=0,0 btn=0 dpy=0 cnt=2 mod=0 lck=0
0.032658:[mouse_move]inputs-3:0: pos=439,182 delta=0,0 btn=0 dpy=0 cnt=2 mod=0x0 lck=0x0
0.068790:[mouse_move]inputs-3:0: pos=401,129 delta=0,0 btn=0 dpy=0 cnt=3 mod=0 lck=0
0.068804:[mouse_move]inputs-3:0: pos=401,129 delta=0,0 btn=0 dpy=0 cnt=3 mod=0x0 lck=0x0
0.097703:[mouse_move]inputs-3:0: pos=392,115 delta=0,0 btn=0 dpy=0 cnt=0 mod=0 lck=0
0.097716:[mouse_move]inputs-3:0: pos=392,115 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
0.118830:[mouse_move]inputs-3:0: pos=379,95 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
0.118842:[mouse_move]inputs-3:0: pos=379,95 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0
Compatibility
There are tracing options to send traces to the standard glib log in addition to stdout. Trace names are visible as Spice[x] domains.
Note that after exploring the glib logging code, I am relatively concerned about the performance if there are very large numbers of domains, since they seem to be doing linear searches to find the domain each time you log something.
Back to the tracing options: you can of course set trace_stdout=0 if you only want the glib log, or set it permanently in SPICE_TRACES in your .bashrc if that's your preference.
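For reference, a trace specification is just a colon-separated list of names with an optional =value. The sketch below shows roughly how such a spec could be parsed, assuming the same syntax for -t and SPICE_TRACES; the names and storage are simplified for illustration, not the actual implementation:

```c
#include <stdlib.h>
#include <string.h>

/* Simplified trace registry: 0 = off, otherwise on / threshold */
struct trace_option {
    const char *name;
    unsigned    value;
};

static struct trace_option traces[] = {
    { "drop_stats", 0 }, { "trace_timestamp", 0 }, { "mouse_move", 0 },
};

/* Parse e.g. "drop_stats=100:trace_timestamp" and update the registry */
static void set_traces(const char *spec)
{
    char *copy = spec ? strdup(spec) : NULL;
    if (!copy)
        return;
    for (char *tok = strtok(copy, ":"); tok; tok = strtok(NULL, ":")) {
        char *eq = strchr(tok, '=');
        unsigned value = 1;                 /* a bare name simply enables it */
        if (eq) {
            *eq = '\0';
            value = strtoul(eq + 1, NULL, 10);
        }
        for (size_t i = 0; i < sizeof(traces) / sizeof(traces[0]); i++)
            if (strcmp(traces[i].name, tok) == 0)
                traces[i].value = value;
    }
    free(copy);
}
```

With something like this, the -t argument and the SPICE_TRACES environment variable would both funnel through the same parsing code.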
% G_MESSAGES_DEBUG=all spicy -t mouse_move:trace_stdlog -h turbo -p 5901
[mouse_move]inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0 lck=0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:346:spice_inputs_position: inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0 lck=0
[mouse_move]inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:188:mouse_position: inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
[mouse_move]inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:346:spice_inputs_position: inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
[mouse_move]inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0 lck=0x0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:188:mouse_position: inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0 lck=0x0
[mouse_move]inputs-3:0: pos=308,139 delta=0,0 btn=0 dpy=0 cnt=2 mod=0 lck=0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:346:spice_inputs_position: inputs-3:0: pos=308,139 delta=0,0 btn=0 dpy=0 cnt=2 mod=0 lck=0
[mouse_move]inputs-3:0: pos=308,139 delta=0,0 btn=0 dpy=0 cnt=2 mod=0x0 lck=0x0
This is also a good way to compare how "chatty" the old glib-based log is relative to the shorter default format for spice_trace.
Self-documented
To get the list of the current traces, you can use the -t help or -t list option.
The current list is basically the result of looking at the various CHANNEL_DEBUG statements and splitting them into subcategories, plus a few additional instrumentations I had put in place over time but, until now, had to cherry-pick every time I wanted them.
List of available traces
canvas: Canvas-related information
capabilities: Channel capabilities
channel: Channel debug (CHANNEL_DEBUG macro)
channel_connect: Channel connection / disconnection
channel_cursor: Channel cursor information
channel_notify: Channel notifications
channel_read: Reading data from channels
channel_wait: Channel waits
channel_write: Writing data to channels
clipboard: Clipboard-related information
debug: General debug (SPICE_DEBUG macro)
display: Display-related information
drop_stats: Show drop statistics every N input frames
init: Initialization
gl: OpenGL-related information
marshall_read: De-marshalling data for reading
marshall_write: Marshalling data for writing
migration: Migration events
mmtime: Multimedia time measurements
monitors: Monitor-related information
mouse: Mouse-related information, except moves
mouse_move: Individual mouse movements (chatty)
port: Channel port information
rect: Rectangles debug
smartcard: Smartcard data
sound: Sound playback
sound_data: Sound data hex dump
sound_mode: Sound mode hex dump
ssl: SSL verification
sasl: SASL verification
surface: Surface-related information
trace_function: Show function in trace output
trace_location: Show code location in trace output
trace_name: Show trace names in trace output
trace_set: Show which traces are being set
trace_stdlog: Send traces to glib logging facility
trace_stdout: Send traces to standard output
trace_timestamp: Show time stamp in trace output
usb: USB redirection
tls: TLS negotiations
webdav: WebDAV (directory redirection)
webdav_data: WebDAV hex dump
wire_read: Hex dump of data read on the wire
wire_write: Hex dump of data written on the wire
Performance
One of the things that concerns me about the current infrastructure is that it's relatively costly. Basically, Spice logs everything, taking the time to vsprintf-format every single message. Then, after carefully spending the time to figure out the proper sequence of characters, we send it to glib, which just as carefully discards it in most cases.
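By contrast, a trace flag tested before any formatting means a disabled trace costs little more than one integer test and a well-predicted branch, presumably why the numbers below look the way they do. A toy version of that idea (the names are illustrative, not the actual macros):

```c
#include <stdio.h>

static unsigned trace_mouse_move = 0;     /* set from -t or SPICE_TRACES */

/* Disabled trace: one test, no vsprintf-style formatting at all */
#define TRACE(flag, ...)                          \
    do {                                          \
        if (trace_##flag)                         \
            printf("[" #flag "]" __VA_ARGS__);    \
    } while (0)

int main(void)
{
    for (int i = 0; i < 1000000; i++)
        TRACE(mouse_move, "pos=%d,%d\n", i, i);   /* nearly free while disabled */
    return 0;
}
```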
The cost is not entirely negligible. Below is a small benchmark I did comparing the various solutions. All numbers in seconds for 1M iterations, showing a counter, best of 3 runs on a non-loaded machine.
|                            | spice_trace()   | spice_log()     |
|----------------------------|-----------------|-----------------|
| Disabled trace             | 0.0021          | 1.26            |
| Trace to console           | 2.11            | 8.07            |
| Grep, filtering 20 entries | 0.36            | 4.37            |
| Sending to file in /tmp    | 0.31 (29M file) | 4.31 (79M file) |
The bottom line is that this approach beats the crap out of the existing code, by a factor of 500+ for non-used trace code (I also tried g_debug, it’s somewhere in the middle, closer to spice_log).

2016-2017 - Christophe de Dinechin