Thursday, June 8, 2017
Well, some days are not as good as others.
macOS Spice
Yet another round for the macOS-enablement patch series. This time, it was panned harder than the previous ones. I sense that this is getting on people's nerves. That will teach me to take into account every single comment.
I can understand that it's irritating for people to have to ack the same patch multiple times. But if one person acks, and others make comments that seem valid, I prefer to respin.
But what is annoying me here is that one reason for respinning several times was to add a configure option to make sure that, by default, alignment-widening casts with SPICE_ALIGNED_CAST do not cause any additional cost for testing alignment. Fine. I added the --enable-alignment-checks configure option, which is like a huge hammer to swat a fly.
But now, if the comments are that there should be no runtime cost even with --enable-alignment-checks, I give up. I want the instrumentation to tell me where we do stupidly misaligned things. Maybe this is the force of habit with PA-RISC or Itanium or older ARMs or basically anything but x86, where things had to be aligned or you'd start to pay dearly, at least performance-wise.
Worse, the code in master keeps introducing misaligned casts. So each respin grows a tiny bit larger.
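For reference, the idea is simple enough. Here is a minimal sketch, assuming a hypothetical implementation (the macro name SPICE_ALIGNED_CAST is from the patch series, everything else below is illustrative): by default the macro is just a widened cast with zero runtime cost, and only --enable-alignment-checks adds a test that reports misaligned pointers.

#include <stdint.h>
#include <stdio.h>

/* Report pointers that do not have the required alignment */
static inline void *spice_check_alignment(void *ptr, size_t align,
                                          const char *file, int line)
{
    if ((uintptr_t)ptr % align != 0)
        fprintf(stderr, "%s:%d: misaligned pointer %p (needs %zu-byte alignment)\n",
                file, line, ptr, align);
    return ptr;
}

#ifdef ENABLE_ALIGNMENT_CHECKS
/* With checks enabled: test against the alignment of the target type
 * (GCC/Clang __alignof__ extension) */
#define SPICE_ALIGNED_CAST(type, ptr)                                   \
    ((type)spice_check_alignment((void *)(ptr), __alignof__(*(type)0),  \
                                 __FILE__, __LINE__))
#else
/* Default: just the widening cast, no runtime cost */
#define SPICE_ALIGNED_CAST(type, ptr) ((type)(void *)(ptr))
#endif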
Tracing options
Clearly, the discussion on tracing options yesterday was not going the way I wanted. There was no discussion of the categories we want to have in the code, no discussion of how to document tracing options. It was rather a collection of suggestions on how to use tools that really did not fit the bill at all for me. Oh well.
So I spent most of the day working on cleaning up debug traces, so that I could build a more convincing case. I did most of the boring work of converting existing debug output messages, splitting them into categories that seemed to make sense to me.
In the end, I had a version of spicy that lets me do a number of new things I frequently want to do when collecting information.
Statistics
Why do we drop so many frames?
% spicy -t drop_stats=100 -h turbo -p 5901
[drop_stats]id=0 #in-frames=100 #late=100 (100.00%, avg 1639.24 ms) #drop-on-playback=8 (8.00%)
[drop_stats]id=0 #in-frames=200 #late=200 (100.00%, avg 1603.13 ms) #drop-on-playback=8 (4.00%)
[drop_stats]id=0 #in-frames=300 #late=300 (100.00%, avg 1607.24 ms) #drop-on-playback=20 (6.67%)
[drop_stats]id=0 #in-frames=400 #late=400 (100.00%, avg 1597.48 ms) #drop-on-playback=20 (5.00%)
[drop_stats]id=0 #in-frames=500 #late=500 (100.00%, avg 1591.58 ms) #drop-on-playback=20 (4.00%)
[drop_stats]id=0 #in-frames=600 #late=600 (100.00%, avg 1587.52 ms) #drop-on-playback=20 (3.33%)
[drop_stats]id=0 #in-frames=700 #late=700 (100.00%, avg 1584.60 ms) #drop-on-playback=20 (2.86%)
This illustrates why the tweaks (the numeric values attached to a trace, like the 100 in drop_stats=100) are an integral part of the mechanism, not just something entirely different that I shoehorn in.
When I lifted this instrumentation from destroy_display_stream, I noticed that the code there gives me a negative num_out_frames. It looks like the computation is invalid, the code is stale, or nobody really bothered checking it recently. So I changed it in the above listing.
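To make the listing above a bit more concrete, here is a hypothetical sketch of the counters such a drop_stats trace could maintain and report every N input frames. The field names follow the trace output, but the code itself is only an illustration, not the actual spice-gtk code.

#include <stdint.h>
#include <stdio.h>

typedef struct {
    unsigned num_input_frames;       /* frames received from the server */
    unsigned num_late_frames;        /* frames that arrived past their due time */
    uint64_t late_time_ms;           /* cumulated lateness, for the average */
    unsigned num_drops_on_playback;  /* frames dropped at playback time */
} drop_stats_t;

/* Print one [drop_stats] line every 'every' input frames */
static void report_drop_stats(const drop_stats_t *s, unsigned every, int id)
{
    if (every == 0 || s->num_input_frames == 0 || s->num_input_frames % every)
        return;
    printf("[drop_stats]id=%d #in-frames=%u #late=%u (%.2f%%, avg %.2f ms) "
           "#drop-on-playback=%u (%.2f%%)\n",
           id, s->num_input_frames, s->num_late_frames,
           100.0 * s->num_late_frames / s->num_input_frames,
           s->num_late_frames ? (double)s->late_time_ms / s->num_late_frames
                              : 0.0,
           s->num_drops_on_playback,
           100.0 * s->num_drops_on_playback / s->num_input_frames);
}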
Time-stamped logs
Follow-up to the previous question: does what is being reported make sense if I compare it to actual timing? (I'd say no…)
% spicy -t drop_stats=100:trace_timestamp -h turbo -p 5901
583120.963409:[drop_stats]id=0 #in-frames=100 #late=100 (100.00%, avg 1643.21 ms) #drop-on-playback=10 (10.00%)
1.425560:[drop_stats]id=0 #in-frames=200 #late=200 (100.00%, avg 1605.57 ms) #drop-on-playback=10 (5.00%)
2.853841:[drop_stats]id=0 #in-frames=300 #late=300 (100.00%, avg 1592.95 ms) #drop-on-playback=10 (3.33%)
4.270006:[drop_stats]id=0 #in-frames=400 #late=400 (100.00%, avg 1587.17 ms) #drop-on-playback=10 (2.50%)
5.679986:[drop_stats]id=0 #in-frames=500 #late=500 (100.00%, avg 1583.45 ms) #drop-on-playback=10 (2.00%)
The fact that the first time is absolute is not a bug; it's there in case you need the time of day when this happened, although it could probably be shown in a friendlier way.
What I don't understand is how we can have a 1.6 second average delay when the program ran for only 1.42 seconds… So I guess something is wrong in the way we exchange the initial multimedia times. I added some traces related to mmtime, but I have not exploited them yet. To be investigated.
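For what it's worth, the behavior visible above suggests something like the following sketch: print an absolute clock value for the first trace, then seconds elapsed since that first trace. This is only an illustration of the observed output, not the actual implementation.

#include <glib.h>
#include <stdio.h>

/* Prefix a trace line with a timestamp: absolute for the first trace,
 * relative to that first trace afterwards */
static void print_trace_timestamp(FILE *out)
{
    static gint64 first = 0;
    gint64 now = g_get_monotonic_time();          /* microseconds */
    if (first == 0) {
        first = now;
        fprintf(out, "%f:", now / 1e6);           /* absolute clock value */
    } else {
        fprintf(out, "%f:", (now - first) / 1e6); /* seconds since first trace */
    }
}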
Wire-level info
What is actually received during the early protocol exchange? That's mostly for my curiosity, to see how chatty the protocol is, the size of the packets it tends to exchange, etc. This example and the next are the kind of tools that would immediately have shown me where the macOS version of spicy was goofing up when talking to the server.
% spicy -t wire_read -h turbo -p 5901
[wire_read]Read 16 bytes in 0x7f8b150d85a8
[wire_read]00000000: 52 45 44 51 02 00 00 00 02 00 00 00 BA 00 00 00
[wire_read]Read 186 bytes in 0x7f8b141c6650
[wire_read]00000000: 00 00 00 00 30 81 9F 30 0D 06 09 2A 86 48 86 F7
[wire_read]00000010: 0D 01 01 01 05 00 03 81 8D 00 30 81 89 02 81 81
[wire_read]00000020: 00 D2 A9 FE 7B F0 36 B7 33 3C C1 F0 5E 10 50 F3
[wire_read]00000030: 11 D3 3E 31 4A 95 AC AC 7B 0C 20 CB 57 B2 C8 1D
[wire_read]00000040: 10 F1 E2 4A 1A F2 8F 2A 03 70 25 BB 63 C7 D6 51
[wire_read]00000050: D4 DA 64 3C 95 0C B6 9E A7 4E B3 3F 65 12 83 86
[wire_read]00000060: F4 35 F3 AC 3D C3 9A 6C C9 EE 34 D9 67 F0 8A 2D
[wire_read]00000070: 53 4D 73 46 06 CA 80 04 CF 7F 72 4A 8D 74 81 78
[wire_read]00000080: E8 28 39 51 7C 9E 3B C2 19 D6 BD EC B4 F7 0A 68
[wire_read]00000090: D3 AF D7 1F E6 93 E1 4C 23 F5 83 68 E1 7D 31 9A
[wire_read]000000A0: 53 02 03 01 00 01 01 00 00 00 01 00 00 00 B2 00
[wire_read]000000B0: 00 00 0B 00 00 00 09 00 00 00
[wire_read]Read 4 bytes in 0x109478b44
[wire_read]00000000: 00 00 00 00
** Message: main channel: opened
[wire_read]Read 6 bytes in 0x7f8b1274efb0
[wire_read]00000000: 67 00 20 00 00 00
[wire_read]Read 32 bytes in 0x7f8b1274dff0
[wire_read]00000000: 31 AE 1E 5E 01 00 00 00 03 00 00 00 02 00 00 00
[wire_read]00000010: 01 00 00 00 0A 00 00 00 9B AF BE 0B 00 00 00 01
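Producing that kind of output is straightforward. Here is a minimal sketch of a wire_read-style dump, 16 bytes per line with the offset on the left; the trace prefix is hard-coded for illustration, while the real code presumably goes through the trace machinery rather than a raw printf.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Dump a buffer as hex, 16 bytes per line, offset on the left */
static void trace_hex_dump(const uint8_t *data, size_t len)
{
    for (size_t offset = 0; offset < len; offset += 16) {
        char hex[16 * 3 + 1];
        size_t pos = 0;
        for (size_t i = offset; i < len && i < offset + 16; i++)
            pos += snprintf(hex + pos, sizeof(hex) - pos, " %02X", data[i]);
        printf("[wire_read]%08zX:%s\n", offset, hex);
    }
}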
Symbolic dumps
Hex dumps look nice, but they are of limited practical use. Here is a symbolic view of the initial exchange (not all cases are covered in marshal.py / demarshal.py yet, so some fields are likely missing).
% spicy -t marshall_read -h turbo -p 5901
[marshall_read] SpiceMsgMainInit.session_id=1745897490
[marshall_read] SpiceMsgMainInit.display_channels_hint=1
[marshall_read] SpiceMsgMainInit.supported_mouse_modes=3
[marshall_read] SpiceMsgMainInit.current_mouse_mode=2
[marshall_read] SpiceMsgMainInit.agent_connected=1
[marshall_read] SpiceMsgMainInit.agent_tokens=10
[marshall_read] SpiceMsgMainInit.multi_media_time=226863047
[marshall_read] SpiceMsgMainInit.ram_hint=16777216
[marshall_read] SpiceMsgMainName.name_len=10
[marshall_read] SpiceMsgPing.id=1
[marshall_read] SpiceMsgPing.timestamp=226863447500
[marshall_read] SpiceMsgPing.id=2
[marshall_read] SpiceMsgPing.timestamp=226863447502
[marshall_read] SpiceMsgPing.id=3
[marshall_read] SpiceMsgPing.timestamp=226863447503
[marshall_read] SpiceMsgChannels.num_of_channels=8
[marshall_read] SpiceChannelId.type=2
[marshall_read] SpiceChannelId.id=0
[marshall_read] SpiceChannelId.type=4
[marshall_read] SpiceChannelId.id=0
[marshall_read] SpiceChannelId.type=9
[marshall_read] SpiceChannelId.id=1
[marshall_read] SpiceChannelId.type=9
[marshall_read] SpiceChannelId.id=0
[marshall_read] SpiceChannelId.type=6
[marshall_read] SpiceChannelId.id=0
[marshall_read] SpiceChannelId.type=5
[marshall_read] SpiceChannelId.id=0
[marshall_read] SpiceChannelId.type=2
[marshall_read] SpiceChannelId.id=1
[marshall_read] SpiceChannelId.type=3
[marshall_read] SpiceChannelId.id=0
I believe this example illustrates well the kind of instrumentation that is useful and convenient to have there all the time, but that we don't want to send to a debug log unless explicitly requested. I see no reason to pollute even a structured log with that stuff unless I actually need it. The one-line-per-field format with a common structure is deliberate, to make it easy to grep / awk the data you want.
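As an illustration of that one-line-per-field idea, here is a hypothetical sketch of what the trace for SpiceMsgMainInit could look like. The field names match the output above (they come from spice-protocol), but the function and the header path are assumptions.

#include <stdio.h>
#include <spice/protocol.h>   /* SpiceMsgMainInit, from spice-protocol (path assumed) */

/* One trace line per field, all with the same structure */
static void trace_main_init(const SpiceMsgMainInit *init)
{
    printf("[marshall_read] SpiceMsgMainInit.session_id=%u\n", init->session_id);
    printf("[marshall_read] SpiceMsgMainInit.display_channels_hint=%u\n",
           init->display_channels_hint);
    printf("[marshall_read] SpiceMsgMainInit.multi_media_time=%u\n",
           init->multi_media_time);
    /* ... and so on for the other fields, so that e.g.
     *   spicy -t marshall_read ... | awk -F= '/multi_media_time/ { print $2 }'
     * extracts exactly the values you are after. */
}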
Real-time info
The new tracing options let me show things that are too chatty for normal use, like mouse moves. I think that something similar to this may be useful in understanding the recent caps-lock bug.
% spicy -t mouse_move:trace_timestamp -h turbo -p 5901
0.000035:[mouse_move]inputs-3:0: pos=505,299 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
0.024286:[mouse_move]inputs-3:0: pos=482,257 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
0.024299:[mouse_move]inputs-3:0: pos=482,257 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0 lck=0x0
0.032647:[mouse_move]inputs-3:0: pos=439,182 delta=0,0 btn=0 dpy=0 cnt=2 mod=0 lck=0
0.032658:[mouse_move]inputs-3:0: pos=439,182 delta=0,0 btn=0 dpy=0 cnt=2 mod=0x0 lck=0x0
0.068790:[mouse_move]inputs-3:0: pos=401,129 delta=0,0 btn=0 dpy=0 cnt=3 mod=0 lck=0
0.068804:[mouse_move]inputs-3:0: pos=401,129 delta=0,0 btn=0 dpy=0 cnt=3 mod=0x0 lck=0x0
0.097703:[mouse_move]inputs-3:0: pos=392,115 delta=0,0 btn=0 dpy=0 cnt=0 mod=0 lck=0
0.097716:[mouse_move]inputs-3:0: pos=392,115 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
0.118830:[mouse_move]inputs-3:0: pos=379,95 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
0.118842:[mouse_move]inputs-3:0: pos=379,95 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0
Compatibility
There are tracing options to use the standard glib log in addition to stdout. Trace names are visible as Spice[x] domains.
Note that after exploring the glib logging code, I am somewhat concerned about performance if there is a very large number of domains, since glib seems to do a linear search to find the domain each time you log something.
Back to the tracing options: you can of course set trace_stdout=0 if you only want the glib log, or set it permanently in SPICE_TRACES in your .bashrc if that's your preference.
% G_MESSAGES_DEBUG=all spicy -t mouse_move:trace_stdlog -h turbo -p 5901
[mouse_move]inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0 lck=0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:346:spice_inputs_position: inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0 lck=0
[mouse_move]inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:188:mouse_position: inputs-3:0: pos=330,166 delta=0,0 btn=0 dpy=0 cnt=0 mod=0x0 lck=0x0
[mouse_move]inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:346:spice_inputs_position: inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0 lck=0
[mouse_move]inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0 lck=0x0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:188:mouse_position: inputs-3:0: pos=322,155 delta=0,0 btn=0 dpy=0 cnt=1 mod=0x0 lck=0x0
[mouse_move]inputs-3:0: pos=308,139 delta=0,0 btn=0 dpy=0 cnt=2 mod=0 lck=0
(spicy:57464): Spice[mouse_move]-DEBUG: channel-inputs.c:346:spice_inputs_position: inputs-3:0: pos=308,139 delta=0,0 btn=0 dpy=0 cnt=2 mod=0 lck=0
[mouse_move]inputs-3:0: pos=308,139 delta=0,0 btn=0 dpy=0 cnt=2 mod=0x0 lck=0x0
This is also a good way to compare how "chatty" the old glib-based log is relative to the shorter default format for spice_trace.
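A rough sketch of what mirroring a trace to the glib log can look like, assuming one Spice[name] domain per trace as shown above; g_log() is the regular glib call, the rest of the function is illustrative.

#include <glib.h>
#include <stdio.h>

/* Emit a trace on stdout and/or to the glib log, depending on the
 * trace_stdout / trace_stdlog options */
static void trace_emit(const char *trace_name,
                       gboolean to_stdout, gboolean to_stdlog,
                       const char *message)
{
    if (to_stdout)
        printf("[%s]%s\n", trace_name, message);
    if (to_stdlog) {
        /* Building the domain string each time is wasteful; a real
         * implementation would cache it per trace. */
        char *domain = g_strdup_printf("Spice[%s]", trace_name);
        g_log(domain, G_LOG_LEVEL_DEBUG, "%s", message);
        g_free(domain);
    }
}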
Self-documented
To get the list of the current traces, you can use the -t help or -t list option.
The current list is basically the result of looking at the various CHANNEL_DEBUG statements and splitting them into subcategories, plus a few additional instrumentations I had put in place over time but, until now, had to cherry-pick every time I wanted them.
List of available traces
canvas: Canvas-related information
capabilities: Channel capabilities
channel: Channel debug (CHANNEL_DEBUG macro)
channel_connect: Channel connection / disconnection
channel_cursor: Channel cursor information
channel_notify: Channel notifications
channel_read: Reading data from channels
channel_wait: Channel waits
channel_write: Writing data to channels
clipboard: Clipboard-related information
debug: General debug (SPICE_DEBUG macro)
display: Display-related information
drop_stats: Show drop statistics every N input frames
init: Initialization
gl: OpenGL-related information
marshall_read: De-marshalling data for reading
marshall_write: Marshalling data for writing
migration: Migration events
mmtime: Multimedia time measurements
monitors: Monitor-related information
mouse: Mouse-related information, except moves
mouse_move: Individual mouse movements (chatty)
port: Channel port information
rect: Rectangles debug
smartcard: Smartcard data
sound: Sound playback
sound_data: Sound data hex dump
sound_mode: Sound mode hex dump
ssl: SSL verification
sasl: SASL verification
surface: Surface-related information
trace_function: Show function in trace output
trace_location: Show code location in trace output
trace_name: Show trace names in trace output
trace_set: Show which traces are being set
trace_stdlog: Send traces to glib logging facility
trace_stdout: Send traces to standard output
trace_timestamp: Show time stamp in trace output
usb: USB redirection
tls: TLS negotiations
webdav: WebDAV (directory redirection)
webdav_data: WebDAV hex dump
wire_read: Hex dump of data read on the wire
wire_write: Hex dump of data written on the wire
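One way to keep such a list self-documenting is to declare each trace exactly once, next to its help text, and derive both the flags and the -t list output from that single declaration. Here is a sketch using an X-macro table; the macro names are hypothetical and only a few traces from the list above are shown.

#include <stdio.h>

/* Single declaration point: trace name and help text */
#define SPICE_TRACE_LIST(T)                                       \
    T(channel,    "Channel debug (CHANNEL_DEBUG macro)")          \
    T(drop_stats, "Show drop statistics every N input frames")    \
    T(mouse_move, "Individual mouse movements (chatty)")          \
    T(wire_read,  "Hex dump of data read on the wire")            \
    T(wire_write, "Hex dump of data written on the wire")

/* One flag (or tweak value) per trace */
#define SPICE_TRACE_DECLARE(name, help) unsigned spice_trace_##name;
SPICE_TRACE_LIST(SPICE_TRACE_DECLARE)
#undef SPICE_TRACE_DECLARE

/* The output of -t list */
static void spice_trace_print_list(void)
{
    printf("List of available traces\n");
#define SPICE_TRACE_PRINT(name, help) printf("%s: %s\n", #name, help);
    SPICE_TRACE_LIST(SPICE_TRACE_PRINT)
#undef SPICE_TRACE_PRINT
}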
Performance
One of the things that concerns me about the current infrastructure is that it's relatively costly. Basically, Spice logs everything, taking the time to vsprintf-format every single message. Then, after carefully spending the time to figure out the proper sequence of characters, we send it to glib, which just as carefully discards it in most cases.
The cost is not entirely negligible. Below is a small benchmark I did comparing the various solutions. All numbers are in seconds for 1M iterations printing a counter, best of 3 runs on an unloaded machine.
                           | spice_trace()   | spice_log()
Disabled trace             | 0.0021          | 1.26
Trace to console           | 2.11            | 8.07
Grep, filtering 20 entries | 0.36            | 4.37
Sending to file in /tmp    | 0.31 (29M file) | 4.31 (79M file)
The bottom line is that this approach beats the crap out of the existing code, by a factor of 500+ for unused trace code (I also tried g_debug; it's somewhere in the middle, closer to spice_log).
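The reason the disabled case can be nearly free is simply that the trace macro tests a per-trace flag before doing any formatting at all. A minimal sketch, assuming a printf-like macro and a flag set while parsing -t (both hypothetical, the real interface may differ):

#include <stdio.h>

extern unsigned spice_trace_mouse_move;   /* set when -t mouse_move is given */

#define spice_trace(name, fmt, ...)                                     \
    do {                                                                \
        if (spice_trace_##name)          /* one test, no formatting */  \
            printf("[" #name "]" fmt "\n", ##__VA_ARGS__);              \
    } while (0)

/* Usage: costs a single test and branch when the trace is disabled
 *   spice_trace(mouse_move, "pos=%d,%d", x, y); */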