Demonstrating client-side SPICE overload

Friday, April 20, 2018

SPICE client overload - Queue length issue

The decoder queue length in the client is not bounded. As a result, the client can end up with an arbitrarily long decoding queue, which introduces arbitrarily long delays. Experimentally, this happens as soon as the client-side machine is under sufficiently high load.

Methodology

All measurements below were taken on a recorder branch that was rebased on April 17.

The data was collected with the following command (the SPICE_TRACES environment variable exports the first two columns of the gst_queue_stats recorder under the names frame_size and queue_length):

    SPICE_TRACES='gst_queue_stats=frame_size,queue_length' ./spicy -h turbo -p 5900

The command to display the data in the frame_size and queue_length columns is (search for recorder_scope in the recorder documentation for details):

    recorder_scope frame_size queue_length

The gst_queue_stats data is collected as follows (see the source code in context):

    RECORD(gst_queue_stats, "Frame size %lu length %lu", frame->size, decoder->decoding_queue->length);

Measurements

Below is a queue length measurement taken under light load. In that case, the queue length remains very small and the system remains stable over time. Typical observed queue lengths are in the single digits, and lower still with hardware-accelerated decoding.

Below is a queue length measurement taken under transient (pulse) load. In that case, there is a temporary increase in queue length, which remains moderate. I suspect transient loads explain the data Uri shared.

Below is a queue length measurement taken under heavy external load (in that case, a few spinners). Under this load, the decoding queue length diverges.
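The divergence can be illustrated with a toy model. The sketch below is not SPICE code: the function name, its parameters and the one-second tick are assumptions made for illustration. It only shows that an unbounded queue grows linearly as soon as the decoder handles fewer frames per tick than arrive, and drains again once decoding capacity exceeds the arrival rate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an unbounded decoding queue (not taken from SPICE).
 * Each tick, `arrived_per_tick` frames are enqueued, then the decoder
 * removes at most `decoded_per_tick` frames. Returns the final length. */
size_t simulate_queue_length(size_t initial_length, size_t ticks,
                             size_t arrived_per_tick, size_t decoded_per_tick)
{
    size_t queue_length = initial_length;

    assert(arrived_per_tick > 0);

    for (size_t t = 0; t < ticks; t++) {
        /* New frames arrive regardless of how busy the decoder is. */
        queue_length += arrived_per_tick;

        /* The decoder cannot remove more frames than are queued. */
        size_t decoded = queue_length < decoded_per_tick
                       ? queue_length : decoded_per_tick;
        queue_length -= decoded;
    }
    return queue_length;
}
```

With one tick per second and 30 frames arriving per tick, `simulate_queue_length(0, 120, 30, 40)` stays at 0, whereas `simulate_queue_length(0, 120, 30, 15)` returns 1800: the backlog grows by 15 frames every tick with no upper limit. Starting from that backlog, `simulate_queue_length(1800, 180, 30, 40)` returns to 0.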

Below is a queue length measurement after a relatively short period of time (a couple of minutes). It shows that the queue length can grow to arbitrarily large values, here reaching nearly 2000 (about a one-minute delay at 30 FPS).
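The delay estimate is simply the queue length divided by the frame rate. A minimal sketch of that arithmetic follows; the helper name is hypothetical, and it assumes frames are consumed at a constant rate.

```c
#include <assert.h>

/* Hypothetical helper: delay in seconds implied by `queue_length` frames
 * waiting in the decoding queue, assuming a constant rate of `fps`. */
double queue_delay_seconds(unsigned long queue_length, double fps)
{
    assert(fps > 0.0);
    return (double)queue_length / fps;
}
```

For example, `queue_delay_seconds(2000, 30.0)` is roughly 66.7 seconds, which is where the figure of about one minute comes from.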

The phenomenon is entirely reversible. Here is the same measurement once the external load is removed:

The system ultimately returns to normal:

Conclusion

An external load on the machine running the SPICE client (i.e. anything that eats CPU) can cause the SPICE client to accumulate an arbitrary number of frames in its decoding queue, which results in arbitrarily long latency.