Memory leak analysis

Thursday, October 12, 2017

With the git-built version of gstreamer, reproducing does not seem to be as easy as before. But I finally got it after several attempts. In that case, it was a sudden jump of 3.4G!!! That happened after leaving spicy running overnight (5.2e4s is roughly 14 hours).

Not sure it is that easy to get the backtrace from the nice GUI. It looks like this:

The stack trace for the primary source of leak (3.3G) imageis:

g_malloc in ?? (libglib-2.0.so.0)
g_slice_alloc in ?? (libglib-2.0.so.0)
_sysmem_new_block in gstallocator.c:417 (libgstreamer-1.0.so.0)
default_alloc in gstallocator.c:516 (libgstreamer-1.0.so.0)
gst_allocator_alloc in gstallocator.c:314 (libgstreamer-1.0.so.0)
gst_buffer_new_allocate in gstbuffer.c:839 (libgstreamer-1.0.so.0)
video_buffer_pool_alloc in gstvideopool.c:254 (libgstvideo-1.0.so.0)
do_alloc_buffer in gstbufferpool.c:278 (libgstreamer-1.0.so.0)
default_acquire_buffer in gstbufferpool.c:1134 (libgstreamer-1.0.so.0)
gst_buffer_pool_acquire_buffer in gstbufferpool.c:1265 (libgstreamer-1.0.so.0)
default_prepare_output_buffer in gstbasetransform.c:1597 (libgstbase-1.0.so.0)
default_generate_output in gstbasetransform.c:2094 (libgstbase-1.0.so.0)
gst_base_transform_chain in gstbasetransform.c:2276 (libgstbase-1.0.so.0)
gst_pad_chain_data_unchecked in gstpad.c:4215 (libgstreamer-1.0.so.0)
gst_pad_push_data in gstpad.c:4471 (libgstreamer-1.0.so.0)
gst_pad_push in gstpad.c:4590 (libgstreamer-1.0.so.0)
gst_base_transform_chain in gstbasetransform.c:2312 (libgstbase-1.0.so.0)
gst_pad_chain_data_unchecked in gstpad.c:4215 (libgstreamer-1.0.so.0)
gst_pad_push_data in gstpad.c:4471 (libgstreamer-1.0.so.0)
gst_pad_push in gstpad.c:4590 (libgstreamer-1.0.so.0)
gst_base_transform_chain in gstbasetransform.c:2312 (libgstbase-1.0.so.0)
gst_pad_chain_data_unchecked in gstpad.c:4215 (libgstreamer-1.0.so.0)
gst_pad_push_data in gstpad.c:4471 (libgstreamer-1.0.so.0)
gst_pad_push in gstpad.c:4590 (libgstreamer-1.0.so.0)
gst_base_transform_chain in gstbasetransform.c:2312 (libgstbase-1.0.so.0)
gst_pad_chain_data_unchecked in gstpad.c:4215 (libgstreamer-1.0.so.0)
gst_pad_push_data in gstpad.c:4471 (libgstreamer-1.0.so.0)
gst_pad_push in gstpad.c:4590 (libgstreamer-1.0.so.0)
gst_base_transform_chain in gstbasetransform.c:2312 (libgstbase-1.0.so.0)
gst_pad_chain_data_unchecked in gstpad.c:4215 (libgstreamer-1.0.so.0)
gst_pad_push_data in gstpad.c:4471 (libgstreamer-1.0.so.0)
gst_pad_push in gstpad.c:4590 (libgstreamer-1.0.so.0)
gst_proxy_pad_chain_default in gstghostpad.c:127 (libgstreamer-1.0.so.0)
gst_pad_chain_data_unchecked in gstpad.c:4215 (libgstreamer-1.0.so.0)
gst_pad_push_data in gstpad.c:4471 (libgstreamer-1.0.so.0)
gst_pad_push in gstpad.c:4590 (libgstreamer-1.0.so.0)
gst_queue_push_one in gstqueue.c:1383 (libgstcoreelements.so)
gst_queue_loop in gstqueue.c:1536 (libgstcoreelements.so)
gst_task_func in gsttask.c:332 (libgstreamer-1.0.so.0)
default_func in gsttaskpool.c:69 (libgstreamer-1.0.so.0)
 in ?? (libglib-2.0.so.0)
 in ?? (libglib-2.0.so.0)
start_thread in ?? (libpthread.so.0)
__GI___clone in ?? (libc.so.6)

This had to be copy-pasted one line at a time, heaptrack lacks a nice feature here.

It's somewhat amazing that in that time period, the program had allocated about 9TB of data, in a somewhat continuous stream.

Having a stack trace, I filed Red Hat Bugzilla 1501364.

It's not always necessary to wait hours, though. Here is a case I triggered by using the Control-Z / fg trick several times :

What is new about this one is that it's more progressive. After 40s, there is this somewhat continuous increase.

Tried with 300 spinners, no success in reproducing, but as soon as I hit control-Z and fg, I had the problem again. This time I was running with GST_DEBUG=5. And I had yet another kind of memory leak (but was that because of GST_DEBUG or the spinners or random chance?):

It's not always the case that "Control-Z" works though. For example, on the last run, I had to do the "Control-Z" trick six times before seeing memory jump to a modest 167M, and then another few times before finally seeing a jump to 1.5G.

What is interesting is that this time I also kept the log file for GST_DEBUG=5 output in a file instead of the terminal buffer. That's 533MB of highly repeatitive data, helpfully formatted using ANSI color codes. Thankfully I have this nice Emacs macro to color a buffer, ansi-color-region.

Did find a few occurences of "leak", but nothing interesting:

0:00:00.242203176 28143      0x1c1ea60 DEBUG           GST_REGISTRY gstregistrychunks.c:572:gst_registry_chunks_load_feature: Plugin 'coretracers' feature 'leaks' typename : 'GstTracerFactory'
0:00:00.242241437 28143      0x1c1ea60 DEBUG        GST_REFCOUNTING gstobject.c:672:gst_object_set_parent: set parent (ref and sink)

Can't zoom in heaptrack, so it's a bit hard to know the precise timing when something happens and match that with the very verbose GST_DEBUG=5 output (which, at least, has timing). But there is a somewhat hard to trigger popover window that gives me timing. It looks like the first set of interesting things happened between 50s and 56s.

Interestingly, there are a number of mentions of dts in the log with a duration around 50s:

0:00:20.939661002 28143      0x2450f20 DEBUG         GST_SCHEDULING gstpad.c:4213:gst_pad_chain_data_unchecked: calling chainfunction &gst_base_transform_chain with buffer buffer: 0x2bf3360, pts 0:00:17.986224045, dts 0:00:50.400000000, dur 0:00:00.050000000, size 11717, offset 49749081, offset_end none, flags 0x2000

Problem is that I cannot upload that to Red Hat Bugzilla 1501364, it's too big.

So apparently, these memory allocations are scheduled for later deletion.

0:00:50.002419484 28143      0x2450f20 DEBUG         GST_SCHEDULING gstpad.c:4219:gst_pad_chain_data_unchecked: called chainfunction &gst_base_parse_chain with buffer 0x3064920, returned ok

0:00:50.002436420 28143 0x2450f20 DEBUG GST_SCHEDULING gstpad.c:4219:gst_pad_chain_data_unchecked: called chainfunction &gst_type_find_element_chain with buffer 0x3064920, returned ok 0:00:50.002451132 28143 0x2450f20 DEBUG GST_SCHEDULING gstpad.c:4219:gst_pad

Asked on IRC #gstreamer about how to debug this kind of issue. One suggestion was to use the GStreamer leak tracer

So I ran:

GST_TRACERS="leaks" GST_DEBUG="GST_TRACER:7" heaptrack spicy -h turbo -p 5901  2>&1  | tee /tmp/debug2.log

And got:

heaptrack output will be written to "/home/ddd/Work/spice/spice-gtk/heaptrack.spicy.29176.gz"

starting application, this might take some time... 0:00:00.044581355 29193 0x1bf9a60 DEBUG GST_TRACER gsttracer.c:164:gst_tracer_register: update existing feature 0x2108990 (latency) 0:00:00.044758797 29193 0x1bf9a60 DEBUG GST_TRACER gsttracer.c:164:gst_tracer_register: update existing feature 0x2108a50 (log) 0:00:00.044908236 29193 0x1bf9a60 DEBUG GST_TRACER gsttracer.c:164:gst_tracer_register: update existing feature 0x2108b10 (rusage) 0:00:00.045036359 29193 0x1bf9a60 DEBUG GST_TRACER gsttracer.c:164:gst_tracer_register: update existing feature 0x2108bd0 (stats) 0:00:00.045134963 29193 0x1bf9a60 DEBUG GST_TRACER gsttracer.c:164:gst_tracer_register: update existing feature 0x2108c90 (leaks) 0:00:00.045320503 29193 0x1bf9a60 TRACE GST_TRACER gsttracerrecord.c:111:gst_tracer_record_build_format: object-alive.class, type-name=(structure)"value, type=(type)gchararray, related-to=(GstTracerValueScope)GST_TRACER_VALUE_SCOPE_PROCESS;", address=(structure)"value, type=(type)gpointer, related-to=(GstTracerValueScope)GST_TRACER_VALUE_SCOPE_PROCESS;", description=(structure)"value, type=(type)gchararray, related-to=(GstTracerValueScope)GST_TRACER_VALUE_SCOPE_PROCESS;", ref-count=(structure)"value, type=(type)guint, related-to=(GstTracerValueScope)GST_TRACER_VALUE_SCOPE_PROCESS;", trace=(structure)"value, type=(type)gchararray, related-to=(GstTracerValueScope)GST_TRACER_VALUE_SCOPE_PROCESS;"; 0:00:00.045554930 29193 0x1bf9a60 DEBUG GST_TRACER gsttracerrecord.c:125:gst_tracer_record_build_format: new format string: object-alive, type-name=(string)%s, address=(gpointer)%p, description=(string)%s, ref-count=(uint)%u, trace=(string)%s; 0:00:00.045669918 29193 0x1bf9a60 TRACE GST_TRACER gsttracerrecord.c:111:gst_tracer_record_build_format: object-refings.class, ts=(structure)"value, type=(type)guint64, related-to=(GstTracerValueScope)GST_TRACER_VALUE_SCOPE_PROCESS;", type-name=(structure)"value, type=(type)gchararray, related-to=(GstTracerValueScope)GST_TRACER_VALUE_SCOPE_PROCESS;", address=(structure)"value, type=(type)gpointer, related-to=(GstTracerValueScope)GST_TRACER_VALUE_SCOPE_PROCESS;", description=(structure)"value, type=(type)gchararray, related-to=(GstTracerValueScope)GST_TRACER_VALUE_SCOPE_PROCESS;", ref-count=(structure)"value, type=(type)guint, related-to=(GstTracerValueScope)GST_TRACER_VALUE_SCOPE_PROCESS;", trace=(structure)"value, type=(type)gchararray, related-to=(GstTracerValueScope)GST_TRACER_VALUE_SCOPE_PROCESS;"; 0:00:00.045771484 29193 0x1bf9a60 DEBUG GST_TRACER gsttracerrecord.c:125:gst_tracer_record_build_format: new format string: object-refings, ts=(guint64)%lu, type-name=(string)%s, address=(gpointer)%p, description=(string)%s, ref-count=(uint)%u, trace=(string)%s; ** Message: main channel: opened 0:00:00.945577668 29193 0x24276d0 ERROR h264parse gsth264parse.c:2559:gst_h264_parse_set_caps: video/x-h264 caps without codec_data or stream-format 0:00:00.946691805 29193 0x24276d0 ERROR h264parse gsth264parse .c:2559:gst_h264_parse_set_caps: video/x-h264 caps without codec_data or stream-format heaptrack stats: allocations: 754291 leaked allocations: 58206 temporary allocations: 78548 Heaptrack finished! Now run the following to investigate the data:

heaptrack_gui "/home/ddd/Work/spice/spice-gtk/heaptrack.spicy.29176.gz"

So the tracer did not see much, but there was a leak:

On IRC, cassidy remarked I need to make sure gst_deinit is called at program exit. It's not in our code. Adding an atexit() call in gstvideo_init(). With that, I get:

GST_TRACERS="leaks" GST_DEBUG="GST_TRACER:7" heaptrack spicy -h turbo -p 5901 2>&1 | tee /tmp/debug3.log

0:01:45.658967097 32656 0x179ba60 TRACE GST_TRACER :0:: object-alive, type-name=(string)GstBus, address=(gpointer)0x1fe3070, description=(string), ref-count=(uint)1, trace=(string);

** (spicy:32656): WARNING **: Leaks detected heaptrack stats: allocations: 814945 leaked allocations: 33958 temporary allocations: 79172 Heaptrack finished! Now run the following to investigate the data:

heaptrack_gui "/home/ddd/Work/spice/spice-gtk/heaptrack.spicy.32639.gz"

I was expecting a little more verbosity

Also, there was a suggestion I try to reproduce this using only playbin. So far, I have not been very successful on that front. Running gst-launch-1.0 playbin uri=file:///somefile.mp4 shows no obvious leak.

Reproduced in Wayland.

Rate of growth when the problem hits is quite high: