Discussion:
[Xcb] [Bug 93414] New: hang in xcb_wait_for_reply / vl_dri2_get_flush_reply
b***@freedesktop.org
2015-12-16 21:33:38 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=93414

Bug ID: 93414
Summary: hang in xcb_wait_for_reply / vl_dri2_get_flush_reply
Product: XCB
Version: 1.11
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Protocol
Assignee: ***@lists.freedesktop.org
Reporter: ***@gmail.com
QA Contact: ***@lists.freedesktop.org

Application that renders using libva/vdpau/opengl hangs after a few minutes of
operation (always reproducible within 1-15 minutes usually).

I see a hang in xcb_wait_for_reply() for request #31161:

Possibly relevant part of xtrace:
000:>:4d8a:32: Reply to SwapBuffers: swap_hi=0 swap_lo=31161
000:>:4d8b:32: Reply to GetInputFocus: revert-to=Parent(0x02) focus=0x02e000e0
000:<:24d8c: 20: DRI2-Request(155,7): GetBuffersWithFormat drawable=0x04400006
attachments={attachment=BackLeft(0x00000001) format=0x00000020};
000:>:24d8c: Event DRI2-BufferSwapComplete(102) drawable=0x00000002
ust_hi=71303174 ust_lo=0 msc_hi=1858207713 msc_lo=0 sbc_hi=109393 sbc_lo=31161

And:
000:<:31161: 4: Request(43): GetInputFocus
000:>:31161: Event DRI2-InvalidateBuffers(103) drawable=0x04400006

Backtrace of thread waiting in xcb_wait_for_reply():

Thread 8 (Thread 0x7fca1f778700 (LWP 3580)):
#0 0x00007fca55a3e12d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fca5316b262 in poll (__timeout=-1, __nfds=1, __fds=0x7fca1f776a20)
at
/usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2 _xcb_conn_wait (c=***@entry=0x2885500, cond=***@entry=0x7fca1f776b40,
vector=***@entry=0x0, count=***@entry=0x0) at xcb_conn.c:459
#3 0x00007fca5316cadf in wait_for_reply (c=***@entry=0x2885500, request=31161,
e=***@entry=0x0) at xcb_in.c:491
#4 0x00007fca5316cbf2 in xcb_wait_for_reply (c=0x2885500, request=31161,
e=0x0)
at xcb_in.c:521
#5 0x00007fca419b8004 in vl_dri2_get_flush_reply (scrn=0x2653600) at
../../../../../src/gallium/auxiliary/vl/vl_winsys_dri.c:110
#6 0x00007fca419b8101 in vl_dri2_destroy_drawable (scrn=0x2653600) at
../../../../../src/gallium/auxiliary/vl/vl_winsys_dri.c:153
#7 0x00007fca419b8185 in vl_dri2_set_drawable (scrn=0x2653600,
drawable=71303307) at
../../../../../src/gallium/auxiliary/vl/vl_winsys_dri.c:176
#8 0x00007fca419b82c8 in vl_dri2_screen_texture_from_drawable
(vscreen=0x2653600, drawable=0x440008b) at
../../../../../src/gallium/auxiliary/vl/vl_winsys_dri.c:193
#9 0x00007fca419b61ab in vlVdpPresentationQueueDisplay
(presentation_queue=<optimized out>, surface=5, clip_width=1280,
clip_height=720, earliest_presentation_time=0)
at
../../../../../../src/gallium/state_trackers/vdpau/presentation.c:234
#10 0x00007fca423ca6a3 in flip_surface_unlocked
(driver_data=***@entry=0x2650c90,
obj_output=***@entry=0x291a2b0) at vdpau_video_x11.c:583
#11 0x00007fca423cb2e1 in queue_surface_unlocked (obj_surface=<optimized out>,
obj_output=0x291a2b0, driver_data=0x2650c90) at vdpau_video_x11.c:610
#12 put_surface_unlocked (flags=32, target_rect=0x7fca1f776f90,
source_rect=0x7fca1f776f80, obj_output=0x291a2b0,
obj_surface=0x2919bf0,
driver_data=0x2650c90) at vdpau_video_x11.c:689
#13 put_surface (driver_data=***@entry=0x2650c90,
surface=***@entry=50331655, drawable=***@entry=71303307,
drawable_width=<optimized out>, drawable_height=720,
source_rect=***@entry=0x7fca1f776f80,
target_rect=***@entry=0x7fca1f776f90, flags=***@entry=32)
at vdpau_video_x11.c:755
#14 0x00007fca423cb59c in vdpau_PutSurface (ctx=<optimized out>,
surface=50331655, draw=0x440008b, srcx=<optimized out>, srcy=<optimized
out>, srcw=<optimized out>, srch=720, destx=0, desty=0,
destw=1280, desth=720, cliprects=0x0, number_cliprects=0, flags=32)
at vdpau_video_x11.c:808
#15 0x000000000040eed2 in on_client_draw (glsink=<optimized out>,
buf=<optimized
out>, data=0x626d40 <video>) at video.c:619


Full backtrace at: http://pastebin.com/rQCJm64C

Using:

libX11 1.6.2
libxcb 1.11 or 1.10
mesa 11.0 (11.0~git1508070730.42d283~gd~t from oibaf/graphics-drivers ppa)
xserver-xorg-video-radeon 1:7.5.99+git1508070731.3791fc~gd~t (same ppa)


I also made same bug under dri/gallium/radeon_si
https://bugs.freedesktop.org/show_bug.cgi?id=93412 , apologies for this, I am
not sure where to file this problem. I will close one of the two bugs whenI am
pointed to the most appropriate component of the two.
--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
b***@freedesktop.org
2015-12-17 06:49:26 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=93414

Michel DÀnzer <***@daenzer.net> changed:

What |Removed |Added
----------------------------------------------------------------------------
Component|Protocol |Library

--- Comment #1 from Michel DÀnzer <***@daenzer.net> ---
First of all, does the app call XInitThreads before any other libX11 functions
are called (on any thread)?

If so, any chance you can try libxcb 1.11.1 or a current Git snapshot?
--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
b***@freedesktop.org
2015-12-17 06:50:59 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=93414

Michel DÀnzer <***@daenzer.net> changed:

What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://bugs.freedesktop.or
| |g/show_bug.cgi?id=93412
--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
b***@freedesktop.org
2015-12-17 14:31:09 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=93414

--- Comment #2 from Vasilis LIaskovitis <***@gmail.com> ---
Yes, XInitThreads() is called at startuo (main function) before any other
libX11 functions are called from any threads of the app.

Problem still exists with libxcb 1.11.1 (I am buiding and installing it
manually though. Does newer Xlib also need to be built, or is xcb sufficient?)
--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
b***@freedesktop.org
2015-12-19 10:45:52 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=93414

--- Comment #3 from Uli Schlachter <***@znc.in> ---
(In reply to Vasilis LIaskovitis from comment #0)
[...]
Post by b***@freedesktop.org
000:>:4d8a:32: Reply to SwapBuffers: swap_hi=0 swap_lo=31161
000:>:4d8b:32: Reply to GetInputFocus: revert-to=Parent(0x02)
focus=0x02e000e0
000:<:24d8c: 20: DRI2-Request(155,7): GetBuffersWithFormat
drawable=0x04400006 attachments={attachment=BackLeft(0x00000001)
format=0x00000020};
000:>:24d8c: Event DRI2-BufferSwapComplete(102) drawable=0x00000002
ust_hi=71303174 ust_lo=0 msc_hi=1858207713 msc_lo=0 sbc_hi=109393 sbc_lo=31161
Why is this part relevant? It's 0x31161-0x24d8c = 50133 requests before the
hang, so apparently things still work for 50k requests. That should be a long
time.
Post by b***@freedesktop.org
000:<:31161: 4: Request(43): GetInputFocus
000:>:31161: Event DRI2-InvalidateBuffers(103) drawable=0x04400006
So if I understand this correctly, you are saying that requests #31161 is a
GetInputFocus request and xtrace never saw a reply to it, right?
Isn't the bug then in the server for not replying to GetInputFocus?
Post by b***@freedesktop.org
Full backtrace at: http://pastebin.com/rQCJm64C
Thread 16 is in radeon_something, waits on a condition
Thread 15 waits for the vaapi lock(?) (pthread_mutex_lock() from within
gst_vaapisink_x11_handle_events())
Thread 14 waits on a condition inside gst_data_queue_push() (hm?!? Is some
queue full, or why does it wait?)
Thread 13 waits on a condition inside gst_task_func (dunno what this is)
Thread 12 waits on a condition inside gst_vaapidecode_handle_frame() (and has a
deep backtrace with lots of gst stuff)
Thread 11 waits in gst_queue_chain_buffer_or_list() (deep backtrace)
Thread 10 waits on a condition inside gst_queue_loop()
Thread 9 waits on a condition inside gst_base_sink_wait_preroll() (deep
backtrace)
Thread 8 poll()s for a reply to request 31161 (called from
vl_dri2_get_flush_reply()) (deep backtrace)
Thread 7 waits on a condition inside ??() in libavcodec.so
So do thread 6, 5, 4, 3.
Thread 2 poll()s something pulseaudio related (pa_mainloop_poll())
Thread 1 *also* poll()s the xcb connection (wait_for_reply() on request 212481,
called from _XReply(), called from DRI2GetBuffersWithFormat, called eventually
from st_Clear() (something from mesa), called from main())

Ohhh... thread 1 and thread 8 wait on different XCB connections. I have no clue
about the internals of mesa etc, but could it be that something did a
GrabServer and that's why the GetInputFocus (and DRI2GetBuffersWithFormat)
isn't replied to? Since each thread involving libxcb has its own connection, I
really don't think that the deadlock is inside of libxcb.

@Vasilis: How do you know that this is a deadlock? Does your X11 server still
work correctly otherwise? I'd expect both of these pending requests to get
replies.

Also, could you do something like this ('thread 1' would switch to, well,
thread 1 and frame 2 to the frame with `_xcb_conn_wait()` so that we get a
reference to the XCB connection, the various prints print the state of the XCB
connection):
thread 1
frame 2
print c
print c.in->current_reply
print c.in->events
print c.in->pending_replies
--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
b***@freedesktop.org
2015-12-21 10:23:58 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=93414

--- Comment #4 from Vasilis LIaskovitis <***@gmail.com> ---
(In reply to Uli Schlachter from comment #3)
Post by b***@freedesktop.org
(In reply to Vasilis LIaskovitis from comment #0)
[...]
Post by b***@freedesktop.org
000:>:4d8a:32: Reply to SwapBuffers: swap_hi=0 swap_lo=31161
000:>:4d8b:32: Reply to GetInputFocus: revert-to=Parent(0x02) focus=0x02e000e0
000:<:24d8c: 20: DRI2-Request(155,7): GetBuffersWithFormat
drawable=0x04400006 attachments={attachment=BackLeft(0x00000001)
format=0x00000020};
000:>:24d8c: Event DRI2-BufferSwapComplete(102) drawable=0x00000002
ust_hi=71303174 ust_lo=0 msc_hi=1858207713 msc_lo=0 sbc_hi=109393 sbc_lo=31161
Why is this part relevant? It's 0x31161-0x24d8c = 50133 requests before the
hang, so apparently things still work for 50k requests. That should be a
long time.
You are right, this looks irrelevant.. not very familiar xith X protocol, I was
thinking sbc_lo could refer to a request ID.
Post by b***@freedesktop.org
Post by b***@freedesktop.org
000:<:31161: 4: Request(43): GetInputFocus
000:>:31161: Event DRI2-InvalidateBuffers(103) drawable=0x04400006
So if I understand this correctly, you are saying that requests #31161 is a
GetInputFocus request and xtrace never saw a reply to it, right?
Isn't the bug then in the server for not replying to GetInputFocus?
Post by b***@freedesktop.org
Full backtrace at: http://pastebin.com/rQCJm64C
Thread 16 is in radeon_something, waits on a condition
Thread 15 waits for the vaapi lock(?) (pthread_mutex_lock() from within
gst_vaapisink_x11_handle_events())
Thread 14 waits on a condition inside gst_data_queue_push() (hm?!? Is some
queue full, or why does it wait?)
Thread 13 waits on a condition inside gst_task_func (dunno what this is)
Thread 12 waits on a condition inside gst_vaapidecode_handle_frame() (and
has a deep backtrace with lots of gst stuff)
Thread 11 waits in gst_queue_chain_buffer_or_list() (deep backtrace)
Thread 10 waits on a condition inside gst_queue_loop()
Thread 9 waits on a condition inside gst_base_sink_wait_preroll() (deep
backtrace)
Thread 8 poll()s for a reply to request 31161 (called from
vl_dri2_get_flush_reply()) (deep backtrace)
Thread 7 waits on a condition inside ??() in libavcodec.so
So do thread 6, 5, 4, 3.
Thread 2 poll()s something pulseaudio related (pa_mainloop_poll())
Thread 1 *also* poll()s the xcb connection (wait_for_reply() on request
212481, called from _XReply(), called from DRI2GetBuffersWithFormat, called
eventually from st_Clear() (something from mesa), called from main())
Ohhh... thread 1 and thread 8 wait on different XCB connections. I have no
clue about the internals of mesa etc, but could it be that something did a
GrabServer and that's why the GetInputFocus (and DRI2GetBuffersWithFormat)
isn't replied to? Since each thread involving libxcb has its own connection,
I really don't think that the deadlock is inside of libxcb.
@Vasilis: How do you know that this is a deadlock? Does your X11 server
still work correctly otherwise? I'd expect both of these pending requests to
get replies.
Also, could you do something like this ('thread 1' would switch to, well,
thread 1 and frame 2 to the frame with `_xcb_conn_wait()` so that we get a
reference to the XCB connection, the various prints print the state of the
thread 1
frame 2
print c
print c.in->current_reply
print c.in->events
print c.in->pending_replies
pending_replies is 0

(gdb) print c.in->current_reply
$4 = (struct reply_list *) 0x0
(gdb) print c.in->events
$5 = (struct event_list *) 0x7f0b1c008730
(gdb) print c.in->pending_replies
$6 = (struct pending_reply *) 0x0

Looking at all threads in bt, this may be a completely different issue with
gstreamer queues.. we can maybe close the issue and reopen if I find something
that really proves it is an xcb/xserver issue (or if you have another idea)
--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
b***@freedesktop.org
2015-12-21 10:34:22 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=93414

--- Comment #5 from Uli Schlachter <***@znc.in> ---
Just an explanation for that debug info:
There are no replies that XCB already received, but did not return to any
caller. Some events were received, but not yet returned to the application
(nothing called xcb_poll/wait_for_event()). The empty "pending_replies" list
means that no answer to checked requests is pending (hm? Is this the thread
where libX11 owns the socket?).

Besides that, I don't know. I'd blame the X server for not answering and the
non-X11 threads don't tell me anything.
--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
b***@freedesktop.org
2018-09-23 20:53:47 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=93414

--- Comment #6 from Sergey Alirzaev <***@gmail.com> ---
Seems like i'm hit by the same bug. In my case, it hangs in OpenGL games
(glxgears seems to be unaffected) randomly when i switch workspaces. With
StarRuler2 i only have one thread hanging while communicating with X, and
here's the backtrace:

#0 0x00007ffff5d05db3 in poll () from /lib64/libc.so.6
#1 0x00007ffff55769e2 in _xcb_conn_wait () from
/usr/lib64/libxcb.so.1
#2 0x00007ffff55789b7 in wait_for_reply () from
/usr/lib64/libxcb.so.1
#3 0x00007ffff5578b71 in xcb_wait_for_reply64 () from
/usr/lib64/libxcb.so.1
#4 0x00007ffff7a3e288 in _XReply () from
/usr/lib64/libX11.so.6
#5 0x00007ffff7f891c7 in DRI2GetBuffersWithFormat
(dpy=0x555555a6c410, drawable=163577862, width=***@entry=0x555555e20a28,
height=***@entry=0x555555e20a2c, attachments=0x7fffffffd620, count=1,
outCount=0x7fffffffd69c)
at ../mesa-18.2.0/src/glx/dri2.c:485
#6 0x00007ffff7f89596 in dri2GetBuffersWithFormat
(driDrawable=<optimized out>, width=0x555555e20a28, height=0x555555e20a2c,
attachments=<optimized out>, count=<optimized out>, out_count=0x7fffffffd69c,
loaderPrivate=0x555555e20c80)
at ../mesa-18.2.0/src/glx/dri2_glx.c:900
#7 0x00007fffeb3b57ef in intel_query_dri2_buffers
(brw=***@entry=0x555555c6c1b0, drawable=***@entry=0x555555e20a00,
buffers=***@entry=0x7fffffffd6a0,
buffer_count=***@entry=0x7fffffffd69c)
at
../mesa-18.2.0/src/mesa/drivers/dri/i965/brw_context.c:1531
#8 0x00007fffeb3b592f in intel_update_dri2_buffers
(brw=***@entry=0x555555c6c1b0, drawable=***@entry=0x555555e20a00) at
../mesa-18.2.0/src/mesa/drivers/dri/i965/brw_context.c:1370
#9 0x00007fffeb3b5b3d in intel_update_renderbuffers
(context=***@entry=0x555555c660a0, drawable=***@entry=0x555555e20a00)
at ../mesa-18.2.0/src/mesa/drivers/dri/i965/brw_context.c:1426
#10 0x00007fffeb3b5baf in intel_prepare_render
(brw=***@entry=0x555555c6c1b0) at
../mesa-18.2.0/src/mesa/drivers/dri/i965/brw_context.c:1445
#11 0x00007fffeb3b079e in brw_clear (ctx=0x555555c6c1b0,
mask=272) at ../mesa-18.2.0/src/mesa/drivers/dri/i965/brw_clear.c:254
#12 0x00007fffeb5b1fe7 in clear (no_error=false, mask=16640,
ctx=0x555555c6c1b0) at ../mesa-18.2.0/src/mesa/main/clear.c:221
#13 _mesa_Clear (mask=<optimized out>) at
../mesa-18.2.0/src/mesa/main/clear.c:242
#14 0x00005555557d77c9 in
render::GLDriver::setRenderTarget(render::Texture*, bool) ()
#15 0x000055555582ecdb in renderFrame(scripts::Manager*,
scripts::Manager*) ()
#16 0x0000555555813eed in tickMenu() ()
#17 0x00005555555a842c in main ()

Can i do anything about it?
--
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
b***@freedesktop.org
2018-09-24 05:48:34 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=93414

--- Comment #7 from Sergey Alirzaev <***@gmail.com> ---
Seems like it is fixed if i build xf86-video-intel with --with-default-dri=3
--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
Loading...