Discussion:
problem with xcb-based Xlib and multithreaded applications
Francesco Abbate
2011-01-04 10:25:29 UTC
Hi all,

I have been working for some time on a multithreaded application, GSL shell,
where multiple threads can perform operations on the same window. I'm
using the standard Xlib library but, as all of you certainly know,
many modern Linux distributions use the Xlib implementation based on
xcb.

My application adopts a simple scheme to avoid collisions between
threads and race conditions. There is one lock per window, and only one
thread polls and handles the incoming events. Other, secondary threads can
perform drawing operations on the same window, but the locks ensure
that such operations are performed only while the main thread is
waiting for events in an XNextEvent call. Note that I call
XInitThreads before opening any connection.

When the xcb-based implementation of libX11 was introduced, my
application began to hang. After some investigation it was clear that the
problem came from the xcb implementation of X11, which does not
tolerate a second thread performing drawing requests while another
thread is polling for events on the same connection.

As a workaround for this problem I've introduced two X connections: one
is used by the main thread to poll the events and perform the related
actions, while the other connection is used only by the other threads when
they want to perform drawing operations. The locks are still used as
before to avoid conflicts between the threads.

Since then I've experienced another problem: it seems that when I
close the X connections I can get a BadDrawable error because some
drawing operations can still be present in the output buffer. It seems
that a solution for this problem is:
- call XSync on the drawing connection to flush all the pending requests
and, only after that, close the connection
- perform the same operations, namely XSync + XCloseDisplay, on the
main X connection, the one used by the thread that manages the
incoming X events.

In this way I make sure that all the outgoing drawing operations are
processed and the output buffer is empty when the drawing and main
connections are closed.

My questions are: is it normal that I need to call XSync before
XCloseDisplay to avoid a BadDrawable error? Is this related to the
xcb implementation of libX11? Is this related to the use of two
separate connections?

I hope people on this list can help me. In general I have been experiencing
problems like this one since the introduction of the xcb-based libX11.
On the other hand, the Windows client has always worked flawlessly on
all the Windows versions that I've managed to test. I'm wondering whether
xcb was introduced to replace a mature library like libX11
without the verification needed to support existing
multi-threaded applications. This sounds ironic to me since xcb is
supposed to be great for multi-threaded applications!! :-)

I hope that, besides my particular problem, the main xcb developers
will say a few words about this kind of problem with multithreaded
applications. I hope you understand that for many applications it is
very difficult to adopt the xcb library directly because it is a low
level library and many higher level functions provided by libX11 are
simply not available. Since a higher level layer around xcb is not
available, I believe that many programmers need to use libX11, and so
you need to ensure that its xcb-based implementation is 100%
compliant.

Thanks in advance for your help.

Best regards,
Francesco Abbate
Uli Schlachter
2011-01-04 11:05:57 UTC
Hi,
Post by Francesco Abbate
My application adopts a simple scheme to avoid collisions between
threads and race conditions. There is one lock per window, and only one
thread polls and handles the incoming events. Other, secondary threads can
perform drawing operations on the same window, but the locks ensure
that such operations are performed only while the main thread is
waiting for events in an XNextEvent call. Note that I call
XInitThreads before opening any connection.
Uhm, what do you need threads for? Only one thread can be active at any given
time with such a locking scheme, no?
Post by Francesco Abbate
When the xcb-based implementation of libX11 was introduced, my
application began to hang. After some investigation it was clear that the
problem came from the xcb implementation of X11, which does not
tolerate a second thread performing drawing requests while another
thread is polling for events on the same connection.
attached is an ugly (e.g. drawing without getting an Expose) test case. The main
thread just does XNextEvent and prints all events while a thread does some
drawing via XFillRectangle every second. My libX11 uses xcb:

$ ldd /usr/lib/libX11.so | grep xcb
libxcb.so.1 => /usr/lib/libxcb.so.1 (0x00007f5b9b3ac000)

However, I don't see any hangs from this. It all seems to work fine. What am I
doing wrong that makes this work correctly?

Cheers,
Uli

--
The Angels have the phone box!
Francesco Abbate
2011-01-04 13:38:43 UTC
Hi Uli,

thank you very much for taking the time to look at my problem, I
appreciate it a lot.
Post by Uli Schlachter
Uhm, what do you need threads for? Only one thread can be active at any given
time with such a locking scheme, no?
For my application threads are needed because the main thread (thread #1)
handles the incoming events to refresh the window as needed. Thread
#2 takes the input from the user on the TTY, executes the
user's commands using a scripting language (Lua) and can, if
requested, draw new elements into the graphical windows. Thread #2 can
be busy executing a script while the window must remain responsive to
Configure/Expose events at least, so a separate thread is needed.
Post by Uli Schlachter
attached is an ugly (e.g. drawing without getting an Expose) test case. The main
thread just does XNextEvent and prints all events while a thread does some
$ ldd /usr/lib/libX11.so | grep xcb
       libxcb.so.1 => /usr/lib/libxcb.so.1 (0x00007f5b9b3ac000)
However, I don't see any hangs from this. It all seems to work fine. What am I
doing wrong that makes this work correctly?
Well, your example seems to be pertinent. The only difference I can
think of is that I'm using XPutImage instead of XFillRectangle,
as you are doing, and I was using XSync with discard set to True.

Anyway, I cannot run any tests at the moment, but this evening I will
try your test program and see if I can modify it to
reproduce the bug. I will give you feedback soon, thank you for your
help!

Francesco
--
Francesco
Uli Schlachter
2011-01-04 14:08:57 UTC
Post by Francesco Abbate
Post by Uli Schlachter
attached is an ugly (e.g. drawing without getting an Expose) test case. The main
thread just does XNextEvent and prints all events while a thread does some
$ ldd /usr/lib/libX11.so | grep xcb
libxcb.so.1 => /usr/lib/libxcb.so.1 (0x00007f5b9b3ac000)
However, I don't see any hangs from this. It all seems to work fine. What am I
doing wrong that makes this work correctly?
Well, your example seems to be pertinent. The only difference I can
think of is that I'm using XPutImage instead of XFillRectangle,
as you are doing, and I was using XSync with discard set to True.
I just made this use XSync(d, True) and managed to make this hang (it worked for
a moment at first). Don't ask me how I managed to do this, it didn't hang on the
second try.
Then I removed all the "sleep(1);" which makes it freeze almost immediately,
even with Discard=False.

Anyway, here are some backtraces:

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f65d777e700 (LWP 9878))]#0 0x00007f65d702846e
in __pthread_mutex_unlock_usercnt (mutex=0x240a6d8, decr=<value optimized out>)
at pthread_mutex_unlock.c:52
52 in pthread_mutex_unlock.c
(gdb) bt
#0 0x00007f65d702846e in __pthread_mutex_unlock_usercnt (mutex=0x240a6d8,
decr=<value optimized out>) at pthread_mutex_unlock.c:52
#1 0x00007f65d6aac89c in xcb_poll_for_reply () from /usr/lib/libxcb.so.1
#2 0x00007f65d7288ea7 in process_responses (dpy=0x2409070,
wait_for_first_event=<value optimized out>, current_error=<value optimized out>,
current_request=0)
at ../../src/xcb_io.c:222
#3 0x00007f65d7289941 in _XReadEvents (dpy=0x2409070) at ../../src/xcb_io.c:279
#4 0x00007f65d726fdd8 in XNextEvent (dpy=0x2409070, event=0x7fffbb63d3d0) at
../../src/NextEvent.c:51
#5 0x0000000000400d7d in main () at t.c:72
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f65d6494710 (LWP 9879))]#0 __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or
directory.
in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
Current language: auto
The current source language is "auto; currently asm".
(gdb) bt
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007f65d70270e9 in _L_lock_953 () from /lib/libpthread.so.0
#2 0x00007f65d7026f0b in __pthread_mutex_lock (mutex=0x240a500) at
pthread_mutex_lock.c:61
#3 0x00007f65d726ebf3 in _XLockDisplay (dpy=0x2409070) at ../../src/locking.c:458
#4 0x00007f65d725ab12 in XChangeGC (dpy=0x240a500, gc=0x24155a0, valuemask=4,
values=0x7f65d6493e10) at ../../src/ChGC.c:40
#5 0x0000000000400b11 in draw (gc=0x24155a0, color=0) at t.c:21
#6 0x0000000000400bb1 in threadf (unused=0x0) at t.c:31
#7 0x00007f65d70248ba in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#8 0x00007f65d6d8c02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9 0x0000000000000000 in ?? ()

(According to strace, thread 2 hangs in futex() while thread 1 is in userspace.
top shows thread 1 spinning in a busy loop.)

Now find someone who has a clue about Xlib and can figure out why this hangs, I
can only speculate.

Uli

--
- Buck, when, exactly, did you lose your mind?
- Three months ago. I woke up one morning married to a pineapple.
An ugly pineapple... But I loved her.
Francesco Abbate
2011-01-04 16:29:51 UTC
Post by Uli Schlachter
I just made this use XSync(d, True) and managed to make this hang (it worked for
a moment at first). Don't ask me how I managed to do this, it didn't hang on the
second try.
Then I removed all the "sleep(1);" which makes it freeze almost immediately,
even with Discard=False.
Thank you Uli, we have easily obtained a perfect test case for this bug :-)
Post by Uli Schlachter
Now find someone who has a clue about Xlib and can figure out why this hangs, I
can only speculate.
I hope we will get some help from people who know the
implementation details of Xlib.

Also, I would like to remind you that in my email I also mentioned
a second error, a BadDrawable X error that I can get if I
call XCloseDisplay on a connection that still has some outgoing drawing
requests. I can work around this problem with an XSync before the
XCloseDisplay, but I'm wondering whether this is normal behaviour or
a bug, and whether it is related to the xcb implementation.

Once these two problems are solved, I may give the details
about a memory leak introduced by the xcb-based Xlib implementation,
but this is a minor problem.
--
Francesco
Uli Schlachter
2011-01-04 18:06:25 UTC
Post by Francesco Abbate
Post by Uli Schlachter
I just made this use XSync(d, True) and managed to make this hang (it worked for
a moment at first). Don't ask me how I managed to do this, it didn't hang on the
second try.
Then I removed all the "sleep(1);" which makes it freeze almost immediately,
even with Discard=False.
Thank you Uli, we have easily obtained a perfect test case for this bug :-)
This is getting off-topic, does someone know the right mailing list for this and
can forward it?


I spent the better half of an afternoon staring at libX11's code. Here is what
happens (src/xcb_io.c in libX11, around line 187):

    unsigned long event_sequence = dpy->last_request_read;
    if(event)
        widen(&event_sequence, event->full_sequence);

The second thread does lots of GetInputFocus requests. By the time an event
comes in, dpy->last_request_read is way beyond the event's sequence number
because of the second thread.

E.g. I was seeing dpy->last_request_read = 0x3ab and
event_sequence = event->full_sequence = 0x121

widen() then turns this into event_sequence = 0x100000121. Because obviously
0x100000121 > 0x3ab, this code goes into a busy loop.

Just removing the call to widen() makes the hang go away for me. Since it
depends on unsigned long being wider than 32 bits, I guess that this only
happens on 64-bit systems.
However, removing widen() will make this hang when the 32-bit sequence number
wraps. Perhaps someone will come up with a good idea to solve this.
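
To make the arithmetic above concrete, here is a minimal, self-contained
sketch. It is not copied from libX11: widen_sketch() and event_looks_newer()
are hypothetical names, the function body is reconstructed so that it
reproduces the values quoted above, and uint64_t stands in for a 64-bit
unsigned long. The real widen() in src/xcb_io.c may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the widening step: combine the upper 32 bits
 * of a 64-bit reference with a 32-bit sequence number.
 */
uint64_t widen_sketch(uint64_t wide, uint32_t narrow)
{
    /* Keep the upper half of the reference, take the lower half from narrow. */
    uint64_t widened = (wide & ~UINT64_C(0xFFFFFFFF)) | narrow;

    /* A result below the reference is taken to mean the counter wrapped. */
    if (widened < wide)
        widened += UINT64_C(1) << 32;
    return widened;
}

/*
 * Returns 1 when the widened event sequence ends up beyond
 * last_request_read, the comparison described above as causing the
 * busy loop.
 */
int event_looks_newer(uint64_t last_request_read, uint32_t event_seq)
{
    return widen_sketch(last_request_read, event_seq) > last_request_read;
}

/* Checks the values reported in this thread. */
void check_thread_values(void)
{
    /* An old event (0x121) seen after the other thread reached 0x3ab. */
    assert(widen_sketch(0x3ab, 0x121) == UINT64_C(0x100000121));
    assert(event_looks_newer(0x3ab, 0x121));
}
```

Under these assumptions, an event that is merely older than
last_request_read is indistinguishable from one issued after a 32-bit wrap,
which is why it gets pushed 2^32 ahead. A genuine wrap, e.g.
widen_sketch(0xFFFFFFF0, 0x5), gives 0x100000005, which is the case that
dropping the widening step entirely would break.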

Cheers,
Uli

--
The Angels have the phone box!
Alan Coopersmith
2011-01-04 18:11:15 UTC
Post by Uli Schlachter
Post by Francesco Abbate
Post by Uli Schlachter
I just made this use XSync(d, True) and managed to make this hang (it worked for
a moment at first). Don't ask me how I managed to do this, it didn't hang on the
second try.
Then I removed all the "sleep(1);" which makes it freeze almost immediately,
even with Discard=False.
Thank you Uli, we have easily obtained a perfect test case for this bug :-)
This is getting off-topic, does someone know the right mailing list for this and
can forward it?
This is the best mailing list for discussion of libxcb and the xcb code in
libX11 (src/xcb_io.c et al.)
--
-Alan Coopersmith- alan.coopersmith-QHcLZuEGTsvQT0dZR+***@public.gmane.org
Oracle Solaris Platform Engineering: X Window System