Discussion:
[Bug 27552] Lots of processes crash in XCreatePixmap() with
bugzilla-daemon-CC+
2010-04-10 00:35:05 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

Jamey Sharp <jamey-sH+B+***@public.gmane.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |xcb-***@public.gmane.org

--- Comment #1 from Jamey Sharp <jamey-sH+B+***@public.gmane.org> 2010-04-09 17:35:03 PDT ---
It would be really nice to know where the last call to XIDAlloc was before the
one that fails, and for that I'd love to have a minimal test case.

Any chance these applications are using libX11 from more than one thread? I
notice in some stack traces, like
<http://launchpadlibrarian.net/31837770/ThreadStacktrace.txt> that a second
thread is waiting in libpulse; and in
<http://launchpadlibrarian.net/31032259/ThreadStacktrace.txt>, g_main_loop_run
is in a different thread than the cairo call that precedes the assert.

I could believe there's a race in how Xlib asks XCB for XIDs, but I'd like to
know if that would be a plausible explanation for the reports you're actually
seeing. It looks like a fair number of the reported stack traces are
single-threaded, making that an unlikely explanation for those reports. :-)
It's possible there are two separate bugs here, I suppose...

In lieu of a minimal test, perhaps you can tell me which X extensions these
applications have in common? Perhaps the client library for some extension
allocates an XID but is missing a SyncHandle call; if the next X request also
tries to allocate an XID you'd get this assertion failure.

I've just reviewed everything in xorg/lib, just in case, for bad use of
SyncHandle, XAllocID, or XAllocIDs. I found quite a few things to quibble with,
but nothing that seems likely to have caused these reports. I've pushed fixes
in these modules, so you can check whether anything there seems plausible.

libXcomposite/
libXdamage/
libXext/
libXfixes/
libXi/
libXrender/
libXTrap/

In short, I don't have a good hypothesis yet.
http://launchpadlibrarian.net/23717282/Stacktrace.txt
I'm ignoring this one: the application was exiting, and a destructor tried to
access the Display to free a resource. I'd guess the display was already gone
at that point. I've seen that kind of pattern in quite a few Qt stack traces.
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
bugzilla-daemon-CC+
2010-04-15 12:48:01 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

--- Comment #2 from Martin Pitt <martin.pitt-GeWIH/nMZzLQT0dZR+***@public.gmane.org> 2010-04-15 05:48:01 PDT ---
(In reply to comment #1)
Post by bugzilla-daemon-CC+
It would be really nice to know where the last call to XIDAlloc was before the
one that fails, and for that I'd love to have a minimal test case.
Indeed; I'm afraid I can't say more about this myself, but I asked for a
reproduction recipe on the downstream bug a few days ago. Once we have a way to
reproduce this, we can strip that down to a test case.
Post by bugzilla-daemon-CC+
Any chance these applications are using libX11 from more than one thread?
Some certainly do, but amongst the many duplicates are a lot of progams which
are single-threaded; for example services-admin (from gnome-system-tools,
LP#432122) or gigolo (LP#468544)
Post by bugzilla-daemon-CC+
It looks like a fair number of the reported stack traces are
single-threaded, making that an unlikely explanation for those reports. :-)
It's possible there are two separate bugs here, I suppose...
Possibly. It seems that about half of the reports are in gnome-panel and
firefox, but since they are also much more popular than the apps mentioned
above that kind of bias is expected.
Post by bugzilla-daemon-CC+
In lieu of a minimal test, perhaps you can tell me which X extensions these
applications have in common? Perhaps the client library for some extension
allocates an XID but is missing a SyncHandle call; if the next X request also
tries to allocate an XID you'd get this assertion failure.
The bugs unfortunately don't have Xorg logs, but I don't think that they will
differ that much between different installations. I searched Launchpad for a
few Xorg logs for various drivers, and in fact for a given graphics driver the
extension lists were exactly identical (I took samples of about 5 per driver):

common for intel, ati, fglrx, nvidia: MIT-SCREEN-SAVER XFree86-VidModeExtension
XFree86-DGA DPMS XVideo XVideo-MotionCompensation X-Resource DOUBLE-BUFFER GLX
RECORD XFree86-DRI DRI2

extra extensions for fglrx: ATIFGLRXDRI FGLRXEXTENSION GLESX AMDXVOPL (I
suppose those are uninteresting)

I asked in the downstram bug for a few Xorg logs from those people who can
reproduce the crash. Perhaps it will turn out that they are all using the same
graphics driver.
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
bugzilla-daemon-CC+
2010-04-15 16:58:31 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

--- Comment #3 from Jamey Sharp <jamey-sH+B+***@public.gmane.org> 2010-04-15 09:58:31 PDT ---
(In reply to comment #2)
Post by bugzilla-daemon-CC+
Post by bugzilla-daemon-CC+
In lieu of a minimal test, perhaps you can tell me which X extensions these
applications have in common? Perhaps the client library for some extension
allocates an XID but is missing a SyncHandle call; if the next X request also
tries to allocate an XID you'd get this assertion failure.
I asked in the downstram bug for a few Xorg logs from those people who can
reproduce the crash. Perhaps it will turn out that they are all using the same
graphics driver.
I don't expect it to be a server-side issue, but then there was a recent report
that turned out to be a byte-swapping bug in the server, and I didn't expect it
then either... :-)

I meant, which extensions are these applications actually using? I'm hoping to
pin the blame on a client-side extension library rather than libX11 itself. ;-)
The list of loaded libraries from GDB would be a start. Is there some Launchpad
way to download all the attachments from all the public duplicate bugs at once?

I don't have very high hopes though, especially after I went through checking
for SyncHandle bugs in every extension library I could find, last week.

Regardless, I hope your effort to get more information from downstream bug
reporters works out.
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
bugzilla-daemon-CC+
2010-04-16 08:58:10 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

--- Comment #4 from Martin Pitt <martin.pitt-GeWIH/nMZzLQT0dZR+***@public.gmane.org> 2010-04-16 01:58:10 PDT ---
Just to collect more puzzle pieces, I got three replies by now about my "which
driver" question, and these three are all using the proprietary NVIDIA driver.
It might be just pure luck with a sample size of three, of course, so I asked
whether anyone who got this crash has a different card/driver.

(In reply to comment #3)
Post by bugzilla-daemon-CC+
I meant, which extensions are these applications actually using? I'm hoping to
pin the blame on a client-side extension library rather than libX11 itself. ;-)
The list of loaded libraries from GDB would be a start.
We routinely collect /proc/<pid>/maps in crash reports, they all have a
"ProcMaps.txt" attachment. However, this just says which libraries the program
dynamically links against, not which extensions it's actually using, right?

Is there a better way to see which extensions a program is currently using?
Some command that I could ask the bug reporters to run? We could even add that
to our standard Apport hooks, so that it's collected for all crash reports.
Post by bugzilla-daemon-CC+
Is there some Launchpad way to download all the attachments from all the public duplicate bugs at once?
Bryce Harrington (CC added) has put together a lot of scripts for these
purposes (X.org bug triaging). Bryce, do you have a script like that?

Anyway, it's really easy using the Python API, so for this particular purpose I
wrote a quick hack which downloads a particular attachment name from a bug and
all its duplicates and dumps them to stdout:

$ ./cat-bugattachment.py 507062 ProcMaps.txt
---- 507062 ---
00110000-001ae000 r-xp 00000000 08:05 526394
/usr/lib/libgdk-x11-2.0.so.0.1902.0
[...]
---- 340441 ---
00400000-00408000 r-xp 00000000 09:01 94936
/usr/lib/firefox-3.0.7/firefox
[...]

and so on. The script takes the master bug number and an attachment name, I put
it on http://people.canonical.com/~pitti/scripts/cat-bugattachment.py for now.

When I do some simple post-processing of the output of above command (which I
directed into file "out") I get this:

$ grep lib/libX out | awk '{print $6}' | sort -u
/usr/lib/libX11.so.6.2.0
/usr/lib/libX11.so.6.3.0
/usr/lib/libXau.so.6.0.0
/usr/lib/libXcomposite.so.1.0.0
/usr/lib/libXcursor.so.1.0.2
/usr/lib/libXdamage.so.1.1.0
/usr/lib/libXdmcp.so.6.0.0
/usr/lib/libXext.so.6.4.0
/usr/lib/libXfixes.so.3.1.0
/usr/lib/libXft.so.2.1.13
/usr/lib/libXinerama.so.1.0.0
/usr/lib/libXi.so.6.0.0
/usr/lib/libXi.so.6.1.0
/usr/lib/libXmu.so.6.2.0
/usr/lib/libXrandr.so.2.2.0
/usr/lib/libXrender.so.1.3.0
/usr/lib/libXRes.so.1.0.0
/usr/lib/libXss.so.1.0.0
/usr/lib/libXt.so.6.0.0
/usr/lib/libXtst.so.6.1.0
/usr/lib/libXxf86misc.so.1.1.0
/usr/lib/libXxf86vm.so.1.0.0

Not sure how helpful that actually is, though, since those are still a fair
number of extensions.

Thank you!
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
bugzilla-daemon-CC+
2010-04-16 09:09:29 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

--- Comment #5 from Martin Pitt <martin.pitt-GeWIH/nMZzLQT0dZR+***@public.gmane.org> 2010-04-16 02:09:29 PDT ---
(In reply to comment #4)
Post by bugzilla-daemon-CC+
Just to collect more puzzle pieces, I got three replies by now about my "which
driver" question, and these three are all using the proprietary NVIDIA driver.
It might be just pure luck with a sample size of three, of course, so I asked
whether anyone who got this crash has a different card/driver.
Got another reply that someone is using Intel, so let's discard the "driver
dependent" theory for now.
Post by bugzilla-daemon-CC+
$ grep lib/libX out | awk '{print $6}' | sort -u
Ah, BS, sorry. We want the intersection, not the union.

$ sed -n '/^-/ p; /lib\/libX/ { s/^.*usr.lib.//; s/\.so\..*$//; p}'
out|uniq|less

By inspection, it seems that those bugs have only a very small set:

---- 528503 ---
libXfixes
libXrender
libXcursor
libXdmcp
libXau
libX11
---- 558407 ---
libXt
libXext
libX11
libXmu
libXau
libXdmcp

So I believe the common set is libXdmcp, libXau, and libX11 itself.
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
bugzilla-daemon-CC+
2010-06-21 16:13:45 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

--- Comment #6 from Joris Guisson <joris.guisson-***@public.gmane.org> 2010-06-21 09:13:45 PDT ---
The same problem has been reported for ktorrent:

https://bugs.kde.org/show_bug.cgi?id=242306
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
bugzilla-daemon-CC+
2010-06-21 18:19:07 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

Jamey Sharp <jamey-sH+B+***@public.gmane.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |joris.guisson-***@public.gmane.org

--- Comment #7 from Jamey Sharp <jamey-sH+B+***@public.gmane.org> 2010-06-21 11:19:06 PDT ---
(In reply to comment #6)
Post by bugzilla-daemon-CC+
https://bugs.kde.org/show_bug.cgi?id=242306
In that bug report, you wrote, "Nothing we can do about, the X people have to
figure this one out." You could try providing information I've asked for in
this bug. Above all, if I have a minimal test application that demonstrates the
bug, I'm likely to be able to fix it quickly.

As things stand now, I can't reproduce the problem and I have no reasonable
hypotheses about its cause, so it is unlikely that I'll be able to fix it.

One thing I hadn't asked for yet that might help: Run a failing application
under `xtrace` and attach the log to this bug.
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
bugzilla-daemon-CC+
2010-06-22 06:35:57 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

--- Comment #8 from Pauli <suokkos-***@public.gmane.org> 2010-06-21 23:35:57 PDT ---
Has anyone tried to run any of multi-threaded applications that are crashing in
hellgrind? That would give pretty fast conclusion if the bug is about locking
problems in those cases.
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
bugzilla-daemon-CC+
2010-08-12 10:47:01 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

--- Comment #9 from hules <hules-***@public.gmane.org> 2010-08-12 03:47:00 PDT ---
I have a small application that manages to reproduce that crash quite easily on
ubuntu 10.04, but only when running under valgrind. But that application itself
behaves quite badly as it calls xlib from different threads using the same
"Display*" struct for all threads, so I'm not sure if that might be of any use
for tracking this issue. However I think it calls XInitThreads() on startup, so
that should allow it to share the same display for all threads, isn't it ?

hellgrind outputs loads of stuff like:

==2950== Possible data race during write of size 4 at 0x44bd06c by thread #1
==2950== at 0x4084764: ??? (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x4069352: XGetWindowProperty (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x81E0F46: juce::LinuxComponentPeer::isMinimised() const
(juce_linux_Windowing.cpp:902)
==2950== by 0x8164890: juce::ComponentPeer::handleMovedOrResized()
(juce_amalgamated.cpp:71167)
==2950== by 0x81E296B:
juce::LinuxComponentPeer::handleWindowMessage(_XEvent*)
(juce_linux_Windowing.cpp:1579)
==2950== by 0x81E549F: juce::juce_windowMessageReceive(_XEvent*)
(juce_linux_Windowing.cpp:2617)
==2950== by 0x81DE71F: juce::juce_dispatchNextXEvent()
(juce_linux_Messaging.cpp:384)
==2950== by 0x81DE899: juce::juce_dispatchNextMessageOnSystemQueue(bool)
(juce_linux_Messaging.cpp:431)
==2950== by 0x8137170: juce::MessageManager::dispatchNextMessage(bool,
bool*) (juce_amalgamated.cpp:34719)
==2950== by 0x8137204: juce::MessageManager::runDispatchLoop()
(juce_amalgamated.cpp:34763)
==2950== by 0x81348A1: juce::JUCEApplication::main(juce::StringArray&,
juce::JUCEApplication*) (juce_amalgamated.cpp:14770)
==2950== by 0x8134A7B: juce::JUCEApplication::main(int, char**,
juce::JUCEApplication*) (juce_amalgamated.cpp:14859)
==2950== This conflicts with a previous read of size 4 by thread #6
==2950== at 0x4082F0C: _XSetPrivSyncFunction (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x408BBEF: _XAllocID (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x40611A7: XCreatePixmap (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x50BF8ED: XcursorImageLoadCursor (in
/usr/lib/libXcursor.so.1.0.2)
==2950== by 0x50C0765: XcursorImagesLoadCursors (in
/usr/lib/libXcursor.so.1.0.2)
==2950== by 0x50C0886: XcursorImagesLoadCursor (in
/usr/lib/libXcursor.so.1.0.2)
==2950== by 0x50C4187: XcursorTryShapeCursor (in
/usr/lib/libXcursor.so.1.0.2)
==2950== by 0x4060C5C: XCreateGlyphCursor (in /usr/lib/libX11.so.6.3.0)

before the failed assertion.
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
bugzilla-daemon-CC+
2010-08-12 16:10:59 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

--- Comment #10 from Jamey Sharp <jamey-sH+B+***@public.gmane.org> 2010-08-12 09:10:59 PDT ---
(In reply to comment #9)
Post by bugzilla-daemon-CC+
I have a small application that manages to reproduce that crash quite easily on
ubuntu 10.04, but only when running under valgrind.
Cool, what's the application? Can I get the source somewhere? I see it's using
a toolkit (juce) that I hadn't heard of before though, so I'm not sure it's
close to a minimal test case yet...
Post by bugzilla-daemon-CC+
==2950== Possible data race during write of size 4 at 0x44bd06c by thread #1
==2950== at 0x4084764: ??? (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x4069352: XGetWindowProperty (in /usr/lib/libX11.so.6.3.0)
...
==2950== This conflicts with a previous read of size 4 by thread #6
==2950== at 0x4082F0C: _XSetPrivSyncFunction (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x408BBEF: _XAllocID (in /usr/lib/libX11.so.6.3.0)
==2950== by 0x40611A7: XCreatePixmap (in /usr/lib/libX11.so.6.3.0)
...
before the failed assertion.
I'm guessing that's the bug we fixed in git commit
a6d974dc59f2722b36e2df9d4f07aeee4f83ce43, which is included in libX11 1.3.4 and
later. What version of libX11 do you have?
Post by bugzilla-daemon-CC+
But that application itself
behaves quite badly as it calls xlib from different threads using the same
"Display*" struct for all threads, so I'm not sure if that might be of any use
for tracking this issue. However I think it calls XInitThreads() on startup, so
that should allow it to share the same display for all threads, isn't it ?
Yes, that's supposed to be safe. However, I introduced a variety of bugs for
multi-threaded apps some time ago, and only got most of them fixed in libX11
1.3.4. You should definitely re-test with that version or later if you don't
already have it.
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
bugzilla-daemon-CC+
2010-08-13 09:52:52 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

--- Comment #11 from hules <hules-***@public.gmane.org> 2010-08-13 02:52:51 PDT ---
I have put the sources for this test here:
http://hules.free.fr/xcbbug.tbz2

JUCE is a small cross-platform toolkit, but this is definitively not a minimal
test case as there are ~250000 lines of code.. The test application is not
using it properly with respect to multithreading so if you run it in helgrind
it will spit a lot of warnings. But when run in valgrind or helgrind, it ends
up 75% of time with the _XallocID assert failed.

I have the libx11 of ubuntu 10.04, that is libx11 1.3.2. I'm not sure to
proceed to test 1.3.4
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
bugzilla-daemon-CC+
2011-10-09 03:06:03 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

Jeremy Huddleston <jeremyhu-CC+***@public.gmane.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard| |2011BRB_Reviewed
AssignedTo|xorg-team-go0+***@public.gmane.org |jamey-sH+B+***@public.gmane.org
CC| |jeremyhu-CC+***@public.gmane.org

--- Comment #12 from Jeremy Huddleston <jeremyhu-CC+***@public.gmane.org> 2011-10-08 20:06:02 PDT ---
Dupe of #23690?
--
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
b***@freedesktop.org
2018-08-10 20:09:41 UTC
Permalink
https://bugs.freedesktop.org/show_bug.cgi?id=27552

GitLab Migration User <gitlab-***@fdo.invalid> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |MOVED

--- Comment #13 from GitLab Migration User <gitlab-***@fdo.invalid> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance:
https://gitlab.freedesktop.org/xorg/lib/libx11/issues/10.
--
You are receiving this mail because:
You are on the CC list for the bug.
Loading...