Discussion:
[Xcb] [PATCH] _xcb_conn_wait(): Do better at detecting closed sockets
Adam Jackson
2016-10-13 15:02:46 UTC
Permalink
Currently, when the X server crashes or a client is disconnected with
XKillClient, you get a somewhat confusing error message from libX11
along the lines of:

XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
after 98 requests (40 known processed) with 0 events remaining.

What's happening here is the previous recvmsg has thrown EAGAIN, and
since errno is not cleared on success that's the errno that the I/O
error handler sees.

To fix this, check for POLLHUP explicitly, and if there's no more data
in the socket buffer to read, treat that as an error. We coerce errno to
EPIPE in this case, which triggers the existing EPIPE path in libX11 and
thus generates the much more honest error message:

X connection to :0 broken (explicit kill or server shutdown).

Signed-off-by: Adam Jackson <***@redhat.com>
---
src/xcb_conn.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/xcb_conn.c b/src/xcb_conn.c
index 7d09637..ad94443 100644
--- a/src/xcb_conn.c
+++ b/src/xcb_conn.c
@@ -45,6 +45,7 @@
#elif !defined _WIN32
#include <sys/select.h>
#endif
+#include <sys/ioctl.h>

#ifdef _WIN32
#include "xcb_windefs.h"
@@ -451,7 +452,7 @@ int _xcb_conn_wait(xcb_connection_t *c, pthread_cond_t *cond, struct iovec **vec
#if USE_POLL
memset(&fd, 0, sizeof(fd));
fd.fd = c->fd;
- fd.events = POLLIN;
+ fd.events = POLLIN | POLLHUP;
#else
FD_ZERO(&rfds);
FD_SET(c->fd, &rfds);
@@ -484,6 +485,19 @@ int _xcb_conn_wait(xcb_connection_t *c, pthread_cond_t *cond, struct iovec **vec
ret = -1;
break;
}
+
+ /* hangup with no data left to read is an error */
+ if (fd.revents & POLLHUP)
+ {
+ int unread = -1;
+ ioctl(c->fd, FIONREAD, &unread);
+ if (unread <= 0)
+ {
+ /* coerce errno to not be EAGAIN */
+ errno = EPIPE;
+ ret = -1;
+ }
+ }
#else
ret = select(c->fd + 1, &rfds, &wfds, 0, 0);
#endif
--
2.9.3
Ben Hildred
2016-10-13 15:19:04 UTC
Permalink
Post by Adam Jackson
Currently, when the X server crashes or a client is disconnected with
XKillClient, you get a somewhat confusing error message from libX11
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
after 98 requests (40 known processed) with 0 events remaining.
I had been wondering about a similar although less verbose message from
gvim during server crash.
Post by Adam Jackson
What's happening here is the previous recvmsg has thrown EAGAIN, and
since errno is not cleared on success that's the errno that the I/O
error handler sees.
excellent troubleshooting.
To fix this, check for POLLHUP explicitly, and if there's no more data
in the socket buffer to read, treat that as an error. We coerce errno to
EPIPE in this case, which triggers the existing EPIPE path in libX11 and
X connection to :0 broken (explicit kill or server shutdown).
---
src/xcb_conn.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/src/xcb_conn.c b/src/xcb_conn.c
index 7d09637..ad94443 100644
--- a/src/xcb_conn.c
+++ b/src/xcb_conn.c
@@ -45,6 +45,7 @@
#elif !defined _WIN32
#include <sys/select.h>
#endif
+#include <sys/ioctl.h>
#ifdef _WIN32
#include "xcb_windefs.h"
@@ -451,7 +452,7 @@ int _xcb_conn_wait(xcb_connection_t *c,
pthread_cond_t *cond, struct iovec **vec
#if USE_POLL
memset(&fd, 0, sizeof(fd));
fd.fd = c->fd;
- fd.events = POLLIN;
+ fd.events = POLLIN | POLLHUP;
#else
FD_ZERO(&rfds);
FD_SET(c->fd, &rfds);
@@ -484,6 +485,19 @@ int _xcb_conn_wait(xcb_connection_t *c,
pthread_cond_t *cond, struct iovec **vec
ret = -1;
break;
}
+
+ /* hangup with no data left to read is an error */
+ if (fd.revents & POLLHUP)
+ {
+ int unread = -1;
+ ioctl(c->fd, FIONREAD, &unread);
+ if (unread <= 0)
+ {
+ /* coerce errno to not be EAGAIN */
+ errno = EPIPE;
+ ret = -1;
+ }
+ }
#else
ret = select(c->fd + 1, &rfds, &wfds, 0, 0);
#endif
--
2.9.3
_______________________________________________
Xcb mailing list
https://lists.freedesktop.org/mailman/listinfo/xcb
--
--
Ben Hildred
Automation Support Services
303 815 6721
Uli Schlachter
2016-10-13 18:20:14 UTC
Permalink
On 13.10.2016 17:02, Adam Jackson wrote:
[...]
Post by Adam Jackson
@@ -484,6 +485,19 @@ int _xcb_conn_wait(xcb_connection_t *c, pthread_cond_t *cond, struct iovec **vec
ret = -1;
break;
}
+
+ /* hangup with no data left to read is an error */
Hangup with data left to read is not an error? Why?
Post by Adam Jackson
+ if (fd.revents & POLLHUP)
+ {
+ int unread = -1;
+ ioctl(c->fd, FIONREAD, &unread);
+ if (unread <= 0)
+ {
+ /* coerce errno to not be EAGAIN */
+ errno = EPIPE;
+ ret = -1;
+ }
+ }
#else
ret = select(c->fd + 1, &rfds, &wfds, 0, 0);
#endif
The above might work in 80% of cases, but won't work if some data is
left to read. How about the following idea which will work in 90% of cases?

--- xcb_in.c.orig 2016-10-13 20:07:52.464589712 +0200
+++ xcb_in.c 2016-10-13 20:13:31.177710447 +0200
@@ -400,7 +400,15 @@ static int read_block(const int fd, void
#endif /* USE_POLL */
}
if(ret <= 0)
+ {
+ if (ret == 0)
+ /* Xlib assumes that errno is meaningful. That's not
going to
+ * work (e.g. if the connection goes into an error
state at some
+ * user code using xcb directly), but at least try.
+ */
+ errno = EPIPE;
return ret;
+ }
}
return len;
}
@@ -988,6 +996,7 @@ int _xcb_in_read(xcb_connection_t *c)
*/
if (msg.msg_flags & (MSG_TRUNC|MSG_CTRUNC)) {
_xcb_conn_shutdown(c, XCB_CONN_CLOSED_FDPASSING_FAILED);
+ errno = ENO_IDEA_WHAT_TO_DO_HERE;
return 0;
}
#else
@@ -1038,6 +1047,12 @@ int _xcb_in_read(xcb_connection_t *c)
if((n > 0) || (n < 0 && WSAGetLastError() == WSAEWOULDBLOCK))
#endif /* !_WIN32 */
return 1;
+ if (n == 0)
+ /* Xlib assumes that errno is meaningful. That's not going to
+ * work (e.g. if the connection goes into an error state at some
+ * user code using xcb directly), but at least try.
+ */
+ errno = EPIPE;
_xcb_conn_shutdown(c, XCB_CONN_ERROR);
return 0;
}

(Sorry for the line wrapping)
The comments tell you how much I like this idea.

How about fixing this properly in Xlib instead? If it wants to check if
the error was due to "the other end closed the connection", it can do
something like this:

int fd = xcb_get_file_descriptor(conn);
if (fd != -1) {
call select() or poll() or some magic ioctl to check for hung up
} else {
/* Either old libxcb where the above returns -1 on error connection or
the connection always has been broken (should not happen) */
}

This patch should always work while errno-magic depends on the call
chain when the hang up happens.

What do you think?
Uli
--
"Do you know that books smell like nutmeg or some spice from a foreign
land?"
-- Faber in Fahrenheit 451
Adam Jackson
2016-10-13 19:53:44 UTC
Permalink
Post by Uli Schlachter
[...]
Post by Adam Jackson
@@ -484,6 +485,19 @@ int _xcb_conn_wait(xcb_connection_t *c,
pthread_cond_t *cond, struct iovec **vec
             ret = -1;
             break;
         }
+
+ /* hangup with no data left to read is an error */
Hangup with data left to read is not an error? Why?
Surely from a correctness standpoint every reply or event that we've
already received should be processed. Consider a pair of applications
using ClientMessage events for IPC. If one side does:

    XSendEvent();
    XSync();
    XKillClient();

then it would be entirely reasonable to expect that the event has
actually been sent to the other end before it was disconnected. If we
treated POLLHUP-with-data-remaining as an error, we'd likely throw away
an event already enqueued.

It doesn't end up mattering on Linux, because POLLHUP isn't raised
until there's no data left in the read queue. But I'm told there are
other OSes in the world.
Post by Uli Schlachter
Post by Adam Jackson
+ if (fd.revents & POLLHUP)
+ {
+     int unread = -1;
+     ioctl(c->fd, FIONREAD, &unread);
+     if (unread <= 0)
+     {
+ /* coerce errno to not be EAGAIN */
+ errno = EPIPE;
+ ret = -1;
+     }
+ }
 #else
         ret = select(c->fd + 1, &rfds, &wfds, 0, 0);
 #endif
The above might work in 80% of cases, but won't work if some data is
left to read.
No. If there's data left to read, the ioctl will return a non-negative
integer in 'unread'. Only if there is nothing left to read will we
tranform this into an error condition.

Although, I am missing a break in the inner if block. Tsk tsk.
Post by Uli Schlachter
How about fixing this properly in Xlib instead? If it wants to check
if the error was due to "the other end closed the connection", it can
Fair enough. Do note that the existing code already does consider this
case to be an error, due to the stanza above the bit I added:

ret = poll(&fd, 1, -1);
/* If poll() returns an event we didn't expect, such as POLLNVAL, treat
* it as if it failed. */
if(ret >= 0 && (fd.revents & ~fd.events))
{
ret = -1;
break;
}

Since we're not presently adding POLLHUP to fd.events...

- ajax
Uli Schlachter
2016-10-17 12:34:42 UTC
Permalink
Post by Adam Jackson
Post by Uli Schlachter
How about fixing this properly in Xlib instead? If it wants to check
if the error was due to "the other end closed the connection", it can
Fair enough. Do note that the existing code already does consider this
ret = poll(&fd, 1, -1);
/* If poll() returns an event we didn't expect, such as POLLNVAL, treat
* it as if it failed. */
if(ret >= 0 && (fd.revents & ~fd.events))
{
ret = -1;
break;
}
Since we're not presently adding POLLHUP to fd.events...
Oh, thanks for reminding me. "man poll" tells me that setting "POLLHUP"
in "events" has no effect:

[The field "events"] may be specified as zero, in
which case the only events that can be returned in
revents are POLLHUP, POLLERR, and POLLNVAL (see below).

The field revents is an output parameter, filled by the
kernel with the events that actually occurred. The bits
returned in revents can include any of those specified
in events, or one of the values POLLERR, POLLHUP, or POLLNVAL.
(These three bits are meaningless in the events field, and
will be set in the revents field whenever the corresponding
condition is true.)

Uli
--
“Cold. Cold. Cold. Cold. Cold. Cold. Cold. Cold. Cold. Cold.” – Anna
Loading...