Questions about xcb_writev and socket-handoff

Discussion:

Clemens Eisserer

2008-12-17 12:53:02 UTC

Hi,

1.) I almost finished porting some C-code to pure java with
self-written protocol generation (
http://linuxhippy.blogspot.com/2008/12/almost-pure-java2d.html ).
However sometimes it gets stuck in native code waiting for the
X-Server, the stack trace looks like this:

#0 0xb7f35416 in __kernel_vsyscall ()
#1 0x009885f1 in select () from /lib/libc.so.6
#2 0xa40d9035 in _xcb_conn_wait () from /opt/xorg/lib/libxcb.so.1
#3 0xa40db33a in xcb_wait_for_reply () from /opt/xorg/lib/libxcb.so.1
#4 0xa412a078 in _XReply () from /opt/xorg/lib/libX11.so.6
#5 0xa411e7ea in XSync () from /opt/xorg/lib/libX11.so.6
#6 0xa4322798 in X11SD_GetSharedImage ()
#7 0xa43240fc in X11SD_GetRasInfo ()
#8 0xa43762ac in Java_sun_java2d_loops_Blit_Blit ()
#9 0xb57981c7 in ?? ()

Some parts of the code are still in C, like the image up/downloads -
this is the code in the stack trace shown above.
Any idea what could be wrong, or how I could debug this problem?

2.) How does xcb_writev work for unix-domain-sockets?
I tried to find information about writev but didn't find in-depth stuff.

- I read somewhere that there is a maximum per-writev size that may
be submitted, otherwise glibc will allocate a temporary buffer. How large
is that size?

- Does xcb_writev copy or block or even both? I have a dual-core
system but for now it seems when the x-server is processing commands
the client has to wait and vice-versa - so only one core is fully
loaded. When I used Xlib-based code I often saw more than 100% load
when I summed up both the X server's and client's CPU consumption, but
now 100% seems to be the upper limit.
How is that implemented in xcb, are there any tricks which could help
concurrency?

Thanks again for the handoff-functionality. It's so simple but extremely
flexible - exactly what I was looking for :)

Thank you in advance, Clemens

Jamey Sharp

2009-06-24 17:08:18 UTC

Hi Clemens! I was just looking at your Bugzilla report on this:
https://bugs.freedesktop.org/show_bug.cgi?id=22362

I am using the new socket-handoff functionality to mix protocol
generated by Xlib and some self-written protocol.
I am experiencing strange problems, where XCB seems to wait forever on
the server - like the stack-traces illustrate at the end of the email.

To summarize those, xcb_wait_for_reply hangs in select with no responses
forthcoming from the server. In my experience that always means the
client generated incorrect protocol (as you've noticed), or failed to
sync, which is what this looks like.

X-protocol responses have a 16-bit sequence number, but applications
frequently need to remember state about more than 65536 requests at a
time, so XCB widens the sequence number to 64 bits internally. To do
that, it must receive a response at least as often as every 65536
requests. That response could be an event or error--it doesn't have to
be a reply--but of course you can't count on receiving those.
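
A minimal sketch of the widening described above, to show why a response is needed at least every 65536 requests. The function name `widen_sequence` is hypothetical and this is not XCB's actual code; it only assumes the 16-bit wire sequence number and a 64-bit "last seen" value as described in the paragraph.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not taken from XCB): widen a 16-bit wire sequence
 * number to 64 bits, given the last widened value that was seen.
 * The result is only correct if fewer than 65536 requests were issued
 * between the previous response and this one. */
uint64_t widen_sequence(uint64_t last_seen, uint16_t wire_seq)
{
    /* Keep the high bits of the last known value, splice in the new low 16. */
    uint64_t widened = (last_seen & ~(uint64_t)0xffff) | wire_seq;

    /* If that went backwards, assume the 16-bit counter wrapped exactly once. */
    if (widened < last_seen)
        widened += 0x10000;

    assert(widened >= last_seen);
    return widened;
}
```

If the last response seen was for request 10 and the next response belongs to request 65551, the wire value is 15 and this sketch returns 15 rather than 65551. That ambiguity is what the periodic sync avoids.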

When an app uses XCB's normal interface, rather than socket-handoff,
XCB can insert a sync request automatically as needed. In socket-handoff
clients, XCB can't safely insert requests ever, so the caller is
responsible for that.

The simplest approach you can take here is to issue a GetInputFocus
request immediately after xcb_take_socket, and then do it again every
64k requests that you issue. You *don't* need to wait for the reply
immediately; just make sure you pick up the reply eventually, or there
will be a memory leak. This is not optimal, but it is at least correct
and simple.
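
One way to keep track of when the next GetInputFocus is due could look like the sketch below. The struct, the function names and the 65000 margin are assumptions made for illustration; only the GetInputFocus opcode (43) and its 4-byte encoding come from the core protocol. The length field is written in host byte order, which assumes the connection was set up with the client's native byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X_GET_INPUT_FOCUS 43      /* core protocol opcode */
#define SYNC_INTERVAL     65000u  /* assumed safety margin below 65536 */

/* Hypothetical bookkeeping for a socket-handoff client. */
struct sync_state {
    uint32_t since_sync;  /* requests appended since the last GetInputFocus */
};

/* Call right after xcb_take_socket: forces a sync before the first request. */
void on_take_socket(struct sync_state *st)
{
    st->since_sync = SYNC_INTERVAL;
}

/* Append a 4-byte GetInputFocus request; returns the bytes written. */
size_t put_get_input_focus(uint8_t *buf)
{
    uint16_t length = 1;                      /* size in 4-byte units */
    assert(buf != NULL);
    buf[0] = X_GET_INPUT_FOCUS;
    buf[1] = 0;                               /* unused */
    memcpy(&buf[2], &length, sizeof length);  /* client byte order */
    return 4;
}

/* Call before appending each request of your own. If a sync is due, it is
 * written to buf first. Returns the bytes written (0 or 4). */
size_t begin_request(struct sync_state *st, uint8_t *buf)
{
    size_t n = 0;
    if (st->since_sync >= SYNC_INTERVAL) {
        n = put_get_input_focus(buf);
        st->since_sync = 0;
    }
    st->since_sync++;  /* counts the request the caller is about to append */
    return n;
}
```

Every GetInputFocus emitted this way is a request of its own: it has to be included in the request count passed to xcb_writev, and its reply has to be picked up later.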

1.) I almost finished porting some C-code to pure java with
self-written protocol generation (
http://linuxhippy.blogspot.com/2008/12/almost-pure-java2d.html ).

That's really cool!

2.) How does xcb_writev work for unix-domain-sockets?

Quite well, thank you! :-)

I tried to find information about writev but didn't find in-depth stuff.
- I read somewhere that there is a maximum per-writev size that may
be submitted, otherwise glibc will allocate a temporary buffer. How large
is that size?

I think you're talking about this: "POSIX.1-2001 allows an
implementation to place a limit on the number of items that can be
passed in iov." See the NOTES section of writev(2) for more on that; but
it's a limit on number of entries in the iovec array, not total number
of bytes, and really I don't think you'll ever hit that limit.

- Does xcb_writev copy or block or even both?

xcb_writev does not copy your buffer; it passes it directly to
writev(2). It can block if the server is not ready to read from your
socket and the kernel is out of space to buffer your writes. It doesn't
wait for any responses from the server or anything though, just for the
write to be queued at the kernel.
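
For reference, a small plain writev(2) sketch over a Unix-domain socket pair, independent of XCB. It shows two separate buffers being handed to the kernel in one call, and how to query the limit on the number of iovec entries. The helper names are made up for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send two separate buffers with a single writev(2) call. Returns the number
 * of bytes the kernel accepted, or -1 on error. */
ssize_t send_two_buffers(int fd, const void *a, size_t alen,
                         const void *b, size_t blen)
{
    struct iovec iov[2];
    iov[0].iov_base = (void *)a;  /* writev only reads from these buffers; */
    iov[0].iov_len  = alen;       /* no user-space staging copy is made here */
    iov[1].iov_base = (void *)b;
    iov[1].iov_len  = blen;
    return writev(fd, iov, 2);
}

/* Round trip through socketpair(2): the two pieces arrive back to back.
 * Returns 0 on success, -1 on any failure. */
int writev_demo(char *out, size_t outlen)
{
    int sv[2];
    ssize_t sent, got;

    assert(outlen >= 10);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    sent = send_two_buffers(sv[0], "Hello, ", 7, "X!", 2);
    got = (sent == 9) ? read(sv[1], out, outlen - 1) : -1;
    close(sv[0]);
    close(sv[1]);
    if (got != 9)
        return -1;
    out[got] = '\0';
    return 0;
}

/* The limit mentioned in writev(2) NOTES is on iovec entries, not bytes. */
long iov_limit(void)
{
    return sysconf(_SC_IOV_MAX);
}
```

On a blocking socket, the writev call in this sketch would block only if the socket buffer were full and the peer were not reading, which matches the description above.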

I have a dual-core system but for now it seems when the x-server is
processing commands the client has to wait and vice-versa - so only
one core is fully loaded. When I used Xlib-based code I often saw more
than 100% load when I summed up both the X server's and client's CPU
consumption, but now 100% seems to be the upper limit. How is that
implemented in xcb, are there any tricks which could help concurrency?

XCB *should* only be an improvement on Xlib in terms of concurrency.

I wonder if your output queue is *too* big? If you flush more often, the
X server can get started on the work you're requesting sooner. The only
reason to have an output queue at all is because system calls are
expensive and you want to do fewer of them.

If that doesn't help, something like `strace -T` may tell you what
syscalls are keeping your process waiting.

Thanks again for the handoff-functionality. It's so simple but extremely
flexible - exactly what I was looking for :)

That's good to hear! We sure hoped it would be useful.

Jamey

Clemens Eisserer

2009-07-01 15:52:21 UTC

Hi Jamey,

Thanks a lot for looking at that stuff :)

Post by Jamey Sharp
The simplest approach you can take here is to issue a GetInputFocus
request immediately after xcb_take_socket, and then do it again every
64k requests that you issue. You *don't* need to wait for the reply
immediately; just make sure you pick up the reply eventually, or there
will be a memory leak. This is not optimal, but it is at least correct
and simple.

Thanks for the very detailed explanation, and for the how-to on working
around that problem.
Sorry for the dumb question: when I issue a GetInputFocus request with
self-generated protocol, how can I consume the reply in native code?
The rest of the application is using the Xlib API, so probably I am
not able to use XCB's API?
Just for a test I modified the java code to send a GetInputFocus
request every 10 requests, and the blocking seems to be gone but as
expected the app now leaks memory.

To have a simple playground I also modified my c-testcase, but now it

Post by Jamey Sharp
handoff: xcb_io.c:242: process_responses: Assertion `(((long) (dpy->last_request_read) - (long) (dpy->request)) <= 0)' failed.
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server "localhost:0.0"
after 130277 requests (65146 known processed) with 1 events remaining.

However using wireshark the protocol I generated at least *seemed*
valid (wireshark was able to decode it ;) ).

Post by Jamey Sharp
I wonder if your output queue is *too* big? If you flush more often, the
X server can get started on the work you're requesting sooner. The only
reason to have an output queue at all is because system calls are
expensive and you want to do fewer of them.
If that doesn't help, something like `strace -T` may tell you what
syscalls are keeping your process waiting.

Thanks again :)

Last question: why does XEventsQueued cause a socket take-back on the
native side?
The application I use frequently calls XEventsQueued, and it seems like
socket-handoff ping-pong is taking place :-/
Any idea how to work around that issue?

Thanks again, Clemens

Julien Cristau

2009-07-01 16:15:16 UTC

int fillRect(Picture dst, char* data) {
    unsigned int* req_i = (unsigned int*) data;
    unsigned short* req_s = (unsigned short*) data;
    unsigned char* req_b = (unsigned char*) data;
    /* Compiler's nightmare ;) */
    req_b[0] = RENDER;   // RENDER extension major opcode
    req_b[1] = 26;       // FillRectangles opcode
    req_s[1] = 9;        // request length in 4-byte units (36 bytes)
    req_b[4] = 3;        // PictOpOver
    req_i[2] = dst;
    req_s[6] = 0xffff;   // red
    req_s[7] = 0xffff;   // green
    req_s[8] = 0;        // blue
    req_s[9] = 0xffff;   // alpha
    req_s[10] = 100;     // first rectangle: x
    req_s[11] = 100;     // y
    req_s[12] = 20;      // width
    req_s[13] = 20;      // height
    req_s[14] = 120;     // second rectangle: x
    req_s[15] = 120;     // y
    req_s[16] = 10;      // width
    req_s[17] = 10;      // height
    return 36;           // bytes written
}

Why are you using such unreadable horrible code, instead of just filling
an xRenderFillRectanglesReq and two xRectangle structures?

Cheers,
Julien

Clemens Eisserer

2009-07-01 16:52:09 UTC

Post by Julien Cristau
Why are you using such unreadable horrible code, instead of just filling
an xRenderFillRectanglesReq and two xRectangle structures?

Because it's simply a testcase, and I haven't found it worthwhile to
learn xcb's internals for such a simple piece of code.
After all it works, and it's not intended to be used in production code anyway.

- Clemens

Ian Osgood

2009-07-01 17:35:31 UTC

Post by Clemens Eisserer

Post by Julien Cristau
Why are you using such unreadable horrible code, instead of just filling
an xRenderFillRectanglesReq and two xRectangle structures?

To be fair, these are Xlib internals (proto/renderproto/renderproto.h
and proto/x11proto/Xprotostr.h), although there are equivalent XCB
structure definitions.

Honestly, the aforementioned snippet has killed every kitten to which
I have shown it. Please... think of the kittens.

Ian

Ian Osgood

2009-07-01 17:43:43 UTC

To be fair, these are Xlib internals (xorg/proto/renderproto/renderproto.h
and xorg/proto/x11proto/Xprotostr.h)

er, X11 protocol internals, rather. Both Xlib and the Xorg server use
these structure definitions. Using these definitions is preferable
to hacking values into overlaid char/short/int arrays. It wastes our
time to determine whether your arrays really do match up with the
protocol structure definitions.
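
As an illustration of the structure-based approach, here is a sketch that builds the same 36-byte request from structs. The two struct types are local stand-ins that mirror the wire layout so the example compiles on its own; their names and field names are made up here, and real code would include the protocol headers named above rather than redefine them. Values are stored in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the FillRectangles request header (illustrative names). */
struct fill_rects_req {
    uint8_t  major_opcode;  /* RENDER extension opcode, queried at runtime */
    uint8_t  minor_opcode;  /* 26 = FillRectangles */
    uint16_t length;        /* total request size in 4-byte units */
    uint8_t  op;            /* 3 = PictOpOver */
    uint8_t  pad[3];
    uint32_t dst;           /* destination Picture */
    uint16_t red, green, blue, alpha;
};

/* Local stand-in for a protocol rectangle. */
struct rect {
    int16_t  x, y;
    uint16_t width, height;
};

_Static_assert(sizeof(struct fill_rects_req) == 20, "unexpected padding");
_Static_assert(sizeof(struct rect) == 8, "unexpected padding");

/* Write one FillRectangles request with two rectangles to out.
 * Returns the number of bytes written (36). */
size_t fill_two_rects(uint8_t *out, uint8_t render_opcode, uint32_t dst)
{
    struct fill_rects_req req;
    struct rect rects[2] = { { 100, 100, 20, 20 }, { 120, 120, 10, 10 } };

    assert(out != NULL);
    memset(&req, 0, sizeof req);
    req.major_opcode = render_opcode;
    req.minor_opcode = 26;
    req.length = (uint16_t)((sizeof req + sizeof rects) / 4);
    req.op = 3;
    req.dst = dst;
    req.red = 0xffff;
    req.green = 0xffff;
    req.blue = 0;
    req.alpha = 0xffff;

    memcpy(out, &req, sizeof req);
    memcpy(out + sizeof req, rects, sizeof rects);
    return sizeof req + sizeof rects;
}
```

The field names make it possible to check the layout against the protocol definition at a glance, and the length is computed from the struct sizes instead of being typed in by hand.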

Ian

Clemens Eisserer

2009-07-01 21:50:29 UTC

Hi Ian,

It wastes our time to determine
whether your arrays really do match up with the protocol structure
definitions.

In the whole 6 months I have been writing emails about these issues, only one
person (Jamey) was able to really help me.
A few people have been very friendly and helpful (like Barton), but
the rest only sent me their wisdom and complaints, without any idea
what's really going on. I've become tired of noise.

Post by Clemens Eisserer
Sorry for the dumb question: when I issue a GetInputFocus request with
self-generated protocol, how can I consume the reply in native code?
The rest of the application is using the Xlib API, so probably I am
not able to use XCB's API?

- Clemens

Jamey Sharp

2009-07-02 07:11:54 UTC

Post by Clemens Eisserer
Thanks a lot for looking at that stuff :)

Sure!

Post by Clemens Eisserer
... when I issue a GetInputFocus request with self-generated protocol,
how can I consume the reply in native code?

Same as for any other request you issue this way: call either
xcb_wait_for_reply or xcb_poll_for_reply, with the sequence number of
the request you issued. Just make darn sure that you've passed the
request to xcb_writev before you try to wait for the reply or your app
will probably hang. :-)

Notably, waiting for a reply should be independent of who owns the
socket at the moment (I think; haven't looked at this recently) so it
doesn't matter if Xlib has called xcb_take_socket before you wait for
the reply.

Post by Clemens Eisserer
To have a simple playground I also modified my c-testcase ...
    int ret = xcb_take_socket(con, &return_socket, &flags, 0, &sent);
    ...
    for (requests = 0; requests < 500; requests++) {
        written += fillRect(picture, &buffer[written]);
        written += getInputFocus(&buffer[written]);
    }
    ...
    xcb_writev(con, &vect, 1, requests);

Aren't you writing 1000 requests here and telling XCB there were 500? I
think that explains both your failures.
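
One way to guard against this kind of mismatch in a test program is to count the requests actually present in the buffer before handing it over. The sketch below is a hypothetical helper, not part of XCB; it walks the 16-bit length field that every core-format request carries in bytes 2-3, and it assumes host byte order and no BIG-REQUESTS encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the requests in buf by following each request's length field
 * (bytes 2-3, in 4-byte units). Returns -1 if the buffer is malformed. */
long count_requests(const uint8_t *buf, size_t len)
{
    long count = 0;
    size_t off = 0;

    assert(buf != NULL || len == 0);
    while (off < len) {
        uint16_t words;
        if (len - off < 4)
            return -1;                 /* truncated request header */
        memcpy(&words, buf + off + 2, sizeof words);
        if (words == 0 || (size_t)words * 4 > len - off)
            return -1;                 /* zero length or request overruns buffer */
        off += (size_t)words * 4;
        count++;
    }
    return count;
}
```

For the loop quoted above, a buffer holding 500 FillRectangles requests (9 words each) interleaved with 500 GetInputFocus requests (1 word each) yields 1000, which is the number xcb_writev should have been given.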

Post by Clemens Eisserer

Post by Jamey Sharp
handoff: xcb_io.c:242: process_responses: Assertion `(((long) (dpy->last_request_read) - (long) (dpy->request)) <= 0)' failed.

On an assertion like this, it might help if you reproduce the problem
with gdb attached, get a stack trace, and print these values:
- dpy->last_request_read
- dpy->request
- dpy->xcb->connection->in.request_read
- dpy->xcb->connection->out.request_written

You might include the other variables in connection->in and
connection->out that are named request* as well, for completeness.

Post by Clemens Eisserer

Post by Jamey Sharp
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server "localhost:0.0"
after 130277 requests (65146 known processed) with 1 events remaining.

Those numbers are suspicious. 130277-65146 is 65131, which looks like
about the moment that Xlib would try to sync. Possibly the first time
the client read from the socket after writing a ton of requests?

Also, errno 11, EAGAIN, probably shouldn't have been fatal. That
probably means we're misinterpreting an error status somewhere in either
XCB or Xlib. :-( Again, it might help to attach gdb and get a stack
trace when _XIOError is called.

Post by Clemens Eisserer
However using wireshark the protocol I generated at least *seemed*
valid (wireshark was able to decode it ;) ).

Always a good sign! :-)

Post by Clemens Eisserer
Last question: why does XEventsQueued cause a socket take-back on the
native side? The application I use frequently calls XEventsQueued, and
it seems like socket-handoff ping-pong is taking place :-/ Any idea
how to work around that issue?

XEventsQueued needs to cause any responses from the server to be read,
so it calls process_responses. That in turn needs the Display to have
recent sequence numbers, which it can only get reliably by taking
ownership of the socket. At least, I think that's why that works that
way. Frankly I don't remember now and I'm too sleepy to find out.

I'm not sure there's much we can do about that. Of course, you'll get
much better results if you convert all the event handling to use XCB
directly instead of Xlib. :-) That's something several people on this
list can help with, if you feel up for that.

BTW, it is true that the hand-coded protocol bits in your test app are
frightening. :-) It doesn't matter much how one-off test programs are
implemented, but I hope you'll use something more structured for your
real code. :-)

Jamey

Clemens Eisserer

2009-07-02 14:43:06 UTC

Hi again,

One last question, hopefully this time really the last one. Sorry for
all the traffic.

I am a bit puzzled by the fact that xcb_poll_for_reply only accepts a
32-bit sequence number, whereas xcb seems to use 64-bit internally.
In xcb_send_request, the 64-bit sequence counter is simply assigned
to a 32-bit return type:
unsigned int request = ++c->out.request;

How should I generate that 32-bit sequence number, to work around
overflow problems?

Post by Jamey Sharp
xcb_poll_for_reply, with the sequence number of
the request you issued.

Great, exactly what I was looking for :)
I'll simply poll for outstanding replies when I flush.

Post by Jamey Sharp
Aren't you writing 1000 requests here and telling XCB there were 500? I
think that explains both your failures.

Ooops, embarrassing ;)
I already suspected something stupid, but not that stupid...

Post by Jamey Sharp
I'm not sure there's much we can do about that. Of course, you'll get
much better results if you convert all the event handling to use XCB
directly instead of Xlib. :-) That's something several people on this
list can help with, if you feel up for that.

Well, maybe later ;)

Post by Jamey Sharp
BTW, it is true that the hand-coded protocol bits in your test app are
frightening. :-) It doesn't matter much how one-off test programs are
implemented, but I hope you'll use something more structured for your
real code. :-)

Well, the code looks more or less like the sample code, in java there
are no lightweight structs like C has.
With the exception that I use a stack-like memory (putByte(), putShort(), ....).

I owe you a few beers ;)

Thanks for everything, Clemens

Jamey Sharp

2009-07-02 16:48:29 UTC

Sorry for all the traffic.

No, that's exactly what the list is for. :-)

I am a bit puzzled by the fact that xcb_poll_for_reply only accepts a
32-bit sequence number, whereas xcb seems to use 64-bit internally.

Historical mistake. :-( If we had it to do over again we'd use 64-bit
sequence numbers everywhere in the API. Probably XCB will get some new
entry points some day that have 64-bit types in them.

Suggestions for API- and ABI-compatible ways to fix those types would be
welcomed. :-)

How should I generate that 32-bit sequence number, to work around
overflow problems?

Just don't wait for four billion requests to go by before requesting
responses with xcb_poll_for_reply, and everything should be fine. Even
if you do wait that long it might still work. The XCB implementation was
written with an awareness that this is a problem.
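
To illustrate why truncation is harmless as long as fewer than 2^32 requests are outstanding, here is a sketch of truncating a 64-bit counter and recovering it again. Both function names are hypothetical and this is not a description of how XCB does it internally.

```c
#include <assert.h>
#include <stdint.h>

/* What a caller would pass to a 32-bit API: the low 32 bits of its counter. */
uint32_t truncate_sequence(uint64_t full)
{
    return (uint32_t)full;
}

/* Hypothetical recovery of the full value from the truncated one, given the
 * most recent 64-bit sequence number issued. Valid while the request in
 * question is fewer than 2^32 requests old. */
uint64_t recover_sequence(uint64_t latest, uint32_t truncated)
{
    uint64_t full = (latest & ~(uint64_t)0xffffffff) | truncated;

    /* A value above 'latest' must come from before the last 32-bit wrap,
     * which is only possible if a wrap has happened at all. */
    assert(full <= latest || (latest >> 32) != 0);
    if (full > latest)
        full -= (uint64_t)1 << 32;
    return full;
}
```

With the latest request at 0x100000005, the truncated value 0xfffffff0 is recovered as 0xfffffff0 (issued shortly before the wrap), not as 0x1fffffff0.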

Well, the code looks more or less like the sample code, in java there
are no lightweight structs like C has.
With the exception that I use a stack-like memory (putByte(), putShort(), ....).

Ah, something resembling DataOutputStream or ByteBuffer? That's pretty
reasonable.
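
For comparison, the same append-style idea sketched in C, the language used elsewhere in this thread. The `writer` struct and the `put_*` names are invented for this example and only loosely modelled on the putByte()/putShort() calls mentioned above; values are stored in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical append-only cursor over a caller-provided buffer. */
struct writer {
    uint8_t *buf;
    size_t   cap;  /* capacity of buf in bytes */
    size_t   pos;  /* next write position */
};

/* Append n raw bytes; the caller guarantees there is enough room. */
void put_bytes(struct writer *w, const void *src, size_t n)
{
    assert(n <= w->cap - w->pos);
    memcpy(w->buf + w->pos, src, n);
    w->pos += n;
}

void put_u8(struct writer *w, uint8_t v)   { put_bytes(w, &v, sizeof v); }
void put_u16(struct writer *w, uint16_t v) { put_bytes(w, &v, sizeof v); }
void put_u32(struct writer *w, uint32_t v) { put_bytes(w, &v, sizeof v); }

/* Example use: append a GetInputFocus request (opcode 43, one 4-byte unit).
 * Returns the number of bytes appended. */
size_t write_get_input_focus(struct writer *w)
{
    size_t start = w->pos;
    put_u8(w, 43);   /* opcode */
    put_u8(w, 0);    /* unused */
    put_u16(w, 1);   /* request length in 4-byte units */
    return w->pos - start;
}
```

Because every field is appended in order with an explicit width, the request layout can be read off the call sequence, which avoids the overlaid-pointer indexing of the test case.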

Jamey

Clemens Eisserer

2009-07-19 11:08:14 UTC

Hi Jamey,

The pure-java backend now works as expected.
I stopped working on that stuff months ago, because I wasn't able to
solve the hang problem, and now with just a few modifications it works
like a charm :)

It would be cool if the socket-handoff API could notify the
socket-taker how many void requests have already been issued, to be able to
avoid the request after taking the socket.
However I guess the overhead caused by the single request is noise.

Post by Jamey Sharp
Ah, something resembling DataOutputStream or ByteBuffer? That's pretty
reasonable.

Yes, exactly. A sun.misc.Unsafe allowing direct access to malloc'ed memory ;)

Thanks for all your help, patience and support, Clemens

Jamey Sharp

2009-07-19 16:08:16 UTC

Post by Clemens Eisserer
The pure-java backend now works as expected.
I stopped working on that stuff months ago, because I wasn't able to
solve the hang problem, and now with just a few modifications it works
like a charm :)

Awesome! Where can we find the code and your thoughts on the process?

Post by Clemens Eisserer
It would be cool if the socket-handoff API could notify the
socket-taker how many void requests have already been issued, to be able to
avoid the request after taking the socket.
However I guess the overhead caused by the single request is noise.

Yeah, shipping an extra four bytes every once in a while isn't that
big a deal, since you don't have to wait for the reply, but it'd be
nice to not have to do it. Patches welcome :-) though if we change the
signature of xcb_take_socket the distros will probably axe-murder us
at this point.

XCB itself has enough information to put off the sync for a long time,
because every time a response arrives it can defer the sync longer. An
ideal solution would allow socket handoff users to do the same thing.

Also note that socket-handoff users don't have any way to tell XCB
whether the requests they're generating are void requests, so XCB has
to assume they might not produce a response whenever it decides
whether to insert a sync. Ideally, handoff callers that know whether
they're expecting a reply would have a way to tell XCB so, allowing
XCB to share that with other handoff users too. (Xlib can't easily
provide this information, so we can't require it of the caller, but an
API making it an optional optimization would be nice.)

I don't know what a good API to this would look like, frankly. Ideas, anyone?

Jamey