Discussion:
[Xcb] Is there any way to intercept or generate protocol to an in-memory buffer?
Clemens Eisserer
2017-12-26 20:11:36 UTC
Permalink
Hi there,

I am currently trying to solve a performance problem in Java2D's
xrender backend.
What I would like to do is defer the actual command submission to the
X server, so that many small XPutImages can be batched into one large
XShmPutImage.
I could do this in my own code (log API calls to an application-managed
buffer and call the corresponding Xlib/xcb functions later from within
a large protocol-decoding switch statement). However, I wonder whether
there isn't a more efficient way, given that X's network transparency
introduces some kind of deferring anyway.

Is there any way to intercept the protocol before it is sent to the X server?
Or, in case interception is not supported, is there a way to let xcb
generate all protocol into a user-accessible / user-controlled buffer
and submit it manually to the X server?

Thank you in advance and best regards, Clemens
Bart Massey
2017-12-26 21:42:20 UTC
Permalink
The overhead of doing separate XShmPutImages should be minimal. If you're
seeing performance issues from that, you might be using the XCB Image
library; if so, that's probably where to look for optimizations. I know
that little care went into making it fast back when it was written.
Post by Clemens Eisserer
Hi there,
I am currently trying to solve a performance problem in Java2D's
xrender backend.
What I would like to do is defer the actual command submission to the
X server, so that many small XPutImages can be batched into one large
XShmPutImage.
I could do this in my own code (log API calls to an application-managed
buffer and call the corresponding Xlib/xcb functions later from within
a large protocol-decoding switch statement). However, I wonder whether
there isn't a more efficient way, given that X's network transparency
introduces some kind of deferring anyway.
Is there any way to intercept the protocol before it is sent to the X server?
Or, in case interception is not supported, is there a way to let xcb
generate all protocol into a user-accessible / user-controlled buffer
and submit it manually to the X server?
Thank you in advance and best regards, Clemens
Clemens Eisserer
2017-12-28 19:38:56 UTC
Permalink
Hi Bart,
Post by Bart Massey
The overhead of doing separate XShmPutImages should be minimal.
Xcb itself is performing fine; what is causing the performance issues is
how the drivers implement those tiny XPutImage requests (especially when
running on top of OpenGL, as all glamor-based setups do).
I've found a way to implement the deferred protocol submission using
the socket handoff mechanism, and it seems to work really well.

However, the socket owner currently changes quite frequently, so I am
struggling with the requirement to issue an XGetInputFocus request
*every* time I call xcb_take_socket.
For cases where there are only four requests to submit, an additional
XGetInputFocus hurts.
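
To make the setup clearer, here is a rough sketch of the handoff pattern
itself (heavily simplified, not the actual Java2D code; request encoding
and the sequence bookkeeping this thread is about are omitted):

/* Simplified sketch of the socket handoff pattern: take xcb's socket,
 * append hand-encoded X requests to a local buffer, and hand the bytes
 * back to xcb via xcb_writev() so it can keep its sequence-number
 * accounting correct. */
#include <xcb/xcb.h>
#include <xcb/xcbext.h>   /* xcb_take_socket(), xcb_writev() */
#include <sys/uio.h>      /* struct iovec */
#include <stddef.h>
#include <stdint.h>

static uint8_t  req_buf[1 << 16];  /* hand-encoded, padded X requests */
static size_t   req_len;           /* bytes used in req_buf */
static uint64_t req_count;         /* number of requests in req_buf */

static void flush_generated(xcb_connection_t *c)
{
    struct iovec vec = { .iov_base = req_buf, .iov_len = req_len };
    if (req_count)
        xcb_writev(c, &vec, 1, req_count);  /* tells xcb how many requests we sent */
    req_len = 0;
    req_count = 0;
}

/* xcb calls this whenever it needs the socket back (to send one of its
 * own requests or to read a reply): everything generated so far has to
 * be flushed before ownership reverts to xcb. */
static void return_socket_cb(void *closure)
{
    flush_generated(closure);
}

static void submit_batch(xcb_connection_t *c)
{
    uint64_t last_sent_seq;   /* sequence number of the last request xcb sent */

    if (!xcb_take_socket(c, return_socket_cb, c, 0, &last_sent_seq))
        return;   /* connection is in an error state */

    /* ... append hand-encoded requests to req_buf here, incrementing
     *     req_count once per request ... */

    flush_generated(c);   /* xcb reclaims the socket whenever it next needs it */
}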

What I tried instead was:

uint64_t lastManualSyncSeq;     // sequence number of my last manual sync
uint64_t socketTakenSeq;        // sequence number when the socket was taken
uint32_t self_generated_count;  // number of self-generated X requests

while (1) {
    xcb_take_socket(xcbCon, &returnSocketCB, &flags, 0, &socketTakenSeq);

    while (1) {
        if ((socketTakenSeq + self_generated_count) - lastManualSyncSeq > 65000) {
            // Will cause the socket to be revoked; the callback takes care
            // of flushing out the generated protocol.
            lastManualSyncSeq = xcb_get_input_focus(xcbCon).sequence;
            xcb_discard_reply(xcbCon, lastManualSyncSeq);
            self_generated_count = 0;
            break;
        }

        self_generated_count++;
        // Appends a reply-less request to the request data buffer.
        GenerateXrenderFillRectanglesRequest();
    }
}



The idea is that I can't be sure when xcb last generated an
XGetInputFocus request automatically, but I do know when I last called
xcb_get_input_focus myself.
As long as my last call to xcb_get_input_focus is less than 65k
requests away, everything should be fine (regardless of whether xcb
issued some xcb_get_input_focus requests of its own in between).
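
Spelled out as a purely illustrative helper, the invariant I am trying
to maintain is:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative helper for the check in the snippet above.  Sequence
 * numbers are only 16 bits on the wire, so xcb can keep its widened
 * 64-bit sequence counters consistent only if some request with a reply
 * completes well before 2^16 = 65536 requests have gone by; 65000 keeps
 * a bit of headroom below that limit. */
static bool need_manual_sync(uint64_t socket_taken_seq,  /* seq when the socket was taken      */
                             uint32_t self_generated,    /* requests generated since then      */
                             uint64_t last_sync_seq)     /* seq of my last xcb_get_input_focus */
{
    uint64_t current_seq = socket_taken_seq + self_generated;
    return current_seq - last_sync_seq > 65000;
}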

However, this doesn't seem to work out as expected - I still get hangs
from time to time.

Any idea what is wrong with the above snippet?

Thank you in advance, Clemens
Bart Massey
2017-12-29 09:43:21 UTC
Permalink
I don't remember the details of how we did the sync in XCB back then; we
did something very much like what you are doing, but the details are a bit
tricky to get right. In particular, XCB will only send an input focus
request when no request (not just input focus) has received a reply in the
last 64K requests. So... yeah. It's been literally a decade since I looked
at this stuff, but I *think* you should be able to dig the last-sync count
out of XCB and work from that; you can probably reuse the XCB code, or at
least its algorithm, to tell you when to sync. We had a correctness proof at
one point. We had it again after we found and fixed a bug in the proof. :-)
Post by Clemens Eisserer
Hi Bart,
Post by Bart Massey
The overhead of doing separate XShmPutImages should be minimal.
Xcb itself is performing fine; what is causing the performance issues is
how the drivers implement those tiny XPutImage requests (especially when
running on top of OpenGL, as all glamor-based setups do).
I've found a way to implement the deferred protocol submission using
the socket handoff mechanism, and it seems to work really well.
However, the socket owner currently changes quite frequently, so I am
struggling with the requirement to issue an XGetInputFocus request
*every* time I call xcb_take_socket.
For cases where there are only four requests to submit, an additional
XGetInputFocus hurts.
uint64_t lastManualSyncSeq;     // sequence number of my last manual sync
uint64_t socketTakenSeq;        // sequence number when the socket was taken
uint32_t self_generated_count;  // number of self-generated X requests
while (1) {
    xcb_take_socket(xcbCon, &returnSocketCB, &flags, 0, &socketTakenSeq);
    while (1) {
        if ((socketTakenSeq + self_generated_count) - lastManualSyncSeq > 65000) {
            // Will cause the socket to be revoked; the callback takes care
            // of flushing out the generated protocol.
            lastManualSyncSeq = xcb_get_input_focus(xcbCon).sequence;
            xcb_discard_reply(xcbCon, lastManualSyncSeq);
            self_generated_count = 0;
            break;
        }
        self_generated_count++;
        // Appends a reply-less request to the request data buffer.
        GenerateXrenderFillRectanglesRequest();
    }
}
The idea is that I can't be sure when xcb last generated an
XGetInputFocus request automatically, but I do know when I last called
xcb_get_input_focus myself.
As long as my last call to xcb_get_input_focus is less than 65k
requests away, everything should be fine (regardless of whether xcb
issued some xcb_get_input_focus requests of its own in between).
However, this doesn't seem to work out as expected - I still get hangs
from time to time.
Any idea what is wrong with the above snippet?
Thank you in advance, Clemens
Clemens Eisserer
2017-12-30 14:40:02 UTC
Permalink
Hi Bart,
Post by Bart Massey
I don't remember the details of how we did the sync in XCB back then; we
did something very much like what you are doing, but the details are a bit
tricky to get right. In particular, XCB will only send an input focus
request when no request (not just input focus) has received a reply in the
last 64K requests. So... yeah. It's been literally a decade since I looked
at this stuff, but I *think* you should be able to dig the last-sync count
out of XCB and work from that; you can probably reuse the XCB code, or at
least its algorithm, to tell you when to sync.
Thanks for taking a look at the approach - in the end it was just a
simple bug on my side ;)
Overall things look good now: depending on the workload, I am able to
reduce the number of XPutImage requests by about 4-20x, with no
regressions in code that is not the target of the optimization.

So the tinkering I did with socket handoff 8-9 years ago has turned
into something useful after all :)

Thanks and best regards, Clemens
