On the abuse of chunking for interactive usages

Hi,

A few days ago, a user of haproxy explained to me that he was experiencing
extremely long delays between two servers when communicating through haproxy,
while the same two servers communicating directly had no issue.

He was kind enough to provide logs and network traces with a bit of explanation
on the data exchanges.

What is happening is that one server connects to the other one using a PUT
request, makes use of chunked encoding to send data, and the second one sends
a chunk-encoded response in turn. The protocol is specified here:

   http://php-java-bridge.sourceforge.net/doc/index.php

The issue comes from the fact that the protocol assumes that messages are
fully interactive: whatever chunk is emitted on one side is immediately
received on the other side, and conversely. Each chunk thus serves as an
ACK for the other side's chunk, so the workload consists of many very small
TCP segments (around 10-20 bytes of payload each).

Obviously this can only work on local networks with extremely low RTT.

The issue comes when a gateway is inserted in the middle. Haproxy was built
on the assumption that messages live their own life and that there is no
direct relation between a chunk on one side and a chunk on the other side.
And in order not to flood the receiver with TCP PUSH packets, it aggregates
as much data as possible in each segment (MSG_MORE, equivalent to TCP_CORK)
until the last chunk is seen.
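
For those less familiar with the mechanism, it boils down to something like
this (a simplified sketch with made-up names, not haproxy's actual code):

   #include <sys/types.h>
   #include <sys/socket.h>

   /* Forward a block of message data. As long as the final chunk has not
    * been seen, MSG_MORE tells the kernel that more data will follow, so
    * it may keep aggregating instead of pushing a tiny segment right away.
    */
   ssize_t forward_block(int fd, const void *buf, size_t len,
                         int last_chunk_seen)
   {
       int flags = MSG_NOSIGNAL;

       if (!last_chunk_seen)
           flags |= MSG_MORE;

       return send(fd, buf, len, flags);
   }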

What was causing the massive slowdown is that for each 10-byte payload seen,
haproxy was telling the system "hey, please hold on for a while, something
else is coming soon". The system would wait for up to 200 ms and, seeing
nothing else come, finally send the chunk. The same happens in the other
direction, resulting in only one request/response being exchanged every
400 ms.

The workaround, which the user confirmed fixed the issue for him, consists
in sending all chunks as fast as possible (TCP_NODELAY). But doing this
by default makes very inefficient use of mobile networks for normal uses,
especially with compressed traffic, which generally is chunked. The issue
then becomes that each chunk is sent with the TCP PUSH flag, which the
client has to ACK immediately, resulting in a massive slowdown due to
uplink congestion during downloads.
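
Concretely, the workaround amounts to something like this on the forwarding
socket (again a sketch, not the actual haproxy code):

   #include <netinet/in.h>
   #include <netinet/tcp.h>
   #include <sys/socket.h>

   /* Disable Nagle so that every chunk is pushed to the wire as soon as
    * it is written, instead of being held back for aggregation.
    */
   static int set_nodelay(int fd)
   {
       int one = 1;
       return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
   }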

I can also improve the workaround so that haproxy asks the system to wait
only when an incomplete chunk remains in the buffer, but this will still
not cover the mobile case in a satisfying way. So I'm now tempted to add
an option to let the user declare whether or not he makes (ab)use of
chunking this way.
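
The refined heuristic would look roughly like this ("ends_on_chunk_boundary"
is only an illustrative name, not an existing function or field):

   /* Only ask the kernel to wait for more data when the output buffer
    * stops in the middle of a chunk; when it ends exactly on a chunk
    * boundary, push what we have right away.
    */
   ssize_t forward_parsed(int fd, const void *buf, size_t len,
                          int ends_on_chunk_boundary)
   {
       int flags = MSG_NOSIGNAL;

       if (!ends_on_chunk_boundary)
           flags |= MSG_MORE;

       return send(fd, buf, len, flags);
   }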

My concern comes from this specific use of chunking. I see no reason why
it should be considered valid. I know it will not work in many places. Some
proxies (such as nginx, IIRC) buffer the complete request before passing it
on. And in fact many other ones might want to analyse the beginning of the
data before deciding to let it pass through. Also, I don't see why we should
accept turning each chunk into a TCP segment of its own; this seems contrary
to the principle of streamed messages.

My understanding has always been that the only thing an intermediary can
guarantee is that once all of the request body has been transferred, it
will let all of the response body pass.

Am I wrong somewhere? Shouldn't we try to remind implementers that there
is no guarantee of any type of interactivity between two opposite streams
being transferred over the same connection? I'm worried by such deviations
from the original use. In fact the project above seems to have tried to
implement websocket before it was available. But the fact that some people
do this probably means the spec makes them think this is something that can
be expected to work.

Any insights are much appreciated. I've not yet committed to a fix, and I'm
willing to consider opinions here in order to find the fairest solution for
this type of usage without unduly impacting normal users.

Thanks,
Willy

Received on Tuesday, 10 May 2011 21:26:40 UTC