- From: Mark Nottingham <mnot@mnot.net>
- Date: Wed, 11 May 2011 15:28:03 +1000
- To: Willy Tarreau <w@1wt.eu>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
I think you're generally right. See also:
http://trac.tools.ietf.org/wg/httpbis/trac/ticket/283

Suggestions for text welcome.

On 11/05/2011, at 7:26 AM, Willy Tarreau wrote:

> Hi,
>
> A few days ago, a user of haproxy explained to me that he was experiencing
> extremely long delays between two servers when communicating through haproxy,
> while the same two servers communicating directly had no issue.
>
> He was kind enough to provide logs and network traces with a bit of
> explanation of the data exchanges.
>
> What is happening is that one server connects to the other using a PUT
> request, makes use of chunked encoding to send data, and the second one
> sends a chunked-encoded response in turn. The protocol is specified here:
>
> http://php-java-bridge.sourceforge.net/doc/index.php
>
> The issue comes from the fact that the protocol assumes that messages are
> fully interactive: whatever chunk is emitted on one side is immediately
> received on the other side, and conversely. Each chunk thus serves as an
> ACK for the other one, making the workload consist of many very small TCP
> segments (around 10-20 bytes of payload each).
>
> Obviously this can only work on local networks with extremely low RTT.
>
> The issue appears when a gateway is inserted in the middle. Haproxy was
> built on the assumption that messages live their own life and that there is
> no direct relation between a chunk on one side and a chunk on the other
> side. And in order not to flood the receiver with TCP PUSH packets, it
> aggregates as much data as possible in each segment (MSG_MORE, the
> equivalent of TCP_CORK) until the last chunk is seen.
>
> What was causing the massive slowdown is that for each 10-byte payload
> seen, haproxy was telling the system "hey, please hold on for a while,
> something else is coming soon". The system waits for 200 ms and, seeing
> nothing else come, finally sends the chunk.
> The same happens in the other direction, resulting in only one
> request/response being exchanged every 400 ms.
>
> The workaround, which the user confirmed fixed the issue for him, consists
> in sending all chunks as fast as possible (TCP_NODELAY). But doing this by
> default makes very inefficient use of mobile networks for normal uses,
> especially with compressed traffic, which is generally chunked. The issue
> is now that each chunk will be sent with the TCP PUSH flag, which the
> client has to immediately ACK, resulting in a massive slowdown due to
> uplink congestion during downloads.
>
> I can also improve the workaround so that haproxy asks the system to wait
> only when there are incomplete chunks left, but this still will not cover
> the mobile case in a satisfying way. So I'm now tempted to add an option to
> let the user decide whether he makes (ab)use of chunking or not.
>
> My concern comes from this specific use of chunking. I see no reason why it
> would be valid, and I know it will not work in many places. Some proxies
> (such as nginx, IIRC) buffer the complete request before passing it on, and
> in fact many others might want to analyse the beginning of the data before
> deciding to let it pass through. Also, I don't see why we should accept
> turning each chunk into a TCP segment of its own; this seems contrary to
> the principle of streamed messages.
>
> My understanding has always been that the only thing an intermediary can
> guarantee is that once all the request body has been transferred, it will
> let all the response body pass.
>
> Am I wrong somewhere? Shouldn't we try to remind implementers that there is
> no guarantee of any kind of interactivity between two opposite streams
> being transferred over the same connection? I'm worried by these deviations
> from the original use. In fact the project above seems to have tried to
> implement WebSocket before it was available.
> But the fact that some people do this probably means the spec makes them
> think it is something that can be expected to work.
>
> Any insights are much appreciated. I've not yet committed to a fix, and I'm
> willing to consider opinions here to find the fairest solution for this
> type of usage without unduly impacting normal users.
>
> Thanks,
> Willy

--
Mark Nottingham   http://www.mnot.net/
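[Editor's illustration] The framing and the workaround discussed above can be sketched in a few lines of Python. This is a hypothetical illustration, not haproxy's code: `encode_chunk` is an invented helper showing HTTP/1.1 chunked framing, and the socket lines show only the TCP_NODELAY option Willy mentions (the MSG_MORE/TCP_CORK aggregation path is not reproduced here).

```python
import socket

def encode_chunk(payload: bytes) -> bytes:
    """Frame one HTTP/1.1 chunk: hex length, CRLF, payload, CRLF."""
    return b"%X\r\n" % len(payload) + payload + b"\r\n"

# A tiny 10-byte payload becomes a chunk barely larger than itself,
# so in the protocol described above each one travels as its own
# small TCP segment, acting as an ACK for the peer's previous chunk.
chunk = encode_chunk(b"0123456789")
assert chunk == b"A\r\n0123456789\r\n"

# An empty payload yields the last-chunk terminator.
assert encode_chunk(b"") == b"0\r\n\r\n"

# The workaround: disable Nagle's algorithm so small chunks are sent
# immediately instead of being held back waiting for more data.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
assert s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
s.close()
```

As the thread notes, TCP_NODELAY trades delay for many small pushed segments, which is exactly what hurts on high-RTT or mobile links.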
Received on Wednesday, 11 May 2011 05:28:34 UTC