- From: Willy Tarreau <w@1wt.eu>
- Date: Tue, 10 May 2011 23:26:15 +0200
- To: HTTP Working Group <ietf-http-wg@w3.org>
Hi,

A few days ago, a user of haproxy explained to me that he was experiencing extremely long delays between two servers when communicating through haproxy, while the same two servers connected directly had no issue. He was kind enough to provide logs and network traces with a bit of explanation of the data exchanges.

What is happening is that one server connects to the other using a PUT request, makes use of chunked encoding to send data, and the second one sends a chunked-encoded response in turn. The protocol is specified here: http://php-java-bridge.sourceforge.net/doc/index.php

The issue comes from the fact that the protocol assumes that messages are fully interactive: whatever chunk is emitted on one side is immediately received on the other side, and conversely. Each chunk thus serves as an ACK for the other one, so the workload consists of many very small TCP segments (around 10-20 bytes of payload each). Obviously this can only work on local networks with extremely low RTT.

The trouble starts when a gateway is inserted in the middle. Haproxy was built on the assumption that messages live their own life and that there is no direct relation between a chunk on one side and a chunk on the other side. And in order not to flood the receiver with TCP PUSH packets, it aggregates as much data as possible into each segment (MSG_MORE, the equivalent of TCP_CORK) until the last chunk is seen. What was causing the massive slowdown is that for each 10-byte payload seen, haproxy was telling the system "hey, please hold on for a while, something else is coming soon". The system waits 200 ms and, seeing nothing else come, finally sends the chunk. The same happens in the other direction, resulting in only one request/response being exchanged every 400 ms.

The workaround, which the user confirmed fixed the issue for him, consists in sending all chunks as fast as possible (TCP_NODELAY).
But doing this by default makes very inefficient use of mobile networks for normal traffic, especially compressed traffic, which is generally chunked. The issue then is that each chunk is sent with the TCP PUSH flag, which the client has to ACK immediately, resulting in a massive slowdown due to uplink congestion during downloads.

I can also improve the workaround so that haproxy asks the system to wait only when there are incomplete chunks left, but even that will not cover the mobile case in a satisfying way. So I'm now tempted to add an option to let the user decide whether he makes (ab)use of chunking or not.

My concern comes from this specific use of chunking. I see no reason why it would be valid, and I know it will not work at many places. Some proxies (such as nginx, IIRC) buffer the complete request before passing it on, and many others might want to analyse the beginning of the data before deciding to let it pass through. Also, I don't see why we should accept to turn each chunk into a TCP segment of its own; this seems contrary to the principle of streamed messages.

My understanding has always been that the only thing an intermediary can guarantee is that once all of the request body has been transferred, it will let all of the response body pass. Am I wrong somewhere? Shouldn't we try to remind implementers that there is no guarantee of any type of interactivity between two opposite streams being transferred over the same connection?

I'm worried by such deviations from the original use. In fact the project above seems to have tried to implement WebSocket before it was available. But the fact that some people do this probably means the spec makes them think this is something that can be expected to work.

Any insights are much appreciated. I've not yet committed to a fix, and I'm willing to consider opinions here to find the fairest solution for this type of usage without unduly impacting normal users.

Thanks,
Willy
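[To make the improved workaround mentioned above concrete, here is a hypothetical sketch of the decision, my own illustration rather than haproxy code: ask the kernel to coalesce (MSG_MORE) only when the forwarded data stops mid-chunk, and flush whenever it ends on a chunk boundary. The helper name and parameters are invented for illustration.]

```c
/* Hypothetical sketch of the refined workaround: only ask the kernel
 * to wait (MSG_MORE) when the buffered data ends in the middle of a
 * chunk, i.e. more bytes of that chunk are known to be coming.
 * Data ending on a chunk boundary is flushed at once. */
#include <sys/socket.h>

#ifndef MSG_MORE        /* Linux-specific flag; fall back to no coalescing */
#define MSG_MORE 0
#endif

/* Returns the flags to pass to send() for one forwarding pass.
 * chunk_incomplete: the buffered data stops mid-chunk.
 * last_chunk_seen:  the zero-sized terminating chunk has been parsed. */
int forward_flags(int chunk_incomplete, int last_chunk_seen)
{
    if (last_chunk_seen)
        return 0;            /* end of message: always flush */
    if (chunk_incomplete)
        return MSG_MORE;     /* rest of the chunk is coming: coalesce */
    return 0;                /* chunk boundary: push it out now */
}
```

With this policy, a fully buffered chunk is pushed immediately, which preserves the interactive pattern described above, while a chunk split across reads is still coalesced; as said, though, it does nothing for the mobile/compressed-download case.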
Received on Tuesday, 10 May 2011 21:26:40 UTC