- From: Willy Tarreau <w@1wt.eu>
- Date: Wed, 25 Jun 2014 18:12:18 +0200
- To: Johnny Graettinger <jgraettinger@chromium.org>
- Cc: Patrick McManus <pmcmanus@mozilla.com>, Mark Nottingham <mnot@mnot.net>, K.Morgan@iaea.org, Poul-Henning Kamp <phk@phk.freebsd.dk>, Greg Wilkins <gregw@intalio.com>, HTTP Working Group <ietf-http-wg@w3.org>, Martin Dürst <duerst@it.aoyama.ac.jp>
On Wed, Jun 25, 2014 at 11:48:08AM -0400, Johnny Graettinger wrote:
> FWIW, I don't buy the premise that the current framing mechanism requires
> more frequent system calls, or must imply "lower performance" for large
> sends. One frame != one system call to write or read that frame.
>
> Vectored IO API's are available to write multiple frames with a single
> call.

... and they result in a data copy which is often even more expensive
than the syscall it tried to save. When you're forwarding data between
two TCP sockets and can make use of TCP splicing, you have to play with
pages (4kB). Anything not a full page will result in a copy. And
recv()+send() will result in two copies. That's why splice() on small
sizes (typically 16kB) offers no benefit: the first and/or the last page
are often incomplete or unaligned, resulting in only the two middle ones
being spliced from the CPU's L3 cache without ever hitting memory.

Also, with a 14-bit encoding, we cannot transfer 16 kB, we can at most
transfer 16kB-1, so you're always guaranteed to *copy* 4095 bytes at the
end of each transfer and to misalign every data block. 16kB are *really*
suboptimal for large transfers.

I just got a report of a company reaching 58 Gbps of forwarded traffic
with haproxy, using a splice size of 512kB. There *are* definitely a lot
of losses to expect from running at 16kB-1. I'd give 10-15 Gbps for the
setup above, not more.

I'm not advocating for making things complex nor for breaking the
protocol either, I'm just reporting real world scenarios which work very
well with 1.1 and which are 2.0-unfriendly. It's not too late to try to
fix these corner cases.

I think that the ability to use just one bit to change the size unit is
reasonable. It's exactly what TCP uses with Window Scaling and it works
pretty well. We can easily accept that the unit only changes after the
first round trip if needed, what matters is that not all the stream is
retrieved in tiny 16kB chunks.

Regards,
Willy
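
The page arithmetic in the message above can be checked with a rough sketch. It assumes a 4096-byte page and a simplified model in which only whole, page-aligned pages inside a byte range can be spliced and every other byte is copied. The function name `page_split` and the model itself are illustrative assumptions, not haproxy or kernel code, and frame headers are ignored.

```python
PAGE = 4096  # page size assumed in the message (4kB)


def page_split(offset, length, page=PAGE):
    """Return (full_pages, copied_bytes) for the range [offset, offset + length).

    Simplified model (an assumption): only whole, page-aligned pages can be
    moved without a copy; partial pages at either end are copied.
    """
    start = -(-offset // page) * page          # first page boundary at or after offset
    end = (offset + length) // page * page     # last page boundary at or before the end
    full = max(0, (end - start) // page)       # whole pages between the two boundaries
    return full, length - full * page


# Largest payload with a 14-bit length (16kB-1), starting page-aligned:
print(page_split(0, 2**14 - 1))        # (3, 4095)

# The next back-to-back block of the same size starts one byte short of a boundary:
print(page_split(2**14 - 1, 2**14 - 1))  # (3, 4095)

# A 512kB aligned transfer, as in the haproxy report:
print(page_split(0, 512 * 1024))       # (128, 0)
```

Under this model a 16kB-1 block always leaves 4095 bytes to copy and shifts the start of the following block off a page boundary, while a 512kB aligned block is made up entirely of full pages.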
Received on Wednesday, 25 June 2014 16:13:42 UTC