- From: Willy Tarreau <w@1wt.eu>
- Date: Tue, 24 Jun 2014 08:28:42 +0200
- To: Mark Nottingham <mnot@mnot.net>
- Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, K.Morgan@iaea.org, matthew@kerwin.net.au, squid3@treenet.co.nz, ietf-http-wg@w3.org
Hi Mark,

On Tue, Jun 24, 2014 at 04:09:46PM +1000, Mark Nottingham wrote:
> Hi PHK,
> 
> On 23 Jun 2014, at 8:05 pm, Poul-Henning Kamp <phk@phk.freebsd.dk> wrote:
> 
> > In message <0356EBBE092D394F9291DA01E8D28EC201186DF063@sem002pd.sg.iaea.org>, K
> > .Morgan@iaea.org writes:
> >> On Sunday, 22 June 2014 14:36, phk@phk.freebsd.dk wrote:
> > 
> >>>> I realise I should probably clarify my thoughts on what to do if a
> >>>> single header doesn't fit in a 16K frame. The option I like best comes
> >>>> from one of PHK's earlier posts, where one of the reserved bits in the
> >>>> frame header is used as a "jumbo frame" marker such that if it's set
> >>>> the first, say, four octets of payload space is actually an extra 32
> >>>> bits of payload length
> >>> 
> >>> I would have it be the max length of *any* frame we're willing to accept,
> >>> and the default would then obviously be the 16kbyte currently implicit in
> >> the standard.
> >> 
> >> So are you proposing the "jumbo frame" marker for all frames, not just the
> >> HEADERS frames? I think it's a great idea, but I know it makes a bunch of
> >> people nervous about HOL blocking if you allow more than 16K in a DATA frame.
> > 
> > Yes, the length-extension would be available on all frames, which is why
> > we need a SETTING to limit what we'll accept in that respect.
> > 
> > For huge file transfers the 16k frames are horribly suboptimal and
> > having the receiver bang the frame size up once "Content-Length: A_LOT"
> > has been received will do wonders for performance on both ends.
> > 
> > Obviously, you can also reduce the frame size you'll accept. 16K
> > is quite large for a number of high traffic sites prone to DoS.
> 
> This has been discussed a lot over the life of the WG. The place where we left it was that the overhead of framing was quite small, considering that it's 8 bytes over 16K; TCP overheads are usually going to be bigger.
> 
> It's true that you can't use sendfile() here, but that's true with multiplexing regardless. It was felt that over time, kernel facilities specific to the use case of HTTP/2 will emerge if necessary, just as they did for HTTP/1.
> 
> Is there something else behind "horribly suboptimal" here? Can you give some numbers?

I know some high traffic sites running haproxy at above 100 Gbps. Such
sites don't ever make use of concurrent streams because, as PHK puts
it, they're delivering pink pixels over the net. At these rates, the
problem is not TCP overhead or any such thing, but the processing
cost. A single haproxy node can forward data at 40 Gbps with 256kB
buffers, 35 Gbps with 64kB buffers and something around 20 Gbps with
16kB buffers. At 16kB buffers and 20 Gbps, that's 150,000 recv/send
calls per second. That's extremely inefficient CPU cache-wise and
requires a lot of context switching. I'd say in fact that the task
processing overhead becomes huge compared to the small cost of copying
or even just splicing data between the two ends.

Memory bandwidth is huge with today's processors, and we're seeing
100 Gbps NICs coming, so large data blocks are processed at a low
cost. NICs are capable of large receive offload and TCP segmentation
offload, so the TCP stack can process 64kB packets, which are much
cheaper for it than 16kB packets. So it's really a problem of data
processing overhead on top of an otherwise cheap forwarding path.
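Just to make the arithmetic explicit, here is a minimal sketch of where
those per-second figures come from, under the simplifying assumption
that every forwarded buffer costs exactly one recv() plus one send()
and nothing is coalesced:

    #include <stdio.h>

    int main(void)
    {
        const double gbps = 20.0;                        /* sustained forwarding rate */
        const double bytes_per_sec = gbps * 1e9 / 8.0;   /* 2.5e9 bytes per second */
        const int bufsizes[] = { 16384, 65536, 262144 }; /* 16kB, 64kB, 256kB buffers */

        for (int i = 0; i < 3; i++)
            printf("%6d-byte buffers: ~%.0f recv()/send() pairs per second\n",
                   bufsizes[i], bytes_per_sec / bufsizes[i]);

        /* 16384  -> ~152588  (the ~150,000/s figure above)
         * 65536  -> ~38147
         * 262144 -> ~9537 */
        return 0;
    }

Larger buffers simply divide the number of wakeups, copies and cache
misses paid per byte forwarded.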
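And to make the quoted "jumbo frame" idea concrete, here is a small
parsing sketch in C. The choice of reserved bit, the field names, and
the way the extra 32 bits combine with the base length are all
illustrative assumptions, not taken from any draft; the header layout
simply assumes the 8-octet header with a 14-bit length implied by the
16K limit above, and max_accepted stands in for the SETTINGS value PHK
mentions:

    #include <stdint.h>
    #include <stddef.h>

    #define H2_JUMBO_BIT 0x40   /* hypothetical: one of the reserved bits in the first octet */

    struct h2_frame {
        uint64_t len;           /* effective payload length */
        uint8_t type;
        uint8_t flags;
        uint32_t stream_id;
        const uint8_t *payload;
    };

    /* Parse the 8-octet frame header (14-bit length, type, flags, 31-bit stream id).
     * max_accepted would come from a SETTINGS value advertised by the receiver,
     * defaulting to the 16K currently implicit in the spec. Returns 0 on success,
     * -1 where a real implementation would wait for more data or signal a
     * frame-size error. */
    static int h2_parse_frame(const uint8_t *buf, size_t buflen,
                              uint64_t max_accepted, struct h2_frame *f)
    {
        if (buflen < 8)
            return -1;

        f->len       = ((uint64_t)(buf[0] & 0x3f) << 8) | buf[1];
        f->type      = buf[2];
        f->flags     = buf[3];
        f->stream_id = ((uint32_t)(buf[4] & 0x7f) << 24) | ((uint32_t)buf[5] << 16) |
                       ((uint32_t)buf[6] << 8) | buf[7];
        f->payload   = buf + 8;

        if (buf[0] & H2_JUMBO_BIT) {
            /* jumbo marker set: the first 4 payload octets carry an extra 32 bits
             * of length; here they are assumed to be the high-order bits. */
            if (buflen < 12)
                return -1;
            uint64_t ext = ((uint64_t)f->payload[0] << 24) | ((uint64_t)f->payload[1] << 16) |
                           ((uint64_t)f->payload[2] << 8)  | f->payload[3];
            f->len = (ext << 14) | f->len;
            f->payload += 4;
        }

        if (f->len > max_accepted)   /* receiver-enforced limit from SETTINGS */
            return -1;

        return 0;
    }

A receiver could then raise that limit once it expects a large transfer
and lower it again for small, DoS-sensitive traffic, which is exactly
the negotiation the SETTING is meant to allow.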
I would find it sad if the sites responsible for something like 75% of
the internet's traffic refrained from upgrading because of the extra
infrastructure cost :-/

Regards,
Willy
Received on Tuesday, 24 June 2014 06:33:15 UTC