- From: Roberto Peon <grmocg@gmail.com>
- Date: Wed, 25 Jun 2014 13:02:14 -0700
- To: Willy Tarreau <w@1wt.eu>
- Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, Martin Thomson <martin.thomson@gmail.com>, Jason Greene <jason.greene@redhat.com>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAP+FsNf9cOaTWJ5WxHDy8+4MM+LVcvDd8MYZV+ZjjYVwc9SSeA@mail.gmail.com>
On Wed, Jun 25, 2014 at 12:51 PM, Willy Tarreau <w@1wt.eu> wrote:

> On Wed, Jun 25, 2014 at 07:30:02PM +0000, Poul-Henning Kamp wrote:
> > In message <CABkgnnVZb7e9npjm0P+fT7VeCCv+2TKuo4djDRviA1wF8YD0OQ@mail.gmail.com>, Martin Thomson writes:
> >
> > >I know of at least one major operating system that supports this sort
> > >of function already. And let's be clear: HTTP is important enough to
> > >allocate custom kernel resources to improve performance. I'd argue
> > >that it's important enough to dedicate silicon to eke out a few
> > >milliseconds or watts.
> >
> > While that is true, even in kernel code 16kB framesize is suboptimal
> > from a performance point of view when the majority of all objects
> > are larger than that.
>
> 16kB-1 please, or 3.999755859375 pages, thus 3 pages hence 12kB or 8 MSS
> in practice for most usages :-(
>
> > At 100 Gbit/s, you'll be north of half a million frames per second,
> > statistically probably very close to full million frames per second.
>
> Yes and at 3 GHz, that's 3000 cycles per frame, which are easily
> wasted doing a plain data copy of 16383 bytes, some cache misses +
> a little bit of synchronization job. I'd rather have 30000 cycles
> to forward 10 times this and avoiding the copy.
>

True IFF not using TLS, at which point we're doing more copies, etc.
Honestly, spending money/effort getting those libraries correct/optimized
would have a substantially larger impact than worrying about the framesize
in almost all web cases for HTTP2.

> > Being able to cut that number by a factor of 10 will matter a lot to
> > performance -- even if you allocate silicon.
>
> Especially if moved to silicon, because the round-trip to hardware
> is particularly expensive, which is why some chip makers have moved
> the crypto accelerators into the CPU's instruction set for example.
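The figures traded above can be checked with a few lines of arithmetic. This is only an illustrative sketch resting on assumptions the thread does not state: 4096-byte pages, a 1448-byte TCP MSS (a 1500-byte Ethernet MTU minus 40 bytes of IPv4/TCP headers and 12 bytes of TCP timestamp option; a 1460-byte MSS gives the same count of 8), and a payload-only rate that ignores frame headers and TCP/IP overhead. For reference, 16383/4096 is exactly 3.999755859375.

```python
# Sketch of the thread's back-of-the-envelope numbers.
# ASSUMPTIONS (not from the thread): 4096-byte pages, 1448-byte TCP MSS,
# and a payload-only rate that ignores frame headers and TCP/IP overhead.

PAGE_SIZE = 4096             # assumed page size in bytes
MSS = 1448                   # assumed TCP maximum segment size in bytes
MAX_FRAME = 16 * 1024 - 1    # "16kB-1": largest frame payload discussed in the thread


def pages(size: int) -> float:
    """Fractional number of pages spanned by `size` bytes."""
    return size / PAGE_SIZE


def whole_pages(size: int) -> int:
    """Number of complete pages contained in `size` bytes."""
    return size // PAGE_SIZE


def segments_in(size: int) -> int:
    """Number of full MSS-sized TCP segments that fit in `size` bytes."""
    return size // MSS


def frames_per_second(link_bits_per_s: float, frame_bytes: int) -> float:
    """Frame rate needed to saturate a link with frames of `frame_bytes`."""
    return link_bits_per_s / 8 / frame_bytes


def cycles_per_frame(cpu_hz: float, frames_per_s: float) -> float:
    """CPU cycle budget available per frame at a given frame rate."""
    return cpu_hz / frames_per_s


if __name__ == "__main__":
    print(pages(MAX_FRAME))
    print(segments_in(whole_pages(MAX_FRAME) * PAGE_SIZE))
    print(round(frames_per_second(100e9, MAX_FRAME)))
    print(cycles_per_frame(3e9, 1e6))
    print(cycles_per_frame(3e9, 1e6 / 10))
```

Full-size 16383-byte frames at 100 Gbit/s come to roughly 763,000 frames per second, consistent with "north of half a million"; the 3000-cycle figure uses the rounder one-million-per-second estimate, and ten times larger frames give the 30000-cycle budget.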
>

We have examples of this kind of thing in hardware, already working to
reduce cost today: TCP Segmentation Offload (TSO) offloads the work of
cutting TCP segments to the NIC's hardware.
If we're talking about non-TLS traffic, then doing this kind of simple
thing on the NIC doesn't seem that hard. It is roughly the same thing.
-=R

> > 100 Gbit/s NICs are close to shipping in bulk and some people are
> > already talking about 400 Gbit/s ethernet as the next step.
>
> I'm impatient :-)
>
> Willy
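The TSO comparison can be made concrete with a minimal, hypothetical sketch of what a "frame segmentation offload" would do: cut one large payload into DATA frames no larger than the 16383-byte limit discussed in this thread, writing only the small per-frame headers and leaving the payload as zero-copy `memoryview` slices (a scatter-gather list), much as TSO cuts one large send into MSS-sized segments. The helper name `data_frames` is invented for illustration, and the 9-byte header layout (24-bit length, 8-bit type, 8-bit flags, reserved bit plus 31-bit stream identifier, DATA type 0x0, END_STREAM flag 0x1) is the one later standardized in RFC 7540, not the header of the draft under discussion here.

```python
# Hypothetical sketch: segment one large payload into HTTP/2-style DATA frames,
# analogous to TSO cutting a large send into MSS-sized segments.
# ASSUMPTION: header layout from RFC 7540, not the draft discussed in the thread.

TYPE_DATA = 0x0
FLAG_END_STREAM = 0x1
MAX_FRAME_PAYLOAD = 16383   # "16kB-1" limit quoted in the thread


def data_frames(stream_id: int, payload: bytes,
                max_size: int = MAX_FRAME_PAYLOAD) -> list[tuple[bytes, memoryview]]:
    """Return (9-byte header, payload slice) pairs; payload bytes are not copied."""
    if not 0 < stream_id < 2**31:
        raise ValueError("DATA frames need a non-zero 31-bit stream identifier")
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    view = memoryview(payload)
    # An empty body still needs one (empty) frame to carry END_STREAM.
    offsets = list(range(0, len(view), max_size)) or [0]
    frames = []
    for i, off in enumerate(offsets):
        chunk = view[off:off + max_size]
        flags = FLAG_END_STREAM if i == len(offsets) - 1 else 0
        header = (len(chunk).to_bytes(3, "big")      # 24-bit payload length
                  + bytes([TYPE_DATA, flags])        # frame type, then flags
                  + stream_id.to_bytes(4, "big"))    # reserved bit 0 + stream id
        frames.append((header, chunk))
    return frames


if __name__ == "__main__":
    for header, chunk in data_frames(1, bytes(40000)):
        print(header.hex(), len(chunk))
```

Keeping headers and payload separate mirrors the point about avoiding the copy: only 9 bytes per frame are written, and this function never duplicates the payload bytes.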
Received on Wednesday, 25 June 2014 20:02:41 UTC