Re: Stuck in a train -- reading HTTP/2 draft. from Willy Tarreau on 2014-06-25 (ietf-http-wg@w3.org from April to June 2014)

From: Willy Tarreau <w@1wt.eu>
Date: Wed, 25 Jun 2014 22:22:19 +0200
To: Roberto Peon <grmocg@gmail.com>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, Martin Thomson <martin.thomson@gmail.com>, Jason Greene <jason.greene@redhat.com>, Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20140625202219.GW5531@1wt.eu>

On Wed, Jun 25, 2014 at 01:02:14PM -0700, Roberto Peon wrote:
> > Especially if moved to silicon, because the round-trip to hardware
> > is particularly expensive, which is why some chip makers have moved
> > the crypto accelerators into the CPU's instruction set for example.
> >
> >
> We have examples of this kind of thing in hardware and working to reduce
> cost today:
> TCP Segmentation Offload (TSO) offloads making TCP segments to the NIC's
> hardware.

TSO works if you feed it with enough data. That's precisely one point
where 16kB makes it totally useless, because the cost of preparing the
fragment descriptors for the NIC outweighs the savings of not having to
cut the data into a few packets in the TCP stack.
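
To put rough numbers on this, here is a small sketch. The MSS of 1460
bytes and the 64kB upper bound for a single TSO send are assumptions
chosen for illustration, not figures from this thread, and the function
names are hypothetical. It only counts packets and descriptor setups; it
does not model what each setup actually costs.

```python
import math

MSS = 1460            # assumption: TCP payload per packet on a 1500-byte MTU
TSO_MAX = 64 * 1024   # assumption: largest buffer handed to the NIC at once


def packets_per_send(send_size, mss=MSS):
    """Wire packets the NIC cuts out of one send of send_size bytes."""
    return math.ceil(send_size / mss)


def sends_needed(total, send_size):
    """Descriptor setups needed to push total bytes in send_size pieces."""
    return math.ceil(total / send_size)


total = 1024 * 1024   # 1 MB of payload to deliver
for size in (16 * 1024, TSO_MAX):
    print(size, packets_per_send(size), sends_needed(total, size))
```

Under these assumptions a 16kB send is cut into 12 packets against 45
for a 64kB one, and 1 MB needs 64 setups instead of 16, so the fixed
preparation cost is spread over far fewer packets.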

> If we're talking about non-TLS stuff, then doing this kind of simple thing
> on the NIC doesn't seem that hard. It is roughly the same thing.

Not for an intermediary. I have to receive muxed streams from multiple
servers and deliver them over muxed connections to multiple clients.

Please consider this simple use case:

   - requests for /img /css /js /static go to server 1
   - requests for /video go to server 2
   - requests for other paths go to server 3
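
The three rules above can be sketched as follows. This is only an
illustration: the function name pick_server and the choice to match on
whole path segments are assumptions, not how any particular load
balancer is configured.

```python
# Assumption: prefixes match whole path segments, so "/js/app.js" goes to
# server 1 but "/jsonapi" does not.
STATIC_PREFIXES = ("/img", "/css", "/js", "/static")


def has_prefix(path, prefix):
    """True if path is the prefix itself or lies below it."""
    return path == prefix or path.startswith(prefix + "/")


def pick_server(path):
    """Return the server number (1, 2 or 3) for a request path."""
    if any(has_prefix(path, p) for p in STATIC_PREFIXES):
        return 1          # page components
    if has_prefix(path, "/video"):
        return 2          # long-lived video transfers
    return 3              # everything else


for path in ("/", "/css/site.css", "/video/clip.mp4"):
    print(path, pick_server(path))
```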

Clients send their requests over the same connection. The load balancer
has several connections to servers 1 and 3 behind it and forwards clients'
requests over these connections to retrieve objects. In practice, a
client will go first to server 3 (GET /) then to server 1 (retrieve
page components) then to server 2 over a fresh connection and stay
there for a long time. There's no place for NIC-based acceleration
here because this MUX pulls data from one side and transfers it to
the other side in small chunks. When the video starts, if we had the
ability to splice large chunks from server 2 to client, there would
be a real benefit. With the small chunks, the benefits disappear and
we're back to doing the same recv+memcpy()+send job as for the other
servers (double to triple copy instead of zero).
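
For illustration, the recv+memcpy()+send job looks roughly like the
sketch below. The function relay_copy, its two buffers and the copy
counter are hypothetical and only exist to make the three copies
visible; a real intermediary would do this in C and might skip the
middle copy. With splice() on large chunks, none of these copies would
go through user space.

```python
import socket


def relay_copy(src, dst, chunk=16 * 1024):
    """Forward everything from src to dst, one small chunk at a time.

    Returns (bytes_forwarded, bytes_copied), where bytes_copied counts
    every copy each byte goes through on the way.
    """
    rx = bytearray(chunk)   # receive buffer
    tx = bytearray(chunk)   # separate transmit buffer, e.g. for re-framing
    forwarded = copied = 0
    while True:
        n = src.recv_into(rx)            # copy 1: kernel -> user space
        if n == 0:                       # peer closed, nothing left
            break
        tx[:n] = memoryview(rx)[:n]      # copy 2: memcpy() between buffers
        dst.sendall(memoryview(tx)[:n])  # copy 3: user space -> kernel
        forwarded += n
        copied += 3 * n
    return forwarded, copied
```

Every forwarded byte is visited three times here, whatever the chunk
size, and the smaller the chunk, the more often the loop runs.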

Sure, this is not something a common server or user-agent is even able
to detect at regular loads. But you (and you particularly) know as well
as I do that intermediaries need to shave off everything possible to
avoid wasting time doing dumb things such as copying small data or
visiting the same byte twice, etc.

To be clear, this is not the end of the world. It will probably only
lead to a significant part of the internet not deploying what was
designed here and waiting for H3 to appear, when we could easily make it
worthwhile for them to consider the option.

Regards,
Willy

Received on Wednesday, 25 June 2014 20:22:49 UTC