
Re: Backwards compatibility

From: Willy Tarreau <w@1wt.eu>
Date: Sat, 31 Mar 2012 10:14:06 +0200
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc: Roberto Peon <grmocg@gmail.com>, Mark Watson <watsonm@netflix.com>, Mike Belshe <mike@belshe.com>, "William Chan (?????????)" <willchan@chromium.org>, "<ietf-http-wg@w3.org>" <ietf-http-wg@w3.org>
Message-ID: <20120331081406.GP14039@1wt.eu>
On Sat, Mar 31, 2012 at 07:30:12AM +0000, Poul-Henning Kamp wrote:
> In message <20120331071333.GL14039@1wt.eu>, Willy Tarreau writes:
> 
> >> Framing is just the wrong place to spend complexity... it makes far more
> >> sense to spend it on features that improve performance, latency, or
> >> security, at least in my opinion.
> 
> Framing done badly hurts performance.
> 
> What we're looking for is the highest-performance HTTP protocol
> we can imagine.  HTTP scaled from 10Mbit/s to 10Gbit/s so far,
> HTTP/2.0 will have to do at least 1Tbit/s in its lifetime.

That's very true. And we can only assume that RTTs will become an
issue even for the local network and for NIC-to-memory and CPU-to-
memory communications, so streaming must be possible most of the
time to fill each pipe.

> If we imagine the perfectly optimal behaviour from the network
> stack, and perfectly optimal HTTP message from the other end, the
> perfect protocol scenario looks like this:
> 
>         [length of head]
>         [head]
>         [length of body]
>         [body]

I'm still having a problem with this scheme: most requests don't have a
body, and carrying a fixed-size length field for each request is overhead.
We might however use a fixed-size length for a known number of subsequent
heads and a compressed size for each message. Or we could also decide that
a head cannot be larger than 64kB and encode its length on 2 bytes. I
don't know.
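To make the last option concrete, here is a minimal sketch of what a
2-byte length field would look like on the wire, assuming big-endian
(network) byte order and a 64kB cap; the function names are illustrative,
not part of any proposal:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: if a head can never exceed 64kB, its length fits in exactly
 * two bytes on the wire.  Big-endian byte order is assumed here. */
static size_t encode_len16(uint8_t *buf, uint16_t len)
{
    buf[0] = (uint8_t)(len >> 8);   /* high byte first */
    buf[1] = (uint8_t)(len & 0xff); /* then low byte */
    return 2;                       /* bytes written */
}

static uint16_t decode_len16(const uint8_t *buf)
{
    return (uint16_t)((buf[0] << 8) | buf[1]);
}
```

The receiver always knows it must read exactly two bytes before it knows
anything else, which is what makes the fixed-size field cheap to parse.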

> Under utterly perfect circumstances, just three socket reads will
> get you the head and body into memory chosen, sized, aligned &
> allocated perfectly for the purpose:
> 
>         READ(length header) ->len buffer
>         (allocate workspace)
>         READV(head + next length header) -> (workspace, len buffer)
>         (allocate bodyspace)
>         READ(body) -> bodyspace

No, under perfect circumstances, a single readv() would give you all the
parts you need with fixed sizes, because each syscall is a big waste
of time. BTW, I'm sure that when running Varnish at full throttle, the
CPU time is largely dominated by the system, as I already observe in
haproxy, where the common figure is around 85% system and 15% user. That
said, we obviously cannot have fixed sizes for everything.
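The single-readv() idea can be sketched as follows, assuming the 2-byte
length field from earlier in the thread; one syscall scatters the length
field and the head into separate, pre-chosen buffers (read_frame() is an
illustrative name, not a real API):

```c
#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sketch: one readv() pulls the fixed-size length field and the head
 * into two separate destination buffers in a single syscall, so no
 * later memcpy is needed to split them apart. */
ssize_t read_frame(int fd, uint8_t lenbuf[2], uint8_t *head, size_t headcap)
{
    struct iovec iov[2] = {
        { .iov_base = lenbuf, .iov_len = 2       }, /* 2-byte length field */
        { .iov_base = head,   .iov_len = headcap }, /* head lands here directly */
    };
    return readv(fd, iov, 2); /* one syscall, two buffers */
}
```

The catch, as noted above, is that the second iovec's size must be
guessed in advance, which is exactly why fully fixed sizes can't work
for everything.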

> Any protocol which by design requires more work to move the bits from
> the TCP connection, through the socket API and into the applications
> memory, is not a high-performance protocol, worthy of HTTP/2.0
> consideration.

I agree that we must avoid memory copies as much as possible, but a read()
is a system-assisted memory copy. Depending on the implementation, one
large read plus one small copy will be preferable, while other
implementations will prefer several small reads. The protocol must cleanly
accommodate these various needs.
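The "one large read" strategy can be sketched like this: after a single
big read() fills a buffer, each message is consumed in place as a
pointer/length pair, so only the small length field is decoded and the
payload is never memcpy'd. The 2-byte big-endian length prefix and the
names below are assumptions for illustration only:

```c
#include <stddef.h>
#include <stdint.h>

/* A borrowed view into the receive buffer -- no copy of the payload. */
typedef struct { const uint8_t *data; size_t len; } msg_slice;

/* Parse one 2-byte-length-prefixed message from buf[0..avail).
 * Returns bytes consumed, or 0 if the message is not complete yet. */
size_t next_msg(const uint8_t *buf, size_t avail, msg_slice *out)
{
    if (avail < 2)
        return 0;                                /* length field incomplete */
    size_t len = ((size_t)buf[0] << 8) | buf[1]; /* decode 2-byte length */
    if (avail < 2 + len)
        return 0;                                /* body incomplete */
    out->data = buf + 2;                         /* point into buf, no copy */
    out->len  = len;
    return 2 + len;
}
```

A caller loops over the buffer, advancing by the returned count; the only
copying left is whatever the application itself chooses to do with each
slice.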

> Notice that just doing:
>         READ(1GB)
> might get you the same data into memory, but you can not optimally
> place it in memory without memcpy'ing it around.

Hopefully we won't see a 1GB header any time soon!

> Notice that at this point we have not talked about compression,
> TLS, integrity or anything else, we are only talking about how we
> pull a byte stream out of the TCP socket API, into memory of our
> desire.

Which is a point in favor of making it easy to consume just the data
we need.

Willy
Received on Saturday, 31 March 2012 08:14:40 GMT
