Re: SPDY and the HTTP Binding from Roberto Peon on 2012-10-12 (ietf-http-wg@w3.org from October to December 2012)

From: Roberto Peon <grmocg@gmail.com>
Date: Fri, 12 Oct 2012 16:37:33 -0700
To: Willy Tarreau <w@1wt.eu>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, James M Snell <jasnell@gmail.com>, ietf-http-wg@w3.org
Message-ID: <CAP+FsNdN3E0G+vQ5P7BW6RfnqJ_xweOXN3GzunPVLzG26vKzxA@mail.gmail.com>
On Fri, Oct 12, 2012 at 4:11 PM, Willy Tarreau <w@1wt.eu> wrote:

> On Fri, Oct 12, 2012 at 03:59:55PM -0700, Roberto Peon wrote:
> > On Fri, Oct 12, 2012 at 3:53 PM, Willy Tarreau <w@1wt.eu> wrote:
> >
> > > Hi Roberto,
> > >
> > > On Fri, Oct 12, 2012 at 02:49:20PM -0700, Roberto Peon wrote:
> > > > The most recent output, copy/pasted is:
> > > >
> > > > "Delta-coding took: 0.642199 seconds for: 104300 header frames or
> > > > 6.15723e-06 per header or 162411 headers/sec or 8.88429e+07 bytes/sec"
> > > >
> > > > So, ~89 million bytes/second and 162k requests/second for the
> > > > delta-coding on one core.
> > >
> > > It does not seem bad, and I also know that it's hard to compare numbers.
> > > I like to count in terms of bytes per second or headers per second, but
> > > obviously it depends on the coding scheme.
> > >
> > > Right now I made a test on haproxy using a request to pinterest.com that
> > > I captured from Firefox 13 (I didn't know the site, so I tried it). The
> > > request looks like this; it's 282 bytes long and has 7 header fields:
> > >
> > >   GET / HTTP/1.1
> > >   Host: pinterest.com
> > >   User-Agent: Mozilla/5.0 (X11; Linux i686; rv:13.0) Gecko/20100101 Firefox/13.0.1
> > >   Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> > >   Accept-Language: en-us,en;q=0.5
> > >   Accept-Encoding: gzip, deflate
> > >   Connection: keep-alive
> > >
> > > It was running on a single core of a Core i7 2600 @ 3.4 GHz. I sent it 4
> > > million times to haproxy, which answered each one with a redirect, and it
> > > took 3.609 seconds for the 4 million reqs, which is 1.1 million req/s (also
> > > exactly the number haproxy reports in its stats) and 312 MB/s. So at first
> > > glance haproxy is 3.5-7 times faster on a single core than the compressor
> > > alone (about 3.5x in bytes/s, about 7x in requests/s). So this would mean
> > > that it would spend 88% of its CPU time in the compressor alone, and the
> > > remaining 12% doing its job.
> > >
> >
> > No, that means that a load-test client would spend 88% of its time doing
> > compression. :)
>
> :-)
>
> > Decompression is faster, and 'recompression' is also faster, as you can
> > reuse much of the client's compression state, at least theoretically.
>
> Well, responses still need to be sent to clients, so compression time
> matters too. That's where I think that tokenizing must help, as it
> saves the CPU from having to deal with protocols made for humans.
>

There are a couple of ways that could be fast (I don't have numbers yet).
One is to use ERefs (ephemeral references). These don't change the
compressor state, and always require that the full key-value text be sent.
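A toy model of the difference between a stateful, table-updating encoding and an ERef may help. It assumes a plain list as the shared compressor state and made-up opcode tuples in place of any real wire format; `ToyDeltaEncoder`, `encode_stored` and `encode_eref` are hypothetical names, not the SPDY or delta-encoding code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaEncoder:
    """Hypothetical toy encoder; NOT the real SPDY/delta-encoding wire format.

    `table` stands in for the compressor state shared by encoder and decoder.
    """
    table: list = field(default_factory=list)

    def encode_stored(self, key: str, value: str) -> tuple:
        # Stateful path: refer to an existing entry by index, or send the
        # literal text and append it so later messages can reference it.
        pair = (key, value)
        if pair in self.table:
            return ("index", self.table.index(pair))
        self.table.append(pair)
        return ("literal+store", key, value)

    def encode_eref(self, key: str, value: str) -> tuple:
        # Ephemeral reference: always carries the full key-value text and
        # never reads or modifies `table`, so the compressor state is unchanged.
        return ("eref", key, value)


enc = ToyDeltaEncoder()
enc.encode_stored("location", "/a")  # ("literal+store", "location", "/a")
enc.encode_stored("location", "/a")  # ("index", 0)
enc.encode_eref("location", "/b")    # ("eref", "location", "/b"); table still holds 1 entry
```

The trade-off shows in the last call: an ERef costs more bytes on the wire, but the encoder needs no lookup and no state update to produce it.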

A better approach would be to have the common parts of such responses use
entries that are known to exist in the LRU (e.g. the pre-defined items, which
include almost every key that I've seen, and also include many common
values). If a particular field shows up in your redirects often enough, you
could simply maintain a pointer to its LRU item that informs you if the item
ever expires from the LRU, in which case you'd have to recreate the state.
Otherwise you'd not even have to do a lookup, and the encoding would be as
fast as a branch and a memcpy.
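One way to picture the "pointer that informs you if it expires" idea is a toy LRU seeded with pre-defined items, where a cached handle is invalidated at eviction time. The common path is then one branch plus emitting a known index, and only the rare expired case pays for a lookup and re-insert. `HeaderLRU`, `Handle`, `pin` and `emit_indexed` are hypothetical names, eviction here is plain LRU, and the opcode tuple is illustrative; none of this is the actual delta-encoding format or its eviction rules.

```python
from collections import OrderedDict


class Handle:
    """Hypothetical cached pointer to one LRU entry; `valid` is cleared on eviction."""

    def __init__(self, entry_id: int):
        self.entry_id = entry_id
        self.valid = True


class HeaderLRU:
    """Toy LRU of (key, value) header pairs, seeded with pre-defined items."""

    def __init__(self, capacity: int, predefined=()):
        assert capacity >= 1
        self.capacity = capacity
        self.entries = OrderedDict()  # entry_id -> (key, value), least recently used first
        self.ids = {}                 # (key, value) -> entry_id
        self.handles = {}             # entry_id -> Handle to invalidate on eviction
        self.next_id = 0
        for key, value in predefined:
            self.add(key, value)

    def add(self, key: str, value: str) -> int:
        if len(self.entries) >= self.capacity:
            old_id, old_pair = self.entries.popitem(last=False)  # evict the LRU entry
            if self.ids.get(old_pair) == old_id:
                del self.ids[old_pair]
            handle = self.handles.pop(old_id, None)
            if handle is not None:
                handle.valid = False  # notify the holder that its pointer expired
        entry_id = self.next_id
        self.next_id += 1
        self.entries[entry_id] = (key, value)
        self.ids[(key, value)] = entry_id
        return entry_id

    def pin(self, key: str, value: str) -> Handle:
        # One lookup (or insert) up front; afterwards the caller keeps the handle.
        # This sketch keeps a single handle per entry.
        entry_id = self.ids.get((key, value))
        if entry_id is None:
            entry_id = self.add(key, value)
        handle = Handle(entry_id)
        self.handles[entry_id] = handle
        return handle


def emit_indexed(lru: HeaderLRU, handle: Handle, key: str, value: str):
    """Hot path: a branch on `valid`, then an index, with no table lookup."""
    if not handle.valid:  # rare case: the entry expired, so rebuild the state
        handle = lru.pin(key, value)
    lru.entries.move_to_end(handle.entry_id)  # mark as recently used
    return handle, ("index", handle.entry_id)


lru = HeaderLRU(capacity=2, predefined=[(":status", "301")])
h = lru.pin("location", "https://example.com/")
h, op = emit_indexed(lru, h, "location", "https://example.com/")
```

In a C implementation the valid-handle path would come down to the branch plus a memcpy of pre-encoded bytes; Python only models the control flow.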



>
> > > I understand the code is not optimized yet, but this typically is the type
> > > of thing I want us to be extremely careful about, because it's very easy
> > > to completely kill performance for the last percent of optimization over
> > > the wire. In fact I'm not that worried about the 1.1-to-2.0 conversion,
> > > because as time goes on, the need for this work will fade away and it won't
> > > represent most of the CPU usage. But routing and processing 2.0 to 2.0
> > > should be optimally fast.
> > >
> >
> > I know that we're coming at this from different motivations as well. I
> > suspect that most sites that are handling millions of requests/second are
> > perfectly happy to spend the money to get another HAProxy box in exchange
> > for lower client latency, if that is the tradeoff they can get, since much
> > of the time lower latency translates into higher conversion rates and thus
> > more profit.
>
> What is important in my opinion is to keep the CPU overhead low. I would have
> loved to see a CPU improvement instead, but we're not there yet :-)
>

It could happen: if requests become substantially smaller (especially
if/when people understand how to optimize for a delta-encoded protocol), the
time spent on interpretation and IO also drops. My experience running
Google's edge tells me that this could be a significant cost savings, but
only time will tell whether it completely offsets the cost of the
compression, etc.
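For scale, the compression cost being weighed here can be put in per-request terms using the figures quoted earlier in the thread (0.642199 s for 104300 header frames from the delta-coding benchmark; 4 million 282-byte requests in 3.609 s through haproxy). The sketch assumes, as the ~88% estimate does, that one unoptimized delta-coding pass is added per request on the same core; with that simplification the share lands near 87%, close to the quoted estimate.

```python
# Figures copied from the benchmark output and the haproxy test quoted above.
delta_seconds, delta_frames = 0.642199, 104300
delta_bytes_per_sec = 8.88429e07  # as printed by the delta-coding benchmark
haproxy_seconds, haproxy_requests, request_bytes = 3.609, 4_000_000, 282

delta_per_header = delta_seconds / delta_frames              # ~6.16 us per header frame
haproxy_per_request = haproxy_seconds / haproxy_requests     # ~0.90 us per request
haproxy_req_per_sec = haproxy_requests / haproxy_seconds     # ~1.11 million req/s
haproxy_bytes_per_sec = haproxy_req_per_sec * request_bytes  # ~312.6 MB/s

# Speed ratios behind "3.5-7 times faster".
ratio_bytes = haproxy_bytes_per_sec / delta_bytes_per_sec    # ~3.5
ratio_requests = delta_per_header / haproxy_per_request      # ~6.8

# Share of per-request CPU time spent compressing, assuming one delta-coding
# pass is added to each request handled on the same core.
compress_share = delta_per_header / (delta_per_header + haproxy_per_request)

print(f"{delta_per_header * 1e6:.2f} us/header, {haproxy_per_request * 1e6:.2f} us/request")
print(f"speedup: {ratio_bytes:.1f}x by bytes, {ratio_requests:.1f}x by requests")
print(f"compression share: {compress_share:.0%}")
```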

-=R
Received on Friday, 12 October 2012 23:38:02 UTC