Re: SPDY and the HTTP Binding from Roberto Peon on 2012-10-12 (ietf-http-wg@w3.org from October to December 2012)

From: Roberto Peon <grmocg@gmail.com>
Date: Fri, 12 Oct 2012 15:59:55 -0700
To: Willy Tarreau <w@1wt.eu>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, James M Snell <jasnell@gmail.com>, ietf-http-wg@w3.org
Message-ID: <CAP+FsNfAg6Qno9FGAYKPErYpFsMP6pV_q2eNAj-8XdHkvzjeNA@mail.gmail.com>

On Fri, Oct 12, 2012 at 3:53 PM, Willy Tarreau <w@1wt.eu> wrote:

> Hi Roberto,
>
> On Fri, Oct 12, 2012 at 02:49:20PM -0700, Roberto Peon wrote:
> > The most recent output, copy/pasted is:
> >
> > "Delta-coding took: 0.642199 seconds for: 104300 header frames or
> > 6.15723e-06 per header or 162411 headers/sec or 8.88429e+07 bytes/sec"
> >
> > So, ~89 million bytes/second and 162k requests/second for the delta-coding
> > on one core.
>
> It does not seem bad, and I also know that it's hard to compare numbers.
> I like to count in terms of bytes per second or headers per second, but
> obviously it depends on the coding scheme.
>
> Right now I made a test on haproxy using a request to pinterest.com that
> I captured from Firefox 13 (didn't know the site so I tried it). The
> request looks like this, it's 282 bytes long and has 7 header fields :
>
>   GET / HTTP/1.1
>   Host: pinterest.com
>   User-Agent: Mozilla/5.0 (X11; Linux i686; rv:13.0) Gecko/20100101 Firefox/13.0.1
>   Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>   Accept-Language: en-us,en;q=0.5
>   Accept-Encoding: gzip, deflate
>   Connection: keep-alive
>
> It was running on a single core of a Core i7 2600 @ 3.4 GHz. I sent it 4
> million times to haproxy, which answered each request with a redirect, and
> it took 3.609 seconds for the 4 million reqs. That is 1.1 million req/s,
> which is exactly the number it reports in the stats, and 312 MB/s. So at
> first glance it's 3.5-7 times faster on a single core than the compressor
> alone. This would mean that it would spend 88% of its CPU time in the
> compressor alone, and the remaining 12% doing its job.
>

No, that means that a loadtest client would spend 88% of its time doing
compression. :)
Decompression is faster, and 'recompression' is also faster, as you can
reuse much of the client's compression state, at least theoretically.
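
For anyone wanting to check the arithmetic behind the figures quoted above,
here is a rough Python sketch. The input numbers are copied from the two
messages; the variable names are only illustrative, and the CPU-share line
assumes (my assumption, not something either test measured) that compression
and proxy work simply add up serially on one core.

```python
# Inputs copied from the quoted messages; names are illustrative only.
delta_seconds = 0.642199        # delta-coding run time
delta_frames = 104300           # header frames compressed
delta_bytes_per_sec = 8.88429e7 # as reported by the delta-coding test

haproxy_requests = 4_000_000    # requests sent to haproxy
haproxy_seconds = 3.609         # time to handle them
request_bytes = 282             # size of the captured request

# Per-item rates for each test.
delta_headers_per_sec = delta_frames / delta_seconds
delta_sec_per_header = delta_seconds / delta_frames
haproxy_req_per_sec = haproxy_requests / haproxy_seconds
haproxy_bytes_per_sec = haproxy_req_per_sec * request_bytes
haproxy_sec_per_req = haproxy_seconds / haproxy_requests

# The "3.5-7 times faster" range: one ratio by bytes, one by requests.
speedup_by_bytes = haproxy_bytes_per_sec / delta_bytes_per_sec
speedup_by_requests = haproxy_req_per_sec / delta_headers_per_sec

# ASSUMPTION: both costs are paid serially, once per request, on one core.
compressor_share = delta_sec_per_header / (
    delta_sec_per_header + haproxy_sec_per_req
)

print(f"delta-coding: {delta_headers_per_sec:,.0f} headers/s")
print(f"haproxy:      {haproxy_req_per_sec:,.0f} req/s, "
      f"{haproxy_bytes_per_sec / 1e6:.0f} MB/s")
print(f"speedup:      {speedup_by_bytes:.1f}x by bytes, "
      f"{speedup_by_requests:.1f}x by requests")
print(f"compressor share of CPU (serial assumption): {compressor_share:.0%}")
```

Under that serial-cost assumption the share comes out in the high 80s of
percent, in line with the 88% figure. The two speedup ratios differ because
the headers in the delta-coding test are, on average, larger than the
282-byte request used against haproxy.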



> I understand the code is not optimized yet, but this typically is the type
> of thing I want us to be extremely careful about, because it's very easy
> to completely kill performance for the last percent of optimization over
> the wire. In fact I'm not that much worried for the 1.1-to-2.0 conversion
> because as time goes on, the need for this work will fade away and won't
> represent most of the CPU usage. But routing and processing 2.0 to 2.0
> should be optimally fast.
>

I know that we're coming at this from different motivations as well. I
suspect that most sites that are handling millions of requests/second are
perfectly happy to spend the money to get another HAProxy box in exchange
for lower client latency, if that is the tradeoff they can get, since much
of the time lower latency translates into higher conversion rates and thus
more profit.


>
> > Given that most machines have more than one core, this
> > performance seems pretty reasonable to me, especially given that it has
> > not really been optimized. I'd hope that bandwidth was predominantly
> > entity-bodies and not headers, and so the bandwidth doesn't seem
> > problematic to me either.
>
> Unfortunately my experience has often been that most of the requests
> are if-modified-since and most of the responses are 304 on a number of
> web sites, so we still need to be careful.
>

> Thanks for sharing your data BTW!
>

Yup. I hope that we'll come to a useful compromise where we can get
consensus on the latency/CPU tradeoff, and vastly improve over what we've
had with SPDY thus far.
I suspect that the best way to do that is to gather data from working code,
but I've always been biased that way :)
-=R


>
> Willy
>
>

Received on Friday, 12 October 2012 23:00:24 UTC