Re: SPDY and the HTTP Binding from Willy Tarreau on 2012-10-12 (ietf-http-wg@w3.org from October to December 2012)

From: Willy Tarreau <w@1wt.eu>
Date: Sat, 13 Oct 2012 01:11:06 +0200
To: Roberto Peon <grmocg@gmail.com>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, James M Snell <jasnell@gmail.com>, ietf-http-wg@w3.org
Message-ID: <20121012231106.GI14734@1wt.eu>

On Fri, Oct 12, 2012 at 03:59:55PM -0700, Roberto Peon wrote:
> On Fri, Oct 12, 2012 at 3:53 PM, Willy Tarreau <w@1wt.eu> wrote:
> 
> > Hi Roberto,
> >
> > On Fri, Oct 12, 2012 at 02:49:20PM -0700, Roberto Peon wrote:
> > > The most recent output, copy/pasted is:
> > >
> > > "Delta-coding took: 0.642199 seconds for: 104300 header frames or
> > > 6.15723e-06 per header or 162411 headers/sec or 8.88429e+07 bytes/sec"
> > >
> > > So, ~89 million bytes/second and 162k requests/second for the delta-coding
> > > on one core.
> >
> > It does not seem bad, and I also know that it's hard to compare numbers.
> > I like to count in terms of bytes per second or headers per second, but
> > obviously it depends on the coding scheme.
> >
> > Right now I made a test on haproxy using a request to pinterest.com that
> > I captured from Firefox 13 (didn't know the site so I tried it). The
> > request looks like this, it's 282 bytes long and has 7 header fields :
> >
> >   GET / HTTP/1.1
> >   Host: pinterest.com
> >   User-Agent: Mozilla/5.0 (X11; Linux i686; rv:13.0) Gecko/20100101 Firefox/13.0.1
> >   Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> >   Accept-Language: en-us,en;q=0.5
> >   Accept-Encoding: gzip, deflate
> >   Connection: keep-alive
> >
> > It was running on a single core of a Core i7 2600 @3.4 GHz. I sent it 4
> > million times to haproxy, which answered each with a redirect, and it took 3.609
> > seconds for the 4 million reqs, which is 1.1 million req/s, which is also
> > exactly the same number as it reports in the stats, and 312 MB/s. So at
> > first glance it's 3.5-7 times faster on a single core than the compressor
> > alone. So this would mean that it would spend 88% of its CPU time in the
> > compressor alone, and the 12% remaining doing its job.
> >
> 
> No, that means that a loadtest client would spend 88% of its time doing
> compression. :)

:-)
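
For anyone who wants to re-check the arithmetic quoted above, here is a small sketch in Python. The input figures are the ones given in this thread; the variable names are mine, and the "share" calculation assumes the per-request costs of the proxy and of the delta-coder simply add up on one core, which is a simplification and not something either test measured.

```python
# Figures quoted in the thread.
DELTA_SECONDS = 0.642199        # time reported for the delta-coding run
DELTA_FRAMES = 104300           # header frames processed in that run
DELTA_BYTES_PER_SEC = 8.88429e7 # bytes/sec reported for the delta-coder

HAPROXY_REQUESTS = 4_000_000    # requests sent to haproxy
HAPROXY_SECONDS = 3.609         # time taken to process them
REQUEST_BYTES = 282             # size of the captured request

# Throughput of each side, per second on one core.
delta_rate = DELTA_FRAMES / DELTA_SECONDS
haproxy_rate = HAPROXY_REQUESTS / HAPROXY_SECONDS
haproxy_bytes_per_sec = haproxy_rate * REQUEST_BYTES

# How much faster the proxy is, counted in requests and in bytes.
ratio_by_requests = haproxy_rate / delta_rate
ratio_by_bytes = haproxy_bytes_per_sec / DELTA_BYTES_PER_SEC

# Assumption: if one process had to do both jobs for every request,
# the per-request times would add up.
delta_time = 1.0 / delta_rate
haproxy_time = 1.0 / haproxy_rate
compressor_share = delta_time / (delta_time + haproxy_time)

if __name__ == "__main__":
    print(f"haproxy: {haproxy_rate:,.0f} req/s, "
          f"{haproxy_bytes_per_sec / 1e6:.1f} MB/s")
    print(f"delta-coder: {delta_rate:,.0f} headers/s")
    print(f"ratio: {ratio_by_bytes:.1f}x by bytes, "
          f"{ratio_by_requests:.1f}x by requests")
    print(f"compressor share of combined time: {compressor_share:.0%}")
```

Under that additive assumption the ratios land in the 3.5-7x range and the compressor share comes out close to the 88% figure mentioned above.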

> Decompression is faster, and 'recompression' is also faster, as you can
> reuse much of the client's compression state, at least theoretically.

Well, responses still need to be sent to clients, so compression time
matters too. That's where I think that tokenizing must help, as it
saves the CPU from having to deal with protocols made for humans.
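
To illustrate what I mean by tokenizing, here is a minimal sketch in Python. It is purely hypothetical: the token table, the one-byte name token and the two-byte length prefix are made up for the example and are not taken from SPDY or from any HTTP/2.0 proposal. The point is only that the receiver reads a small integer and a length instead of scanning text for colons and line endings.

```python
# Hypothetical token table: NOT taken from SPDY or any HTTP/2.0 draft.
TOKENS = {
    "host": 1,
    "user-agent": 2,
    "accept": 3,
    "accept-language": 4,
    "accept-encoding": 5,
    "connection": 6,
}
NAMES = {token: name for name, token in TOKENS.items()}


def encode(headers):
    """Encode (name, value) pairs as: 1-byte token, 2-byte length, value."""
    out = bytearray()
    for name, value in headers:
        raw = value.encode("ascii")
        out.append(TOKENS[name.lower()])     # header name becomes one byte
        out += len(raw).to_bytes(2, "big")   # value length, big-endian
        out += raw
    return bytes(out)


def decode(data):
    """Inverse of encode(); no text parsing, only fixed-size reads."""
    headers = []
    pos = 0
    while pos < len(data):
        name = NAMES[data[pos]]
        length = int.from_bytes(data[pos + 1:pos + 3], "big")
        value = data[pos + 3:pos + 3 + length].decode("ascii")
        headers.append((name, value))
        pos += 3 + length
    return headers


if __name__ == "__main__":
    sample = [("host", "pinterest.com"), ("connection", "keep-alive")]
    wire = encode(sample)
    print(len(wire), decode(wire) == sample)
```

A real scheme would also need an escape for header names that are not in the table; this sketch simply raises KeyError for them.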

> > I understand the code is not optimized yet, but this typically is the type
> > of thing I want us to be extremely careful about, because it's very easy
> > to completely kill performance for the last percent of optimization over
> > the wire. In fact I'm not that much worried for the 1.1-to-2.0 conversion
> > because as time goes, the need for this work will fade away and won't
> > represent most of the CPU usage. But routing and processing 2.0 to 2.0
> > should be optimally fast.
> >
> 
> I know that we're coming at this from different motivations as well-- I
> suspect that most sites that are handling millions of requests/second are
> perfectly happy to spend the money to get another HAProxy box in exchange
> for lower client latency, if that is the tradeoff they can get, since much
> of the time lower latency translates into higher conversion rates and thus
> more profit.

What is important in my opinion is to keep the CPU overhead low. I would have
loved to see a CPU improvement instead, but we're not there yet :-)

I think it is reasonable to expect some waste when converting 1.1 to 2.0, but
we must absolutely avoid having to explain to end users that switching from
1.1 to 2.0 will require 5 times more hardware, because even if the hardware
is cheap, managing it is not always cheap, and synchronizing state between
machines is an entirely different story. So there are trade-offs there too
(e.g. being less efficient at fighting DDoSes).

> I hope that we'll come to a useful compromise where we can get
> consensus on the latency/CPU tradeoff, and vastly improve over what we've
> had with SPDY thus far.

I think it's already better anyway, because I don't remember ever having
seen zlib reach such numbers. Still, the goal would be to be faster than
HTTP/1.1 :-)

> I suspect that the best way to do that is to gather data from working code,
> but I've always been biased that way :)

I agree with you on this.

Cheers,
Willy
Received on Friday, 12 October 2012 23:11:32 UTC