W3C home > Mailing lists > Public > ietf-http-wg@w3.org > July to September 2012

Re: Would a header/schema compiler help?

From: Mike Belshe <mike@belshe.com>
Date: Thu, 2 Aug 2012 15:09:11 -0700
Message-ID: <CABaLYCtNzTZjUPgknfNtjwhkUopf3gaHQ6SmBGLAywu07WAB1g@mail.gmail.com>
To: Henrik Frystyk Nielsen <henrikn@microsoft.com>
Cc: Phillip Hallam-Baker <hallam@gmail.com>, "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>
On Thu, Aug 2, 2012 at 1:53 PM, Henrik Frystyk Nielsen <
henrikn@microsoft.com> wrote:

> I also have trouble with the use of compression over the headers for two
> reasons:
> 1) While we know that compression can work wonders on entity bodies, it
> obscures the headers which are critical for inspection and manipulation by
> everybody down the message path.

So far, this has not been a problem.

The nice thing about a general-purpose compressor is precisely that it is
generic - it works on any body of data.  A structured schema can obviously
be highly tailored, but it is also less flexible to changes over time.

> 2) Further, it is unclear whether there is any noticeable performance gain
> from doing so. The only headers that today are open-ended are really
> User-Agent and Set-Cookie/Cookie pairs. In all our data where we don't
> include these headers we see no gain from using header compression
> whatsoever as long as you are conservative in how many headers you choose
> to include.

Typical headers are ~450 bytes on the request and ~300 bytes on the
response; cookies make it vary wildly.  The compressor cuts the initial
header block in half, and subsequent headers by about 90%.
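
For reference, the shared-state effect is easy to reproduce with any stock
deflate library when you keep one compression context alive across requests,
as SPDY does.  A rough Python sketch (the header text is made up for
illustration, not taken from my measurements):

```python
import zlib

headers = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"User-Agent: Mozilla/5.0 (X11; Linux x86_64)\r\n"
    b"Accept: text/html,application/xhtml+xml\r\n"
    b"Accept-Encoding: gzip, deflate\r\n"
    b"Cookie: session=abc123; prefs=dark\r\n\r\n"
)

# One long-lived compression context per connection: headers from later
# requests back-reference into earlier ones.
ctx = zlib.compressobj()

def send(block):
    # Z_SYNC_FLUSH emits all pending output without resetting the stream,
    # so the context stays usable for the next header block.
    return ctx.compress(block) + ctx.flush(zlib.Z_SYNC_FLUSH)

first = send(headers)
second = send(headers.replace(b"/index.html", b"/style.css"))

print(len(headers), len(first), len(second))
```

The second request compresses to a small fraction of the first, because
nearly every byte of it is a match against data already in the window.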

I'm not sure what data you are referring to - but I was able to measure
performance differences here.  When you simulate typical web pages today
(~8 domains, ~80 resources, etc) over a low-end network (385 Kbps down,
128 Kbps up), you can do the math yourself: saving 200 bytes per request
* 80 requests over these types of links measures in the *seconds*....

Remember that most users are using asymmetric networks, with very small
uplink capability.
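
To make the arithmetic concrete (using the round numbers above, which are
assumptions for illustration rather than measurements):

```python
# Back-of-envelope check of the savings claim; the per-request saving and
# link speed are the assumed round numbers from this thread.
requests = 80
saved_per_request = 200        # bytes of header savings per request
uplink_bps = 128 * 1000        # 128 Kbps uplink

seconds = requests * saved_per_request * 8 / uplink_bps
print(seconds)                 # uplink serialization time saved
```

That's a full second of pure uplink serialization time, before you count
round trips, retransmits, or the downlink side at all.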

> In other words, based on the data we have, focusing on optimizing UA and
> cookie headers may offer tangible performance gains.

Of course they will.

> Unless we can find convincing data I would argue that we should start the
> header discussion by focusing on the headers that we know have problems and
> then go from there.

Compression is a problem which can always be solved differently and always
be a little "better".  But at the end of the day, this is not the key part
of the protocol - users aren't going to notice as long as you get the byte
count down by ~75%.

Don't get me wrong, I really don't care what we change it to - but I hope
we talk about something else, because the compressor is easily solved if we
just live with good enough.

Some recommendations I'd put out to anyone proposing a new compressor:
    * Target at least 75% compression over today's HTTP for typical
web pages.
    * Be straightforward to implement.
    * Be able to compress repeated cookies well.
    * Be tolerant of long-lived sessions; if your compressor builds up
state with every new header, it must have a way to rotate out the old data
(e.g. a rolling window).
    * Try to minimize RAM and CPU use.
    * Don't use tricks that give advantages to incumbent browsers (e.g.
don't hard-code compression for strings like "WebKit", "Windows", "Chrome"
or "Firefox").
    * Don't try to "clean up" HTTP/1.1's headers; this just creates
compatibility issues, so unless it's done for a very specific purpose it
wastes time.
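
On the state-rotation point, one sketch of what "rotate out the old data"
could look like is a size-bounded table that evicts its oldest entries
first.  The names and limits here are illustrative only, not a proposal:

```python
from collections import OrderedDict

class HeaderTable:
    """Bounded store of previously seen header lines.  Oldest entries are
    evicted once the byte budget is exceeded, so compressor state cannot
    grow without limit over a long-lived session."""

    def __init__(self, max_bytes=4096):
        self.max_bytes = max_bytes
        self.size = 0
        self.entries = OrderedDict()  # header line -> size in bytes

    def add(self, line):
        if line in self.entries:           # refresh recency on reuse
            self.entries.move_to_end(line)
            return
        self.entries[line] = len(line)
        self.size += len(line)
        while self.size > self.max_bytes:  # rolling window: drop oldest
            _, evicted = self.entries.popitem(last=False)
            self.size -= evicted

table = HeaderTable(max_bytes=100)
for i in range(50):
    table.add(f"x-request-id: {i:08d}".encode())
print(len(table.entries), table.size)
```

However many headers flow through it, the table's memory use stays under
the fixed budget - that is the whole point of the recommendation.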

I hope those recommendations are useful.


> Henrik
> -----Original Message-----
> From: Phillip Hallam-Baker [mailto:hallam@gmail.com]
> Sent: Tuesday, July 17, 2012 07:10
> To: ietf-http-wg@w3.org Group
> Subject: Would a header/schema compiler help?
> One of the design choices I find a little troubling in SPDY is the use of
> header compression over a compact representation.
> Compression algorithms worry me because the only way I can implement them
> is to call someone's library. I can write my own code to implement RFC822
> header parsing. In fact I have done so a dozen times at least.
> ASN.1 and XML schemas are undoubtedly how to do a protocol 'right' in the
> academic sense of get some good abstractions in there. But they both come
> with hideous baggage. ASN.1 started too large and has been tweaked
> endlessly until it became nonsense. XML encoding is bloated and XML Schema
> drags in the whole SGML legacy that the WG was meant to chuck in the
> dustbin. I dare you to use a model group in your next XML spec.
> But JSON proves that it does not have to be that way. Over the past couple
> of months I have put together what amounts to a schema compiler for JSON. I
> am currently working on putting the compiler and the tools used to build it
> out on Github under an open source license.
> Unlike your regular schema compiler, this one makes it easy to retarget
> the code generation at a different language. Writing a JSON back end for C#
> took me about a week (and I was developing the tool at the same time). I
> could modify that code to generate C in a day. Want python, perl, Java, no
> problem.
> The JSON protocol synthesizer is built using the Goedel meta-synthesizer
> (which is of course written in itself). Comodo owns the protocol synth, I
> own the meta-synth. They should both be up on github under an MIT license
> in a few weeks.
> It seems to me that it would be useful to have a compiler for RFC822 style
> headers. The basic strategy being similar to a 'jack up' house renovation.
> First we jack up the level of abstraction in the HTTP spec by defining the
> headers in terms of a schema that is designed for that specific purpose.
> Then we build in the new ground floor (aka encoding) underneath it.
> --
> Website: http://hallambaker.com/
Received on Thursday, 2 August 2012 22:09:40 UTC
