Re: delta encoding and state management

Hi William,

On Tue, Jan 22, 2013 at 12:33:37PM -0800, William Chan (陈智昌) wrote:
> From the SPDY whitepaper
> (http://www.chromium.org/spdy/spdy-whitepaper), we note that:
> "Header compression resulted in an ~88% reduction in the size of
> request headers and an ~85% reduction in the size of response headers.
> On the lower-bandwidth DSL link, in which the upload link is only 375
> Kbps, request header compression in particular, led to significant
> page load time improvements for certain sites (i.e. those that issued
> large number of resource requests). We found a reduction of 45 - 1142
> ms in page load time simply due to header compression."
> 
> That result was using gzip compression, but I don't really think
> there's a huge difference in PLT between stateful compression
> algorithms. That you use stateful compression at all is the biggest
> win, since as Mark already noted, big chunks of the headers are
> repeated opaque blobs. And I think the wins will only be greater in
> bandwidth constrained devices like mobile. I think this brings us back
> to the question, at what point do the wins of stateful compression
> outweigh the costs? Are implementers satisfied with the rough order of
> costs of stateful compression of algorithms like the delta encoding or
> simple compression?

I agree that most of the header overhead comes from repeated headers.
In fact, most of the requests we see for large pages with 100 objects
contain many similar headers. I could be wrong, but I think that browsers
are aware, in most situations, that they're fetching many objects at
once (e.g. images on an inline catalogue).

Thus maybe we should think about this a different way: initially the web
was designed to retrieve one object at a time, and it made sense to have
one request, one response. Now we have much more content and we want
many objects at once to load a page. Why not now define that as the
standard way to load pages and bring in the ability to load *groups*
of objects?

We could then send a request for several objects at once, all using
the same (encoded) headers, plus maybe additional per-object headers.
The smallest group is one object and works like today. But when you
need 10 images, 3 CSS and 2 JS files, maybe it makes sense to send only
1, 2 or 3 requests. We would also probably find it useful to define
a base URI for common objects.

We could then see requests like this:

    group 1
       header fields ...
       base http://static.example.com/images/articles/20130122/
       req1: GET corner-left.jpg
       req2: GET corner-right.jpg
       req3: GET center-banner.jpg
       req4: GET company-logo.png

etc...
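
Just to make the idea more concrete, here is a rough Python sketch of
how a client could build such a group. The framing below (one field per
line, a "base" line, "reqN:" prefixes, CRLF separators) is purely
illustrative and an assumption of mine, not a proposed wire format:

    # Illustrative only: build the text of a request group.
    # The framing (group/base/reqN lines, CRLF separators) is an
    # assumption made for this sketch, not a proposed wire format.
    def serialize_group(common_headers, base, requests):
        lines = ["group 1"]
        for name, value in common_headers:
            lines.append("   %s: %s" % (name, value))
        lines.append("   base %s" % base)
        for i, (method, path) in enumerate(requests, 1):
            lines.append("   req%d: %s %s" % (i, method, path))
        return "\r\n".join(lines) + "\r\n"

    print(serialize_group(
        [("Host", "static.example.com"), ("Accept", "image/*")],
        "http://static.example.com/images/articles/20130122/",
        [("GET", "corner-left.jpg"), ("GET", "corner-right.jpg")]))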

Another big benefit I see here is that it's easy to convert between 1.1
and this encoding. Intermediaries and servers would also do much less
request processing, because they wouldn't have to revalidate all the
headers each time: the Host header would only be validated/rewritten
once per group, cookies would be matched once per group, etc.

It would be processed exactly like pipelining, with responses delivered
in the same order as the requests. Intermediaries could even split a
group into multiple streams to forward some requests to some servers and
others to other servers. Having the header fields and base URI before
the requests makes that easy, because once they've been received, you
can read the requests as they arrive without any additional buffering.
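
To illustrate that point, here is a small sketch (same illustrative
framing as above, so still an assumption rather than a syntax proposal)
of how an intermediary could consume a group in a streaming fashion,
validating the common fields and base once and forwarding each request
as soon as its line arrives:

    # Streaming consumption of a group: the shared fields and base
    # are handled once, then each req line is forwarded immediately,
    # with no buffering of the rest of the group.
    def process_group(lines, forward):
        common, base = [], None
        for raw in lines:
            line = raw.strip()
            if not line or line.startswith("group"):
                continue
            if line.startswith("base "):
                base = line[len("base "):]
            elif line.startswith("req"):
                # common headers were already validated for the group
                method, path = line.split(": ", 1)[1].split(" ", 1)
                forward(method, base + path, common)
            else:
                common.append(line)  # e.g. Host, Cookie: checked once

    example = [
        "group 1",
        "   Host: static.example.com",
        "   base http://static.example.com/images/articles/20130122/",
        "   req1: GET corner-left.jpg",
        "   req2: GET corner-right.jpg",
    ]
    process_group(example, lambda m, url, hdrs: print(m, url))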

When you have an ETag or a date for an object, its I-M-S/I-N-M values
would be passed along with the individual request and not with the
group.
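
For instance, reusing the illustrative syntax above (the validators
below are made up for the example):

    group 2
       header fields ...
       base http://static.example.com/images/articles/20130122/
       req1: GET corner-left.jpg
             If-None-Match: "abc123"
       req2: GET center-banner.jpg
             If-Modified-Since: Mon, 21 Jan 2013 10:00:00 GMT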

I think this should often be more efficient than brute compression and
would probably still be compatible with it.

What do you think ?

Willy

Received on Tuesday, 22 January 2013 21:28:21 UTC