Re: delta encoding and state management from Roberto Peon on 2013-01-22 (ietf-http-wg@w3.org from January to March 2013)

From: Roberto Peon <grmocg@gmail.com>
Date: Tue, 22 Jan 2013 13:54:18 -0800
To: Willy Tarreau <w@1wt.eu>
Cc: "William Chan (陈智昌)" <willchan@chromium.org>, James M Snell <jasnell@gmail.com>, Nico Williams <nico@cryptonector.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <CAP+FsNcm_VBOsbptkLoOQXfgM-xAfYiZuqZusDm2YkoiszUfxA@mail.gmail.com>
The one thing that delta, etc. don't already have is the idea of 'rooting'
the path space with a single request. I like it, but it is subject to the
CRIME exploit if the path-prefix grouping is done automatically by the
browser instead of being defined by the content developer.

If we take your proposal for eliminating much of the common path prefix and
ensure that it isn't subject to CRIME, it is a winner in any scheme.
-=R


On Tue, Jan 22, 2013 at 1:51 PM, Roberto Peon <grmocg@gmail.com> wrote:

> You've described server push+stateful compression (delta) pretty closely
> there ('cause that is what we get when we combine them, without requiring
> web page writers to change how they write their pages...)! :)
>
> With server push, you can do one request, many responses (except that you
> can also cache and cancel and prioritize them, unlike bundling or inlining
> or pipelining, which have nasty head-of-line blocking and infinite buffering
> requirements... bleh).
> With the delta compressor, you can define 'header groups', which allow you
> to do exactly what you just described. The default implementation as it
> exists today just guesses at the groupings by examining the hostname, but
> that is a very naive approach-- splitting based on cookies and other
> repeated fields makes the most sense.
>
> The biggest hurdle, at least in my opinion, to usage of the new features
> is how much effort the content writers have to put in to change their
> content (basically never happens), or change their knowledge of the best
> practices (also difficult :( ). The best solution (again in my opinion) is
> one where the optimizations that can be done automatically (not
> necessarily perfectly, but close enough :) ) are done automatically, thus
> freeing ourselves from both categories...
>
> -=R
>
>
> On Tue, Jan 22, 2013 at 1:27 PM, Willy Tarreau <w@1wt.eu> wrote:
>
>> Hi William,
>>
>> On Tue, Jan 22, 2013 at 12:33:37PM -0800, William Chan (陈智昌) wrote:
>> > From the SPDY whitepaper
>> > (http://www.chromium.org/spdy/spdy-whitepaper), we note that:
>> > "Header compression resulted in an ~88% reduction in the size of
>> > request headers and an ~85% reduction in the size of response headers.
>> > On the lower-bandwidth DSL link, in which the upload link is only 375
>> > Kbps, request header compression in particular, led to significant
>> > page load time improvements for certain sites (i.e. those that issued
>> > large number of resource requests). We found a reduction of 45 - 1142
>> > ms in page load time simply due to header compression."
>> >
>> > That result was using gzip compression, but I don't really think
>> > there's a huge difference in PLT between stateful compression
>> > algorithms. That you use stateful compression at all is the biggest
>> > win, since as Mark already noted, big chunks of the headers are
>> > repeated opaque blobs. And I think the wins will only be greater in
>> > bandwidth constrained devices like mobile. I think this brings us back
>> > to the question, at what point do the wins of stateful compression
>> > outweigh the costs? Are implementers satisfied with the rough order of
>> > costs of stateful compression algorithms like delta encoding or
>> > simple compression?
>>
>> I agree that most of the header overhead is from repeated headers.
>> In fact, most of the requests we see for large pages with 100 objects
>> contain many similar headers. I could be wrong, but I think that browsers
>> are aware of the fact that they're fetching many objects at once in
>> most situations (eg: images on an inline catalogue).
>>
>> Thus maybe we should think a different way : initially the web was
>> designed to retrieve one object at a time and it made sense to have
>> one request, one response. Now we have much more content and we
>> want many objects at once to load a page. Why not define that as the
>> standard way to load pages and bring in the ability to load *groups*
>> of objects ?
>>
>> We could then send a request for several objects at once, all using
>> the same (encoded) headers, plus maybe additional per-object headers.
>> The smallest group is one object and works like today. But when you
>> need 10 images, 3 CSS and 2 JS, maybe it makes sense to send 1, 2 or
>> 3 requests only. We would also probably find it useful to define
>> a base for common objects.
>>
>> We could then see requests like this :
>>
>>     group 1
>>        header fields ...
>>        base http://static.example.com/images/articles/20130122/
>>        req1: GET corner-left.jpg
>>        req2: GET corner-right.jpg
>>        req3: GET center-banner.jpg
>>        req4: GET company-logo.png
>>
>> etc...
>>
>> Another big benefit I'm seeing there is that it's easy to switch from 1.1
>> to/from this encoding. Also, intermediaries and servers will process
>> far fewer requests because they don't have to revalidate all headers each
>> time. The Host header would only be validated/rewritten once per group.
>> Cookies would be matched once per group, etc...
>>
>> It would be processed exactly like pipelining, with responses delivered
>> in the same order as the requests. Intermediaries could even split that
>> into multiple streams to forward some of them to some servers and other
>> ones to other servers. Having the header fields and base URI before the
>> requests makes that easy because once they're passed, you can read all
>> requests as they come without the need to additionally buffer.
>>
>> When you have an ETag or a date for an object, its If-None-Match or
>> If-Modified-Since value would be passed along with the individual request
>> and not the group.
>>
>> I think this should often be more efficient than brute compression and
>> still probably compatible with it.
>>
>> What do you think ?
>>
>> Willy
>>
>>
>
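
The grouped request that Willy describes in the quoted message (shared header fields, a base URI, then one short line per object) can be illustrated with a small sketch. This is only one reading of the proposal, not a wire format from any draft: the `GroupedRequest` class, the `expand` function, the `Accept` header and the ETag value are all hypothetical names and values chosen for the example. It shows how an intermediary or server might turn one group back into independent HTTP/1.1-style requests, with per-object headers such as If-None-Match overriding the group's headers.

```python
from dataclasses import dataclass, field
from urllib.parse import urljoin


@dataclass
class GroupedRequest:
    """Hypothetical container for one request group (not a real wire format)."""
    base: str        # base URI shared by every request in the group
    headers: dict    # header fields sent once for the whole group
    # Each entry: (method, path relative to base, per-object headers)
    requests: list = field(default_factory=list)


def expand(group):
    """Turn a group back into independent requests, preserving their order."""
    expanded = []
    for method, path, own_headers in group.requests:
        # Per-object headers (e.g. If-None-Match) override group headers.
        headers = {**group.headers, **own_headers}
        expanded.append((method, urljoin(group.base, path), headers))
    return expanded


group = GroupedRequest(
    base="http://static.example.com/images/articles/20130122/",
    headers={"Host": "static.example.com", "Accept": "image/*"},  # assumed values
    requests=[
        ("GET", "corner-left.jpg", {}),
        ("GET", "corner-right.jpg", {"If-None-Match": '"abc123"'}),  # made-up ETag
        ("GET", "center-banner.jpg", {}),
        ("GET", "company-logo.png", {}),
    ],
)

for method, url, headers in expand(group):
    print(method, url, headers)
```

Because the group headers and the base URI come first, each request line can be expanded as soon as it arrives, which matches the point in the message about not needing extra buffering.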
Received on Tuesday, 22 January 2013 21:54:46 UTC
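
The 'header groups' idea from the thread (remember a header set per group and send only what changed) can be sketched as follows. This is a deliberately simplified illustration and does not attempt to reproduce the delta compressor discussed in the thread: `header_delta`, `GroupedDeltaEncoder`, the header names and the header values are all hypothetical. The group key is the hostname, which the thread itself calls a naive choice; a key derived from cookies or other repeated fields could be swapped in.

```python
def header_delta(previous, current):
    """Return (changed, removed): what must be sent to turn previous into current."""
    changed = {k: v for k, v in current.items() if previous.get(k) != v}
    removed = [k for k in previous if k not in current]
    return changed, removed


class GroupedDeltaEncoder:
    """Remembers one header set per group; here the group key is the hostname."""

    def __init__(self):
        self.state = {}  # group key -> last header set sent for that group

    def encode(self, headers):
        key = headers.get("host", "")  # naive grouping, as noted in the thread
        changed, removed = header_delta(self.state.get(key, {}), headers)
        self.state[key] = dict(headers)  # update the remembered state
        return key, changed, removed


encoder = GroupedDeltaEncoder()
first = {
    "host": "static.example.com",
    "user-agent": "ExampleBrowser/1.0",  # assumed value
    "cookie": "id=42",                   # assumed value
    "path": "/images/a.png",
}
second = dict(first, path="/images/b.png")

print(encoder.encode(first))   # first request in a group: every header is sent
print(encoder.encode(second))  # second request: only the changed path is sent
```

The sketch also makes the cost side of the discussion visible: both ends have to keep `state` in sync for as long as the connection lives.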