Re: Significantly reducing headers footprint

This is good work, Willy.

Do you have any perf results showing how much this will impact the user?
Given the stateful nature of the gzip compression already in use, I'm
betting this has almost no impact for most users.

There is a tradeoff; completely custom compression will introduce more
interop issues.  Registries of "well known headers" are notoriously painful
to maintain and keep versioned.

A few more comments below:


On Sun, Jun 10, 2012 at 4:17 PM, Willy Tarreau <w@1wt.eu> wrote:

> Hi,
>
> I recently managed to collect requests from some enterprise proxies to
> experiment with binary encoding as described in our draft [1].
>
> After some experimentation and discussions with some people, I managed to
> get significant gains [2] which could still be improved.
>
> What's currently performed is the following :
>  - message framing
>  - binary encoding of the HTTP version (2 bits)
>  - binary encoding of the method (4 bits)
>  - move Host header to the URI
>  - encoding of the URI relative to the previous one
>  - binary encoding of each header field name (1 byte)
>  - encoding of each header relative to the previous one.
>  - binary encoding of the If-Modified-Since date
>
> The code achieving this is available at [2]. It's an ugly PoC but it's
> a useful experimentation tool for me, feel free to use it to experiment
> with your own implementations if you like.
>
> I'm already observing request compression ratios of 90-92% on various
> requests, including on a site with a huge page with large cookies and
> URIs ; 132 kB of requests were reduced to 10 kB. In fact while the draft
> suggests use of multiple header contexts (connection, common and message),
> now I'm feeling like we don't need to store 3 contexts anymore, a single
> one is enough if each request remains relative to the previous one.
>
> But I think that by typing the protocol a bit more, we could improve even
> further and at the same time improve interoperability. Among the things
> I am observing which still take some space in the page load of an online
> newspaper (127 objects, data were anonymized) :
>
>  - User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; fr; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12
>    => Well, this one is only sent once over the connection, but we could
>       reduce this further by using a registry of known vendors/products
>       and encourage vendors to emit just a few bytes (vendor/product/version).
>

I don't think the compressor should be learning about vendor-specific
information.  This gives advantages to certain browser incumbents and is
unfair to startups.  We absolutely MUST NOT give advantages to the current
popular browsers.


>  - Accept: text/css,*/*;q=0.1
>    => this one changes depending on what object the browser requests, so it
>       is less efficiently compressed :
>
>        1 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>        4 Accept: text/css,*/*;q=0.1
>        8 Accept: */*
>        1 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>        2 Accept: */*
>        9 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>        2 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>       90 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>        1 Accept: */*
>        9 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>
>    => With better request reordering, we could have this :
>
>       11 Accept: */*
>      109 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>        4 Accept: text/css,*/*;q=0.1
>        3 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>

As long as the browser uses the same Accept header from request to request
(which it generally does), this compresses to almost zero after the first
header block.
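
To illustrate what I mean, here is a minimal sketch using Python's zlib
with one compression context shared across header blocks. The request
headers, paths, and helper names are made up for the example, and this is
not the framing any particular protocol uses; it only shows that a repeated
Accept header costs very little once it is in the compressor's window.

```python
import zlib

# Hypothetical header block: only the request path changes between requests.
ACCEPT = b"Accept: image/png,image/*;q=0.8,*/*;q=0.5\r\n"


def header_block(path: bytes) -> bytes:
    return (b"GET " + path + b" HTTP/1.1\r\n"
            b"Host: www.example.com\r\n" + ACCEPT + b"\r\n")


def compress_blocks(blocks):
    """Compress every block with ONE shared deflate context.

    Z_SYNC_FLUSH emits all pending output after each block while keeping
    the history window, so later blocks can refer back to earlier ones.
    Returns the compressed size of each block.
    """
    comp = zlib.compressobj()
    sizes = []
    for block in blocks:
        out = comp.compress(block) + comp.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(out))
    return sizes


blocks = [header_block(b"/img/%d.png" % i) for i in range(3)]
sizes = compress_blocks(blocks)
# For comparison: each block compressed on its own, with no shared state.
independent = [len(zlib.compress(b)) for b in blocks]

print("raw sizes:      ", [len(b) for b in blocks])
print("shared context: ", sizes)
print("independent:    ", independent)
```

With the shared context, the second and third blocks should come out much
smaller than the first, while the independently compressed blocks stay
roughly the same size each time.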


>
>    I'm already wondering if we have *that* many content-types and if we need
>    to use long words such as "application" everywhere.
>
>  - Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3
>    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
>    Accept-Encoding: gzip,deflate
>
>    => Same comment as above concerning the number of possible values.
>       However these ones were all sent identical so the gain is more for
>       the remote parser than for the upstream link.
>
>  - Referer: http://www.example.com/
>    => referrers do compress quite well relative to each other. Still there
>       are many blogs and newspapers on the net today with very large URLs,
>       and their URLs cause very large referrers to be sent along with each
>       object composing the page. At least a better ordering of the requests
>       saves a few more hundred bytes for the whole page. In the end I only
>       got 4 different values :
>       http://www.example.com/
>       http://www.example.com/sites/news/files/css/css_RWicSr_h9UxCJrAbE57UbNf_oNYhtaF5YghFXJemVNQ.css
>       http://www.example.com/sites/news/files/css/css_lKoFARDAyB20ibb5wNG8nMhflDNNW_Nb9DsNprYt8mk.css
>       http://www.example.com/sites/news/files/css/css_qSyFGRLc-tslOV1oF9GCzEe1eGDn4PP7vOM1HGymNYU.css
>
>
>    Among the improvements I'm thinking about, we could decide to use relative
>    URIs when the site is the same. I don't know either if it's of any use on
>    the server side to know that the request was emitted for a specific CSS.
>
>  - If-Modified-Since: Fri, 27 Apr 2012 14:41:31 GMT
>    => I have encoded this one on 32 and 64 bits and immediately saved 3.1 and
>       2.6 kB respectively. Well, storing 4 more bytes per request might be
>       wasted considering that we probably don't need a nanosecond resolution
>       for 585 years. But 40-48 bits might be fine.

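For reference, the 32-bit variant of the date encoding quoted above can be
sketched as follows. This is not the PoC code from [2]; it assumes the value
is stored as unsigned 32-bit seconds since the Unix epoch in network byte
order, and the function names are hypothetical.

```python
import calendar
import struct
from email.utils import formatdate, parsedate


def encode_http_date(value: str) -> bytes:
    """Pack an HTTP date string into 4 bytes (seconds since 1970, UTC)."""
    parsed = parsedate(value)
    if parsed is None:
        raise ValueError("unparseable HTTP date: %r" % value)
    seconds = calendar.timegm(parsed)  # interpret the fields as UTC
    return struct.pack("!I", seconds)  # unsigned 32 bits, big-endian


def decode_http_date(packed: bytes) -> str:
    """Expand the 4-byte form back into the textual HTTP date."""
    (seconds,) = struct.unpack("!I", packed)
    return formatdate(seconds, usegmt=True)


text = "Fri, 27 Apr 2012 14:41:31 GMT"
packed = encode_http_date(text)
print(len(text), "bytes as text ->", len(packed), "bytes packed")
print(decode_http_date(packed))
```

The round trip restores the original string, so a gateway could still emit
the textual header towards an HTTP/1.1 peer.
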

>  - Cache-Control: max-age=0
>    => I suspect the user hit the Refresh button, this was present in about
>       half the requests. Anyway, this raises the question of the length it
>       requires for something which is just a boolean here ("ignore cache").
>       Probably that a client has very few Cache-Control header values to
>       send, and that reducing this to a smaller set would be beneficial.
>

Trying to change the motivation or semantics of headers is a large
endeavor.  I'm not sure that compressing out a few more bits is the right
motivation for doing so.



>  - If-None-Match: "3013140661"
>    => I guess there is nothing we can do on this one, except suggest that
>       implementors use more bits and fewer bytes to emit their etags.
>
>  - Cookie: xtvrn=$OaiJty$; xtan327981=c; xtant327981=c; has_js=c;
> __utma=KBjWnx24Q.7qFKqmB7v.i0JDH91L_R.0kU2W1uL49.JM4KtFLV0b.C;
> __utmc=Rae9ZgQHz;
> __utmz=NRSZOcCWV.d5MlK5RJsi.-.f.N8J73w=S1SLuT_j0m.O8|VsIxwE=(jHw58obb)|r9SgsT=WQfZe8jr|pFSZGH=/@/qwDyMw3I;
> __gads=td=ASP_D5ml4Ebevrej:R=pvxltafqZK:x=E4FUn3YiNldW3rhxzX6YlCptZp8zF-b5qc;
> _chartbeat2=oQvb8k_G9tduhauf.LqOukjnlaaE7K.uDBaR79E1WT4t.Kr9L_lIrOtruE8;
> __qca=LC9oiRpFSWShYlxUtD37GJ2k8AL; __utmb=vG8UMEjrz.Qf.At.pXD61lUeHZ;
> pm8196_1=c; pm8194_1=c
>
>    => amazingly, this one compresses extremely well with the above scheme,
>       because additions are performed at the end so consecutive cookies
>       keep a lot in common, and changes are not too frequent. However,
>       given the omnipresent usage of cookies, I was wondering why we should
>       not create a new entity of its own for the cookies instead of abusing
>       the Cookie header. It would make it a lot easier for both ends to
>       find what they need. For instance, a load balancer just needs to find
>       a server name in the thing above. What a waste of on-wire bits and of
>       CPU cycles !
>
> BTW, binary encoding would probably also help addressing a request I often
> hear in banking environments : the need to sign/encrypt/compress only
> certain headers or cookies. Right now when people do this, they have to
> base64-encode the result, which is another transformation at both ends and
> inflates the data. If we make provisions in the protocol for announcing
> encrypted or compressed headers using 2-3 bits, it might become more
> usable. I'm not convinced it provides any benefit between a browser and an
> origin server though. So maybe it will remain application-specific and the
> transport just has to make it easier to emit 8-bit data in header field
> values.
>

Happens all the time, yes.  Just make sure that the HTTP2 -> HTTP1.1
translation stays well defined so that gateways still work.
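
As a rough illustration of the base64 inflation mentioned in the quoted
text, here is a small sketch. The 48 random bytes are only a stand-in for
an encrypted or signed header value; nothing here reflects a real
deployment.

```python
import base64
import os

# Stand-in for an encrypted or signed header value (48 arbitrary bytes).
raw = os.urandom(48)

# What has to be sent today so the value survives a text-only header field.
encoded = base64.b64encode(raw)

print(len(raw), "raw bytes ->", len(encoded), "base64 bytes")

# The receiving end has to undo the transformation before using the value.
assert base64.b64decode(encoded) == raw
```

Base64 maps every 3 input bytes to 4 output characters, so the value grows
by one third before any other overhead.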


>
> Has anyone any opinion on the subject above ? Or ideas about other things
> that terribly clobber the upstream pipe and that should be fixed in 2.0 ?
>
> I hope I'll soon find some time to update our draft to reflect recent
> updates and findings.
>

Again, I think we could spend a lot of time debating the compressor.  And
with one more registry or one more semantic header change from HTTP, there
will always be one more bit to compress out.  But these are, IMHO, already
diminishing returns for performance.  I hope we'll all focus on the more
important parts of the protocol (flow control, security, 1.x to 2.x
upgrades, etc.) rather than on compression.

Mike


> Regards,
> Willy
>
> --
> [1] http://tools.ietf.org/id/draft-tarreau-httpbis-network-friendly-00.txt
> [2] http://1wt.eu/http2/
>
>
>

Received on Monday, 11 June 2012 14:33:36 UTC