- From: Roberto Peon <grmocg@gmail.com>
- Date: Sun, 10 Jun 2012 16:39:37 -0700
- To: Willy Tarreau <w@1wt.eu>
- Cc: ietf-http-wg@w3.org
- Message-ID: <CAP+FsNeqeihCZbbDG3mO55M9wEVEuFixGfrdPQ8jkSgSmjzN3A@mail.gmail.com>
On Sun, Jun 10, 2012 at 4:17 PM, Willy Tarreau <w@1wt.eu> wrote:
> Hi,
>
> I recently managed to collect requests from some enterprise proxies to
> experiment with binary encoding as described in our draft [1].
>
> After some experimentation and discussions with some people, I managed to
> get significant gains [2] which could still be improved.
>
> What's currently performed is the following :
> - message framing
> - binary encoding of the HTTP version (2 bits)
> - binary encoding of the method (4 bits)
> - move Host header to the URI
> - encoding of the URI relative to the previous one
> - binary encoding of each header field names (1 byte)
> - encoding of each header relative to the previous one.
> - binary encoding of the If-Modified-Since date
>
> The code achieving this is available at [2]. It's an ugly PoC but it's
> a useful experimentation tool for me, feel free to use it to experiment
> with your own implementations if you like.
>
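As an illustration of the field-level encodings in the list above, here is a minimal sketch that packs a method (4 bits) and an HTTP version (2 bits) into one byte. The code tables and the bit layout are hypothetical, chosen only for the example; they are not taken from the draft or from the PoC at [2].

```python
# Hypothetical code tables: the order and values are assumptions for this sketch.
METHODS = ["GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS", "TRACE", "CONNECT"]
VERSIONS = ["HTTP/1.0", "HTTP/1.1", "HTTP/2.0"]


def pack_method_version(method: str, version: str) -> bytes:
    """Store the version in bits 4-5 and the method in bits 0-3 of one byte."""
    return bytes([(VERSIONS.index(version) << 4) | METHODS.index(method)])


def unpack_method_version(blob: bytes):
    """Inverse of pack_method_version."""
    code = blob[0]
    return METHODS[code & 0x0F], VERSIONS[(code >> 4) & 0x03]


packed = pack_method_version("GET", "HTTP/1.1")
print(len(packed), unpack_method_version(packed))
```

The textual form spends 3 bytes on "GET" and 8 on "HTTP/1.1", plus separators, where this layout uses a single byte with two bits to spare.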
> I'm already observing request compression ratios of 90-92% on various
> requests, including on a site with a huge page with large cookies and
> URIs ; 132 kB of requests were reduced to 10kB. In fact while the draft
> suggests use of multiple header contexts (connection, common and message),
> now I'm feeling like we don't need to store 3 contexts anymore, one single
> is enough if requests remain relative to previous one.
>
For my deployment, I'm fairly certain this would not be all that common.
Two contexts, 'connection' and 'common', may be enough, but I think you had
it right the first time.
The more clients you have and are aggregating through to elsewhere, the more
advantageous that scheme becomes.
>
> But I think that by typing a bit more the protocol, we could improve even
> further and at the same time improve interoperability. Among the things
> I am observing which still take some space in the page load of an online
> newspaper (127 objects, data were anonymized) :
>
> - User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; fr;
> rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12
> => Well, this one is only sent once over the connection, but we could
> reduce this further by using a registry of known vendors/products
> and incite vendors to emit just a few bytes (vendor/product/version).
>
> - Accept: text/css,*/*;q=0.1
> => this one changes depending on what object the browser requests, so it
> is less efficiently compressed :
>
> 1 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 4 Accept: text/css,*/*;q=0.1
> 8 Accept: */*
> 1 Accept: image/png,image/*;q=0.8,*/*;q=0.5
> 2 Accept: */*
> 9 Accept: image/png,image/*;q=0.8,*/*;q=0.5
> 2 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 90 Accept: image/png,image/*;q=0.8,*/*;q=0.5
> 1 Accept: */*
> 9 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>
> => With better request reordering, we could have this :
>
> 11 Accept: */*
> 109 Accept: image/png,image/*;q=0.8,*/*;q=0.5
> 4 Accept: text/css,*/*;q=0.1
> 3 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>
Achieving this seems difficult. How would we get a reordering to occur in a
reasonable manner?
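The effect of the grouping quoted above can be counted with a small sketch. It expands the run lengths from the trace and counts how many runs of identical consecutive Accept values there are, before and after grouping. It assumes (this is not stated in the thread) that each change of value forces the header to be re-sent in full, and it uses a plain sort as a stand-in for whatever reordering would really be applied.

```python
from itertools import groupby

HTML = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
CSS = "text/css,*/*;q=0.1"
ANY = "*/*"
IMG = "image/png,image/*;q=0.8,*/*;q=0.5"

# (run length, Accept value) in the order observed in the quoted trace.
observed = [(1, HTML), (4, CSS), (8, ANY), (1, IMG), (2, ANY),
            (9, IMG), (2, HTML), (90, IMG), (1, ANY), (9, IMG)]


def expand(runs):
    """Turn (count, value) pairs back into one value per request."""
    return [value for count, value in runs for _ in range(count)]


def count_runs(values):
    """Number of maximal runs of identical consecutive values."""
    return sum(1 for _ in groupby(values))


requests = expand(observed)
print(len(requests), count_runs(requests), count_runs(sorted(requests)))
# prints: 127 10 4
```

Under that assumption the full Accept value is emitted 10 times in the observed order and 4 times once identical values are adjacent.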
>
> I'm already wondering if we have *that* many content-types and if we need
> to use long words such as "application" everywhere.
>
We were quite wordy in the past :)
>
> - Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
> Accept-Encoding: gzip,deflate
>
> => Same comment as above concerning the number of possible values. However
> these ones were all sent identical so the gain is more for the remote
> parser than for the upstream link.
>
> - Referer: http://www.example.com/
> => referrers do compress quite well relative to each other. Still there
> are many blogs and newspapers on the net today with very large URLs,
> and their URLs cause very large referrers to be sent along with each
> object composing the page. At least a better ordering of the requests
> saves a few more hundred bytes for the whole page. In the end I only
> got 4 different values :
> http://www.example.com/
>
> http://www.example.com/sites/news/files/css/css_RWicSr_h9UxCJrAbE57UbNf_oNYhtaF5YghFXJemVNQ.css
>
> http://www.example.com/sites/news/files/css/css_lKoFARDAyB20ibb5wNG8nMhflDNNW_Nb9DsNprYt8mk.css
>
> http://www.example.com/sites/news/files/css/css_qSyFGRLc-tslOV1oF9GCzEe1eGDn4PP7vOM1HGymNYU.css
>
> Among the improvements I'm thinking about, we could decide to use relative
> URIs when the site is the same. I don't know either if it's of any use on
> the server side to know that the request was emitted for a specific CSS.
>
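A minimal sketch of encoding one value relative to the previous one, using two of the Referer values quoted above. It assumes a simple "shared prefix length plus new suffix" scheme; the thread does not say which delta scheme the PoC uses, so the function names and the representation are illustrative only.

```python
import os


def delta_encode(previous: str, current: str):
    """Return (number of leading characters shared with previous, new suffix)."""
    shared = len(os.path.commonprefix([previous, current]))
    return shared, current[shared:]


def delta_decode(previous: str, shared: int, suffix: str) -> str:
    """Rebuild the current value from the previous one and the delta."""
    return previous[:shared] + suffix


first = ("http://www.example.com/sites/news/files/css/"
         "css_RWicSr_h9UxCJrAbE57UbNf_oNYhtaF5YghFXJemVNQ.css")
second = ("http://www.example.com/sites/news/files/css/"
          "css_lKoFARDAyB20ibb5wNG8nMhflDNNW_Nb9DsNprYt8mk.css")

shared, suffix = delta_encode(first, second)
print(len(second), shared, len(suffix))
assert delta_decode(first, shared, suffix) == second
```

With this scheme only the hashed file name and extension travel for the second value. The same idea applies to cookies that grow at the end, since the unchanged leading part is covered by the shared length.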
> - If-Modified-Since: Fri, 27 Apr 2012 14:41:31 GMT
> => I have encoded this one on 32 and 64 bits and immediately saved 3.1 and
> 2.6 kB respectively. Well, storing 4 more bytes per request might be
> wasted considering that we probably don't need a nanosecond resolution
> for 585 years. But 40-48 bits might be fine.
>
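A minimal sketch of a binary If-Modified-Since, assuming the value is stored as whole seconds since the Unix epoch in a fixed-width big-endian field. The 5-byte (40-bit) default and the one-second resolution are assumptions made for the example, not the format used by the PoC.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def encode_http_date(value: str, width: int = 5) -> bytes:
    """Pack an HTTP date into `width` bytes of seconds since the Unix epoch."""
    seconds = int(parsedate_to_datetime(value).timestamp())
    return seconds.to_bytes(width, "big")  # raises for dates before 1970


def decode_http_date(blob: bytes) -> str:
    """Turn the packed seconds back into the textual HTTP date."""
    seconds = int.from_bytes(blob, "big")
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return format_datetime(moment, usegmt=True)


text = "Fri, 27 Apr 2012 14:41:31 GMT"
packed = encode_http_date(text)
print(len(text), len(packed))  # prints: 29 5
assert decode_http_date(packed) == text
```

Passing width=4 gives the 32-bit variant, which covers unsigned second counts up to the year 2106.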
> - Cache-Control: max-age=0
> => I suspect the user hit the Refresh button, this was present in about
> half the requests. Anyway, this raises the question of the length it
> requires for something which is just a boolean here ("ignore cache").
> A client probably has very few Cache-Control header values to
> send, and reducing this to a smaller set would be beneficial.
>
> - If-None-Match: "3013140661"
> => I guess there is nothing we can do on this one, except suggest that
> implementors use more bits and fewer bytes to emit their etags.
>
> - Cookie: xtvrn=$OaiJty$; xtan327981=c; xtant327981=c; has_js=c;
> __utma=KBjWnx24Q.7qFKqmB7v.i0JDH91L_R.0kU2W1uL49.JM4KtFLV0b.C;
> __utmc=Rae9ZgQHz;
> __utmz=NRSZOcCWV.d5MlK5RJsi.-.f.N8J73w=S1SLuT_j0m.O8|VsIxwE=(jHw58obb)|r9SgsT=WQfZe8jr|pFSZGH=/@/qwDyMw3I;
> __gads=td=ASP_D5ml4Ebevrej:R=pvxltafqZK:x=E4FUn3YiNldW3rhxzX6YlCptZp8zF-b5qc;
> _chartbeat2=oQvb8k_G9tduhauf.LqOukjnlaaE7K.uDBaR79E1WT4t.Kr9L_lIrOtruE8;
> __qca=LC9oiRpFSWShYlxUtD37GJ2k8AL; __utmb=vG8UMEjrz.Qf.At.pXD61lUeHZ;
> pm8196_1=c; pm8194_1=c
>
> => amazingly, this one compresses extremely well with the above scheme,
> because additions are performed at the end so consecutive cookies keep
> a lot in common, and changes are not too frequent. However, given the
> omnipresent usage of cookies, I was wondering why we should not create
> a new entity of its own for the cookies instead of abusing the Cookie
> header. It would make it a lot easier for both ends to find what they
> need. For instance, a load balancer just needs to find a server name
> in the thing above. What a waste of on-wire bits and of CPU cycles !
>
You're suggesting breaking the above into smaller, addressable bits?
>
> BTW, binary encoding would probably also help addressing a request I often
> hear in banking environments : the need to sign/encrypt/compress only
> certain headers or cookies. Right now when people do this, they have to
> base64-encode the result, which is another transformation at both ends and
> inflates the data. If we make provisions in the protocol for announcing
> encrypted or compressed headers using 2-3 bits, it might become more usable.
> I'm not convinced it provides any benefit between a browser and an origin
> server though. So maybe it will remain application-specific and the
> transport just has to make it easier to emit 8-bit data in header field
> values.
>
>
> Has anyone any opinion on the subject above ? Or ideas about other things
> that terribly clobber the upstream pipe and that should be fixed in 2.0 ?
>
I like binary framing because it is significantly easier to get right and
works well when we're considering things other than just plain HTTP.
Token-based parsing is quite annoying in comparison: it requires
significant implementation complexity to minimize memory use. With
length-based framing, the implementation complexity is arguably decreased
for everyone, and certainly in cases where you wish to be efficient with
buffers.
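A minimal sketch of length-based framing, assuming a 2-byte big-endian length prefix in front of each payload. The prefix width and the function names are illustrative choices, not a proposed wire format.

```python
import struct

PREFIX = struct.Struct("!H")  # assumed 2-byte big-endian length prefix


def frame(payload: bytes) -> bytes:
    """Prepend the payload length (limited to 65535 bytes by the prefix)."""
    return PREFIX.pack(len(payload)) + payload


def read_frames(buffer: bytes):
    """Return (complete payloads, unconsumed remainder of the buffer)."""
    frames, offset = [], 0
    while len(buffer) - offset >= PREFIX.size:
        (length,) = PREFIX.unpack_from(buffer, offset)
        start = offset + PREFIX.size
        if len(buffer) - start < length:
            break  # frame not fully received yet, wait for more data
        frames.append(buffer[start:start + length])
        offset = start + length
    return frames, buffer[offset:]


stream = frame(b"first") + frame(b"second") + frame(b"third")[:3]
complete, rest = read_frames(stream)
print(complete, rest)
```

The reader never scans payload bytes for delimiters: it knows how much to buffer as soon as the prefix has arrived, and a partial frame is simply left in the remainder.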
-=R
> I hope I'll soon find some time to update our draft to reflect recent
> updates and findings.
>
> Regards,
> Willy
>
> --
> [1] http://tools.ietf.org/id/draft-tarreau-httpbis-network-friendly-00.txt
> [2] http://1wt.eu/http2/
>
>
>
Received on Sunday, 10 June 2012 23:40:07 UTC