- From: Roberto Peon <grmocg@gmail.com>
- Date: Sun, 10 Jun 2012 16:39:37 -0700
- To: Willy Tarreau <w@1wt.eu>
- Cc: ietf-http-wg@w3.org
- Message-ID: <CAP+FsNeqeihCZbbDG3mO55M9wEVEuFixGfrdPQ8jkSgSmjzN3A@mail.gmail.com>
On Sun, Jun 10, 2012 at 4:17 PM, Willy Tarreau <w@1wt.eu> wrote:

> Hi,
>
> I recently managed to collect requests from some enterprise proxies to
> experiment with binary encoding as described in our draft [1].
>
> After some experimentation and discussions with some people, I managed to
> get significant gains [2] which could still be improved.
>
> What's currently performed is the following :
>   - message framing
>   - binary encoding of the HTTP version (2 bits)
>   - binary encoding of the method (4 bits)
>   - move Host header to the URI
>   - encoding of the URI relative to the previous one
>   - binary encoding of each header field names (1 byte)
>   - encoding of each header relative to the previous one.
>   - binary encoding of the If-Modified-Since date
>
> The code achieving this is available at [2]. It's an ugly PoC but it's
> a useful experimentation tool for me, feel free to use it to experiment
> with your own implementations if you like.
>
> I'm already observing request compression ratios of 90-92% on various
> requests, including on a site with a huge page with large cookies and
> URIs ; 132 kB of requests were reduced to 10kB. In fact while the draft
> suggests use of multiple header contexts (connection, common and message),
> now I'm feeling like we don't need to store 3 contexts anymore, one single
> is enough if requests remain relative to previous one.
>

For my deployment, I'm fairly certain this would not be all that common.
Two contexts may be enough ('connection' and 'common'), but I think you had
it right the first time. The more clients you have and are aggregating
through to elsewhere, the more advantageous that scheme becomes.

>
> But I think that by typing a bit more the protocol, we could improve even
> further and at the same time improve interoperability.
> Among the things I am observing which still take some space in the page
> load of an online newspaper (127 objects, data were anonymized) :
>
>   - User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; fr;
>     rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12
>     => Well, this one is only sent once over the connection, but we could
>        reduce this further by using a registry of known vendors/products
>        and incite vendors to emit just a few bytes (vendor/product/version).
>
>   - Accept: text/css,*/*;q=0.1
>     => this one changes depending on what object the browser requests, so it
>        is less efficiently compressed :
>
>          1 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>          4 Accept: text/css,*/*;q=0.1
>          8 Accept: */*
>          1 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>          2 Accept: */*
>          9 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>          2 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>         90 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>          1 Accept: */*
>          9 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>
>     => With better request reordering, we could have this :
>
>         11 Accept: */*
>        109 Accept: image/png,image/*;q=0.8,*/*;q=0.5
>          4 Accept: text/css,*/*;q=0.1
>          3 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
>

Achieving this seems difficult? How would we get a reordering to occur in a
reasonable manner?

>
>     I'm already wondering if we have *that* many content-types and if we
>     need to use long words such as "application" everywhere.
>

We were quite wordy in the past :)

>
>   - Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3
>     Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
>     Accept-Encoding: gzip,deflate
>
>     => Same comment as above concerning the number of possible values.
>        However these ones were all sent identical so the gain is more for
>        the remote parser than for the upstream link.
>
>   - Referer: http://www.example.com/
>     => referrers do compress quite well relative to each other.
>        Still there are many blogs and newspapers on the net today with very
>        large URLs, and their URLs cause very large referrers to be sent
>        along with each object composing the page. At least a better ordering
>        of the requests saves a few more hundred bytes for the whole page.
>        In the end I only got 4 different values :
>
>          http://www.example.com/
>          http://www.example.com/sites/news/files/css/css_RWicSr_h9UxCJrAbE57UbNf_oNYhtaF5YghFXJemVNQ.css
>          http://www.example.com/sites/news/files/css/css_lKoFARDAyB20ibb5wNG8nMhflDNNW_Nb9DsNprYt8mk.css
>          http://www.example.com/sites/news/files/css/css_qSyFGRLc-tslOV1oF9GCzEe1eGDn4PP7vOM1HGymNYU.css
>
>        Among the improvements I'm thinking about, we could decide to use
>        relative URIs when the site is the same. I don't know either if it's
>        of any use on the server side to know that the request was emitted
>        for a specific CSS.
>
>   - If-Modified-Since: Fri, 27 Apr 2012 14:41:31 GMT
>     => I have encoded this one on 32 and 64 bits and immediately saved 3.1
>        and 2.6 kB respectively. Well, storing 4 more bytes per request might
>        be wasted considering that we probably don't need a nanosecond
>        resolution for 585 years. But 40-48 bits might be fine.
>
>   - Cache-Control: max-age=0
>     => I suspect the user hit the Refresh button, this was present in about
>        half the requests. Anyway, this raises the question of the length it
>        requires for something which is just a boolean here ("ignore cache").
>        Probably that a client has very few Cache-Control header values to
>        send, and that reducing this to a smaller set would be beneficial.
>
>   - If-None-Match: "3013140661"
>     => I guess there is nothing we can do on this one, except suggest that
>        implementors use more bits and less bytes to emit their etags.
>
>   - Cookie: xtvrn=$OaiJty$; xtan327981=c; xtant327981=c; has_js=c;
>     __utma=KBjWnx24Q.7qFKqmB7v.i0JDH91L_R.0kU2W1uL49.JM4KtFLV0b.C;
>     __utmc=Rae9ZgQHz;
>     __utmz=NRSZOcCWV.d5MlK5RJsi.-.f.N8J73w=S1SLuT_j0m.O8|VsIxwE=(jHw58obb)|r9SgsT=WQfZe8jr|pFSZGH=/@/qwDyMw3I;
>     __gads=td=ASP_D5ml4Ebevrej:R=pvxltafqZK:x=E4FUn3YiNldW3rhxzX6YlCptZp8zF-b5qc;
>     _chartbeat2=oQvb8k_G9tduhauf.LqOukjnlaaE7K.uDBaR79E1WT4t.Kr9L_lIrOtruE8;
>     __qca=LC9oiRpFSWShYlxUtD37GJ2k8AL; __utmb=vG8UMEjrz.Qf.At.pXD61lUeHZ;
>     pm8196_1=c; pm8194_1=c
>
>     => amazingly, this one compresses extremely well with the above scheme,
>        because additions are performed at the end so consecutive cookies
>        keep a lot in common, and changes are not too frequent. However,
>        given the omnipresent usage of cookies, I was wondering why we should
>        not create a new entity of its own for the cookies instead of abusing
>        the Cookie header. It would make it a lot easier for both ends to
>        find what they need. For instance, a load balancer just needs to find
>        a server name in the thing above. What a waste of on-wire bits and of
>        CPU cycles !
>

You're suggesting breaking the above into smaller, addressable bits?

>
> BTW, binary encoding would probably also help addressing a request I often
> hear in banking environments : the need to sign/encrypt/compress only
> certain headers or cookies. Right now when people do this, they have to
> base64-encode the result, which is another transformation at both ends and
> inflates the data. If we make provisions in the protocol for announcing
> encrypted or compressed headers using 2-3 bits, it might become more usable.
> I'm not convinced it provides any benefit between a browser and an origin
> server though. So maybe it will remain application-specific and the
> transport just has to make it easier to emit 8-bit data in header field
> values.
>
>
> Has anyone any opinion on the subject above ?
> Or ideas about other things that terribly clobber the upstream pipe and
> that should be fixed in 2.0 ?
>

I like binary framing because it is significantly easier to get right and
works well when we're considering things other than just plain HTTP.
Token-based parsing is quite annoying in comparison: it requires significant
implementation complexity to minimize memory use. With length-based framing,
the implementation complexity is arguably decreased for everyone, and
certainly in cases where you wish to be efficient with buffers.

-=R

> I hope I'll soon find some time to update our draft to reflect recent
> updates and findings.
>
> Regards,
> Willy
>
> --
> [1] http://tools.ietf.org/id/draft-tarreau-httpbis-network-friendly-00.txt
> [2] http://1wt.eu/http2/
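
To make the length-based framing argument above concrete, here is a minimal sketch in Python. It is not the wire format from the draft [1] nor the PoC at [2]: the 4-byte big-endian length prefix and the names encode_frame and decode_frames are assumptions chosen purely for illustration. The point it shows is that the receiver never scans for delimiters; it reads a length, then either has the whole frame or waits for more bytes.

```python
import struct

# Hypothetical frame layout (an assumption, not taken from the draft):
#   4-byte big-endian unsigned length, followed by that many payload bytes.
HEADER_SIZE = 4


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(payload)) + payload


def decode_frames(buffer: bytes):
    """Split a receive buffer into complete frames.

    Returns (frames, remainder), where remainder holds the bytes of a
    frame that has not fully arrived yet and must be kept for the next read.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER_SIZE:
        (length,) = struct.unpack_from("!I", buffer, offset)
        start = offset + HEADER_SIZE
        if len(buffer) - start < length:
            break  # payload incomplete: stop and wait for more data
        frames.append(buffer[start:start + length])
        offset = start + length
    return frames, buffer[offset:]


if __name__ == "__main__":
    stream = encode_frame(b"GET /") + encode_frame(b"Host: www.example.com")
    frames, rest = decode_frames(stream)
    print(frames, rest)
```

Because the length is known up front, a receiver can size its buffer exactly and skip a frame it does not care about without parsing it, which is the buffer-efficiency benefit mentioned in the reply. A token-based parser would instead have to scan every byte for the terminator.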
Received on Sunday, 10 June 2012 23:40:07 UTC