- From: Willy Tarreau <w@1wt.eu>
- Date: Mon, 11 Jun 2012 01:17:57 +0200
- To: ietf-http-wg@w3.org
Hi,

I recently managed to collect requests from some enterprise proxies to experiment with the binary encoding described in our draft [1]. After some experimentation and discussions with a few people, I managed to get significant gains [2], which could still be improved. What is currently performed is the following:

  - message framing
  - binary encoding of the HTTP version (2 bits)
  - binary encoding of the method (4 bits)
  - moving the Host header into the URI
  - encoding of the URI relative to the previous one
  - binary encoding of each header field name (1 byte)
  - encoding of each header relative to the previous one
  - binary encoding of the If-Modified-Since date

The code achieving this is available at [2]. It's an ugly PoC, but it's a useful experimentation tool for me; feel free to use it to experiment with your own implementations if you like.

I'm already observing request compression ratios of 90-92% on various requests, including on a site with a huge page carrying large cookies and URIs: 132 kB of requests were reduced to 10 kB. In fact, while the draft suggests the use of multiple header contexts (connection, common and message), I now feel that we don't need to store 3 contexts anymore; a single one is enough if each request remains relative to the previous one.

But I think that by typing the protocol a bit more, we could improve even further and at the same time improve interoperability. Among the things I am observing which still take some space in the page load of an online newspaper (127 objects, data were anonymized):

  - User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; fr; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12

    => This one is only sent once over the connection, but we could reduce it further by using a registry of known vendors/products and encouraging vendors to emit just a few bytes (vendor/product/version).

  - Accept: text/css,*/*;q=0.1

    => This one changes depending on which object the browser requests, so it compresses less efficiently:

         1 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
         4 Accept: text/css,*/*;q=0.1
         8 Accept: */*
         1 Accept: image/png,image/*;q=0.8,*/*;q=0.5
         2 Accept: */*
         9 Accept: image/png,image/*;q=0.8,*/*;q=0.5
         2 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
        90 Accept: image/png,image/*;q=0.8,*/*;q=0.5
         1 Accept: */*
         9 Accept: image/png,image/*;q=0.8,*/*;q=0.5

    => With better request reordering, we could have this:

        11 Accept: */*
       109 Accept: image/png,image/*;q=0.8,*/*;q=0.5
         4 Accept: text/css,*/*;q=0.1
         3 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

    I'm still wondering if we have *that* many content-types and if we really need to use long words such as "application" everywhere.

  - Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    Accept-Encoding: gzip,deflate

    => Same comment as above concerning the number of possible values. However, these were all sent identical, so the gain is more for the remote parser than for the upstream link.

  - Referer: http://www.example.com/

    => Referrers compress quite well relative to each other. Still, there are many blogs and newspapers on the net today with very large URLs, and those URLs cause very large referrers to be sent along with each object composing the page. At least a better ordering of the requests saves a few more hundred bytes for the whole page; the sketch below shows the principle of this relative encoding.
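To illustrate the "encode each value relative to the previous one" idea on such referrers, here is a minimal sketch. This is not the PoC's actual wire format from [2]; the one-byte shared-prefix-length framing and the names are mine, for illustration only:

    #include <stdio.h>
    #include <string.h>

    /* Emit <shared-prefix length (1 byte)><literal suffix>.
     * Returns the number of bytes written to <out>.
     */
    static int delta_encode(const char *prev, const char *cur, unsigned char *out)
    {
        size_t common = 0;

        while (common < 255 && prev[common] && cur[common] == prev[common])
            common++;

        out[0] = (unsigned char)common;    /* length of prefix shared with prev */
        memcpy(out + 1, cur + common, strlen(cur + common));
        return 1 + (int)strlen(cur + common);
    }

    int main(void)
    {
        const char *prev = "http://www.example.com/";
        const char *cur  = "http://www.example.com/sites/news/files/css/"
                           "css_RWicSr_h9UxCJrAbE57UbNf_oNYhtaF5YghFXJemVNQ.css";
        unsigned char buf[512];

        /* the 23 bytes shared with the previous value collapse into 1 byte */
        printf("%d bytes on the wire instead of %zu\n",
               delta_encode(prev, cur, buf), strlen(cur));
        return 0;
    }

A value identical to the previous one collapses to a single byte, which is why reordering the requests so that similar referrers (or Accept values) are adjacent pays off.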
In the end, I only got 4 distinct values for the whole page:

    http://www.example.com/
    http://www.example.com/sites/news/files/css/css_RWicSr_h9UxCJrAbE57UbNf_oNYhtaF5YghFXJemVNQ.css
    http://www.example.com/sites/news/files/css/css_lKoFARDAyB20ibb5wNG8nMhflDNNW_Nb9DsNprYt8mk.css
    http://www.example.com/sites/news/files/css/css_qSyFGRLc-tslOV1oF9GCzEe1eGDn4PP7vOM1HGymNYU.css

Among the improvements I'm thinking about, we could decide to use relative URIs when the site is the same. I also don't know whether it's of any use on the server side to know that the request was emitted for a specific CSS.

  - If-Modified-Since: Fri, 27 Apr 2012 14:41:31 GMT

    => I encoded this one on 32 and 64 bits and immediately saved 3.1 and 2.6 kB respectively. Storing 4 more bytes per request is probably wasted, considering that we don't need nanosecond resolution over 585 years. But 40-48 bits might be fine (see the PS for a sketch).

  - Cache-Control: max-age=0

    => I suspect the user hit the Refresh button; this was present in about half of the requests. Anyway, it raises the question of the length required for something which is just a boolean here ("ignore cache"). A client probably has very few Cache-Control header values to send, and reducing them to a smaller set would be beneficial.

  - If-None-Match: "3013140661"

    => I guess there is nothing we can do about this one, except suggest that implementors use more bits and fewer bytes to emit their etags.

  - Cookie: xtvrn=$OaiJty$; xtan327981=c; xtant327981=c; has_js=c; __utma=KBjWnx24Q.7qFKqmB7v.i0JDH91L_R.0kU2W1uL49.JM4KtFLV0b.C; __utmc=Rae9ZgQHz; __utmz=NRSZOcCWV.d5MlK5RJsi.-.f.N8J73w=S1SLuT_j0m.O8|VsIxwE=(jHw58obb)|r9SgsT=WQfZe8jr|pFSZGH=/@/qwDyMw3I; __gads=td=ASP_D5ml4Ebevrej:R=pvxltafqZK:x=E4FUn3YiNldW3rhxzX6YlCptZp8zF-b5qc; _chartbeat2=oQvb8k_G9tduhauf.LqOukjnlaaE7K.uDBaR79E1WT4t.Kr9L_lIrOtruE8; __qca=LC9oiRpFSWShYlxUtD37GJ2k8AL; __utmb=vG8UMEjrz.Qf.At.pXD61lUeHZ; pm8196_1=c; pm8194_1=c

    => Amazingly, this one compresses extremely well with the above scheme, because additions are performed at the end, so consecutive cookies keep a lot in common and changes are not too frequent. However, given the omnipresent usage of cookies, I was wondering why we should not create an entity of its own for cookies instead of abusing the Cookie header. It would make it a lot easier for both ends to find what they need. For instance, a load balancer just needs to find a server name in the blob above. What a waste of on-wire bits and of CPU cycles!

BTW, binary encoding would probably also help address a request I often hear in banking environments: the need to sign/encrypt/compress only certain headers or cookies. Right now, when people do this, they have to base64-encode the result, which is another transformation at both ends and inflates the data. If we made provisions in the protocol for announcing encrypted or compressed headers using 2-3 bits, it might become more usable. I'm not convinced it provides any benefit between a browser and an origin server, though. So maybe it will remain application-specific, and the transport just has to make it easier to emit 8-bit data in header field values.

Does anyone have an opinion on the subjects above? Or ideas about other things that terribly clobber the upstream pipe and should be fixed in 2.0?

I hope I'll soon find some time to update our draft to reflect recent updates and findings.

Regards,
Willy

--
[1] http://tools.ietf.org/id/draft-tarreau-httpbis-network-friendly-00.txt
[2] http://1wt.eu/http2/
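PS: for those who want to experiment with the If-Modified-Since encoding, here is roughly what a 48-bit variant could look like. This is a simplified reconstruction, not the exact PoC code from [2]; the choice of a plain Unix-epoch second count and of little-endian byte order is mine:

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* 2^48 seconds is about 8.9 million years, so 6 bytes are plenty for
     * a date, versus 29 bytes for "Fri, 27 Apr 2012 14:41:31 GMT".
     */
    static void enc_date48(time_t t, unsigned char out[6])
    {
        uint64_t v = (uint64_t)t;

        for (int i = 0; i < 6; i++) {
            out[i] = v & 0xff;             /* low byte first */
            v >>= 8;
        }
    }

    static time_t dec_date48(const unsigned char in[6])
    {
        uint64_t v = 0;

        for (int i = 5; i >= 0; i--)
            v = (v << 8) | in[i];
        return (time_t)v;
    }

    int main(void)
    {
        unsigned char buf[6];
        time_t now = time(NULL);

        enc_date48(now, buf);
        printf("round-trip ok: %s\n", dec_date48(buf) == now ? "yes" : "no");
        return 0;
    }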