GOOGLE SAVES THE WORLD - Range request utility: Chrome vs FF vs IE vs Safari from C.Brunhuber@iaea.org on 2014-04-08 (ietf-http-wg@w3.org from April to June 2014)

From: <C.Brunhuber@iaea.org>
Date: Tue, 8 Apr 2014 15:31:33 +0000
To: <ietf-http-wg@w3.org>
CC: <K.Morgan@iaea.org>
Message-ID: <E4F201CE71D35249A658333A9722105D0100DBC6C1@sem002pd.sg.iaea.org>

Pardon the sensational subject line.

On Mon, 24 Mar 2014 10:26:19 -0700, Martin Thomson wrote: "Range requests serve a very narrow purpose..."
On Mon, 7 Apr 2014 14:13:54 +0800, David Krauss wrote: "Range requests don't get cached because they are uncommon..."
On Wed, 9 Apr 2014 00:12:35 +1000, Matthew Kerwin wrote: "To my mind browsers don't play any part in range discussions ... a browser wants to receive the whole object..."

We want to dispel this notion that range requests are a pathological feature of HTTP.

To quote from the introduction of HTTP-p5 (Range Requests):
HTTP clients often encounter interrupted data transfers as a result of canceled requests or dropped connections. When a client has stored a partial representation, it is desirable to request the remainder of that representation in a subsequent request rather than transfer the entire representation. ... [1]

For example, slow or less-reliable mobile device networks make partially downloaded content common.
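Here is a minimal sketch, in Python, of the resume request the spec describes. The helper name `resume_headers` is hypothetical. The client needs only two things: the number of bytes it already holds and the strong ETag from the original response. `If-Range` tells the server to ignore the Range and send the full representation (200) if the ETag no longer matches. This way a stale partial is never stitched onto a newer version.

```python
def resume_headers(bytes_stored: int, etag: str) -> dict[str, str]:
    """Build headers asking for everything after the first `bytes_stored` bytes.

    Byte positions are zero-based and inclusive, so a client holding bytes
    0..bytes_stored-1 asks for the open-ended range "bytes=<bytes_stored>-".
    """
    if bytes_stored < 0:
        raise ValueError("bytes_stored must be non-negative")
    return {
        "Range": f"bytes={bytes_stored}-",  # open-ended: "from here to the end"
        "If-Range": etag,                   # only honor the Range if unchanged
    }
```

For a client holding 55620 bytes, this yields `Range: bytes=55620-` with the stored ETag in `If-Range`.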

To illustrate the usefulness of range requests, we investigated the behavior of the major browsers (Chrome, IE, Firefox, Safari) with respect to dropped/canceled transfers.

Specifically, we investigated the behavior when the transfer is dropped while receiving:
1) displayable content (e.g. web page), and
2) downloadable content (e.g. installation package).

The results were mixed. The only browser which fully takes advantage of range requests is Chrome.

Why does Chrome go to all the trouble of taking full advantage of range requests? We don't believe their main motivation is to save the world; a browser with a better user experience is a competitive advantage. We believe, as at least one browser vendor (Google) evidently does, that range requests should be used whenever it makes sense, even if our motivation is altruistic and not capitalistic :).

Chrome
Caching: Saves all content in its content-encoded (C-E) format (e.g. gzip or identity)
Displayable Content: Sends range requests for partial entities stored in the cache if and only if the original response contained the Content-Length header, which Chrome presumably treats as a sign that the content isn't dynamic. It does not matter whether the transfer was canceled by closing the browser or externally, nor whether the C-E is gzip or identity. The only disappointment is that Chrome actually sends two range requests. The first asks for a single byte, the first one after the already downloaded portion. If that succeeds, the second asks for the rest of the content by specifying the exact range (we can guess why they do this, but the motivation isn't entirely clear).
Downloadable Content: For downloads, Chrome seems to have a bug: it sends a range request, but the range starts at an offset much smaller than the size of the content downloaded so far.
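The two-request pattern Chrome uses for displayable content can be sketched as follows. This models what we observed on the wire, not Chrome's actual code, and the function name is hypothetical. It assumes the total length is known from the original response's Content-Length, which is consistent with Chrome only resuming when that header was present.

```python
def chrome_style_resume(bytes_stored: int, total_length: int, etag: str) -> list[dict[str, str]]:
    """Return the probe request and the remainder request, in order."""
    if not 0 <= bytes_stored < total_length:
        raise ValueError("need 0 <= bytes_stored < total_length")
    last = total_length - 1  # last-byte-pos is inclusive and zero-based
    # 1. Probe: a single byte, the first one not yet stored.
    probe = {"Range": f"bytes={bytes_stored}-{bytes_stored}", "If-Range": etag}
    # 2. Remainder: the exact closed range up to the last byte.
    remainder = {"Range": f"bytes={bytes_stored}-{last}", "If-Range": etag}
    return [probe, remainder]
```

Called with 55620 stored bytes and a total length of 56410, this produces `bytes=55620-55620` and then `bytes=55620-56409`.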

Below is an example of how Chrome sends two range requests for a .css file from latimes.com which is encoded with C-E gzip (superfluous headers have been omitted)...

GET /hive/stylesheets/content.css HTTP/1.1
Accept-Encoding: gzip,deflate,sdch
Range: bytes=55620-55620
If-Range: "3349f-4ef52e61f2480"

HTTP/1.1 206 Partial Content
Server: Apache
ETag: "3349f-4ef52e61f2480"
Accept-Ranges: bytes
Content-Encoding: gzip
Content-Range: bytes 55620-55620/56410
Content-Length: 1

GET /hive/stylesheets/content.css HTTP/1.1
Accept-Encoding: gzip,deflate,sdch
Range: bytes=55620-56409
If-Range: "3349f-4ef52e61f2480"

HTTP/1.1 206 Partial Content
Server: Apache
ETag: "3349f-4ef52e61f2480"
Accept-Ranges: bytes
Content-Encoding: gzip
Content-Range: bytes 55620-56409/56410
Content-Length: 790
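Before appending a 206 body to its cached prefix, a client has to check that the returned range is the one it asked for. Here is a sketch of that check in Python; the helper names are hypothetical. It handles only the `bytes first-last/complete` form of Content-Range and rejects the `bytes */complete` form used with 416.

```python
import re
from typing import Optional

# "bytes first-last/complete"; complete may be "*" if the server doesn't know it.
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")

def parse_content_range(value: str) -> tuple[int, int, Optional[int]]:
    m = _CONTENT_RANGE.match(value.strip())
    if m is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(m.group(1)), int(m.group(2))
    complete = None if m.group(3) == "*" else int(m.group(3))
    if last < first or (complete is not None and last >= complete):
        raise ValueError(f"invalid byte range: {value!r}")
    return first, last, complete

def fits_cached_prefix(content_range: str, content_length: int, bytes_stored: int) -> bool:
    """True if the 206 body starts right after the cached bytes and its size matches."""
    first, last, _ = parse_content_range(content_range)
    return first == bytes_stored and last - first + 1 == content_length
```

In the trace above, `bytes 55620-56409/56410` with `Content-Length: 790` fits a cached prefix of 55620 bytes, since 56409 - 55620 + 1 = 790.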

World-wide Internet usage data are difficult to find. Still, since Chrome has a large share of the browser market (desktop & mobile), it is reasonable to assume that the bytes saved by using range requests for dropped/canceled transfers add up to a large total every day. What is clear is that Google cares deeply about saving bandwidth. They were one of the first to use C-E gzip on their servers and still use it heavily today. Assume Google handles 6G search requests per day [2]. The average Google search result page is ~240 kB, which gzips to ~60 kB, saving 180 kB per page. So 6G * 180 kB = 1.08 * 10^15 B, i.e. more than 1 PB (1,000 TB = 1,000,000,000,000,000 B) saved per day.
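The gzip estimate can be checked with a few lines of Python. The inputs are the assumptions stated above (6G searches/day from [2], ~240 kB pages gzipped to ~60 kB), using decimal units (1 kB = 1,000 B, 1 PB = 10^15 B).

```python
searches_per_day = 6_000_000_000   # 6G searches per day, from [2]
page_bytes = 240_000               # ~240 kB uncompressed result page (assumption)
gzipped_bytes = 60_000             # ~60 kB after gzip (assumption)

saved_per_page = page_bytes - gzipped_bytes        # bytes saved per result page
saved_per_day = searches_per_day * saved_per_page  # bytes saved per day

print(saved_per_page, saved_per_day, saved_per_day / 10**15)
# 180000 1080000000000000 1.08
```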

Firefox
Caching: Saves all content in its C-E format (e.g. gzip or identity)
Displayable Content: Sends range requests for partial entities stored in the cache if and only if the original response was C-E identity.
Downloadable Content: Sends range requests, but only if the user explicitly pauses the transfer. It will always send a range request for C-E identity. It will send range requests for C-E gzip, but if and only if the original response contained the Content-Length header.

IE
Caching: Decompresses all content (i.e. stores everything with C-E identity).
Displayable Content: Sends range requests, but only if C-E identity.
Downloadable Content: Sends range requests, but only if C-E identity.
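We did not investigate *why* IE only resumes C-E identity responses. One plausible explanation (our assumption) is that byte ranges on a gzip-encoded response refer to the encoded bytes. A cache that stores only decoded content therefore has no valid offset to resume from, whereas Chrome and Firefox keep the C-E format. The Python sketch below shows this with a stand-in stylesheet; the data and the cut-off point are made up for illustration.

```python
import gzip
import zlib

# Stand-in for a stylesheet served with C-E gzip (illustrative data).
original = "".join(f".rule-{i} {{ margin: {i % 7}px; }}\n" for i in range(3000)).encode()
encoded = gzip.compress(original)  # the bytes actually on the wire

# Suppose the connection dropped after half of the *encoded* bytes arrived.
received = len(encoded) // 2
prefix = encoded[:received]

# A cache that keeps the encoded bytes resumes at offset `received`;
# "Range: bytes=<received>-" would return exactly the rest of the gzip stream.
rest = encoded[received:]
assert gzip.decompress(prefix + rest) == original

# A cache that keeps only decoded bytes knows how much *decoded* data it has,
# which is not an offset into the gzip stream the server's ranges refer to.
partial_decoded = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(prefix)
print(received, len(partial_decoded))  # two different numbers
```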

Safari
Caching: Unknown. Apple tries to hide everything :)
Displayable Content: Never sends range requests.
Downloadable Content: Never sends range requests.
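The observations above can be restated as a small decision function. This simply re-encodes our measurements for clarity; it is not a model of any browser's internals. The `Transfer` type and `sends_range_request` are hypothetical names. Chrome's download offset bug and Safari's unknown caching are not represented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transfer:
    browser: str               # "chrome", "firefox", "ie" or "safari"
    downloadable: bool         # False for displayable content (e.g. a web page)
    gzip: bool                 # True for C-E gzip, False for C-E identity
    had_content_length: bool   # original response carried Content-Length
    user_paused: bool = False  # Firefox downloads: user explicitly paused

def sends_range_request(t: Transfer) -> bool:
    """Whether we observed a range request after a dropped/canceled transfer."""
    if t.browser == "chrome":
        # Downloads always resume (with a too-small offset); pages need Content-Length.
        return t.downloadable or t.had_content_length
    if t.browser == "firefox":
        if not t.downloadable:
            return not t.gzip
        # Downloads: only after an explicit pause; gzip also needs Content-Length.
        return t.user_paused and (not t.gzip or t.had_content_length)
    if t.browser == "ie":
        return not t.gzip
    if t.browser == "safari":
        return False
    raise ValueError(f"no observations for {t.browser!r}")
```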

[1] http://tools.ietf.org/html/draft-ietf-httpbis-p5-range-26
[2] http://www.statisticbrain.com/google-searches/

Chris


Received on Tuesday, 8 April 2014 15:32:09 UTC