Re: Ambiguity in the Range header from Roy T. Fielding on 2012-10-04 (ietf-http-wg@w3.org from October to December 2012)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Thu, 4 Oct 2012 00:21:06 -0700
To: Zhong Yu <zhong.j.yu@gmail.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <096EA7C7-98DD-4A3C-A411-0502E01517B9@gbiv.com>

On Oct 3, 2012, at 11:54 PM, Zhong Yu wrote:
> On Thu, Oct 4, 2012 at 1:31 AM, Roy T. Fielding <fielding@gbiv.com> wrote:
>> On Oct 3, 2012, at 10:04 PM, Zhong Yu wrote:
>> 
>>> When a request contains a Range header, it specifies a (byte) range of
>>> the representation body. However, the server doesn't know which
>>> representation the client is talking about.
>> 
>> The selected representation.
>> 
>>> Here is an example of Firefox failing to resume downloading a gzip-ed body:
>>> 
>>> request 1
>>> 
>>> GET / HTTP/1.1
>>> Accept-Encoding: gzip, deflate
>>> 
>>> response 1
>>> 
>>> HTTP/1.1 200 OK
>>> Accept-Ranges: bytes
>>> Content-Encoding: gzip
>>> ETag: "135e962713f.gz"
>>> Last-Modified: Tue, 06 Mar 2012 19:00:37 GMT
>>> Content-Length: 182,249,279
>>> 
>>> Firefox decompresses the body on the fly and saves the decompressed
>>> content to disk.
>>> 
>>> Now pause the download; Firefox has 68,712,649 bytes of decompressed data on disk.
>>> 
>>> Now resume the download; Firefox tries to request range [68,712,649-]
>>> of the uncompressed body
>> 
>> That would be a bug in Firefox.  Are you sure it does that?
> 
> In which way is this a bug? How should Firefox behave?

As I explained, it should be caching the original message and
making range requests based on that -- not based on arbitrary
decompressed disk files.
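
A minimal sketch of that behavior, in Python: the resume offset is
counted in bytes of the body as received (still gzip-encoded), not in
bytes written to disk after decompression. The helper name
resume_headers and the sample payload are hypothetical, chosen only for
illustration; this is not the code of Firefox or any other UA.

```python
import gzip


def resume_headers(encoded_bytes_received, etag):
    """Build request headers for resuming an interrupted download.

    Hypothetical helper. The offset counts bytes of the body as it was
    received on the wire (after Content-Encoding), and is written as
    plain digits, with no commas.
    """
    return {
        "Accept-Encoding": "gzip, deflate",
        "Range": "bytes=%d-" % encoded_bytes_received,
        "If-Match": etag,
    }


# Stand-in for the server's gzip-ed representation (illustrative data).
plain = b"example payload line\n" * 1000
encoded = gzip.compress(plain)

# The client cached only the first part of the encoded body before pausing.
cached = encoded[: len(encoded) // 2]
headers = resume_headers(len(cached), '"135e962713f.gz"')

# A server answering that range sends the remaining encoded bytes;
# appending them to the cache yields a body that decodes correctly.
remainder = encoded[len(cached):]
restored = gzip.decompress(cached + remainder)
```

The point of the sketch is only that the cache holds encoded bytes, so
the offset in Range and the bytes the server returns refer to the same
body.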

>> Please tell me you just made up these examples -- there are no commas
>> allowed in Content-Length and range specifiers.
>> 
>>> request 2
>>> GET / HTTP/1.1
>>> Accept-Encoding: gzip, deflate
>>> Range: bytes=68,712,649-
>>> If-Match: "135e962713f.gz"
>>> If-Unmodified-Since: Tue, 06 Mar 2012 19:00:37 GMT
>>> 
>>> response 2
>>> 
>>> HTTP/1.1 206 Partial Content
>>> Accept-Ranges: bytes
>>> Content-Range: bytes 68,712,649-182,249,278/182,249,279
>>> Content-Encoding: gzip
>>> ETag: "135e962713f.gz"
>>> Last-Modified: Tue, 06 Mar 2012 19:00:37 GMT
>>> Content-Length: 113,536,630
>>> 
>>> Unfortunately the server has no idea that the range is for the
>>> uncompressed body. It returns the range of the gzip-ed body, which
>>> seems to be the best choice. Then Firefox fails since it expects
>>> an uncompressed body.
>>> 
>>> Is the server at fault here? Is there an understanding that Range is
>>> always for the "plain" body without any Content-Encoding?
>> 
>> The server is correct.  The UA would be broken.
>> 
>> Range is defined in terms of the entity-body (RFC2616) and the
>> representation body (p2, p5).  In both cases, the spec is clear
>> that Content-Encoding is part of that body, though we could add
>> more text to p5 to make that relationship clearer.
>> 
>> Transfer-Encoding is applied after the body.  That is, in fact,
>> the main reason Transfer-Encoding was defined -- C-E doesn't
>> work well for on-the-fly operations.  A UA cannot combine
>> on-the-fly decompression of C-E with range requests unless it
>> is retaining the original message in cache.
> 
> At least Firefox doesn't send a "TE" header. Any idea how many UAs
> support response "Transfer-Encoding: gzip"?

Opera and a few command-line clients, that I know of.  It has
always been a chicken and egg problem to get T-E deployed.

> Another confusion:  if Content-Type=multipart/byteranges,
> Content-Encoding=gzip, what is gzip-ed exactly?  Is the message body
> 
>    gzip( multipart ( range ( plain_body ) ) )
> 
> or
> 
>    multipart ( range ( gzip (plain_body ) ) )
> 
> or something else?

The second one.

As in RFC2616 (I'd quote from p2, but we are just about to push
a new draft), ranges are applied to the entity-body that would be
sent in a normal GET, which in turn consists of:

7.2.1 Type

   When an entity-body is included with a message, the data type of that
   body is determined via the header fields Content-Type and Content-
   Encoding. These define a two-layer, ordered encoding model:

       entity-body := Content-Encoding( Content-Type( data ) )

   Content-Type specifies the media type of the underlying data.
   Content-Encoding may be used to indicate any additional content
   codings applied to the data, usually for the purpose of data
   compression, that are a property of the requested resource. There is
   no default encoding.

This is easier to describe in httpbis p2, right now, because
we separated entity into two distinct things: payload (what is in
a message) and representation (the content on which the message
payload is based).  The Range header field in p5 is still a bit
opaque on the topic.
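
The layering can be sketched in Python as follows. This is an
illustration under assumptions, not text from any spec: the function
names are hypothetical, and gzip stands in for whatever content coding
is in use. The range is cut from the encoded body, and Content-Range
reports positions within that encoded body.

```python
import gzip


def entity_body(data):
    """Content-Encoding( Content-Type( data ) ), with gzip as the coding."""
    return gzip.compress(data)


def byte_range(body, first, last):
    """Return bytes first..last (inclusive) of the encoded body."""
    return body[first:last + 1]


def content_range(first, last, total):
    """Format a Content-Range value; plain digits, no commas."""
    return "bytes %d-%d/%d" % (first, last, total)


plain_body = b"<p>hello</p>\n" * 500  # illustrative data
body = entity_body(plain_body)
total = len(body)
split = total // 2

# Two ranges of the encoded body, as two 206 responses (or two parts of
# a multipart/byteranges body) would carry them.
part1 = byte_range(body, 0, split - 1)
part2 = byte_range(body, split, total - 1)
header2 = content_range(split, total - 1, total)

# Only after reassembling the encoded body can it be decoded.
decoded = gzip.decompress(part1 + part2)
```

With the numbers from the example earlier in the thread (commas
removed), content_range(68712649, 182249278, 182249279) gives
"bytes 68712649-182249278/182249279", and the length of that range is
182249278 - 68712649 + 1 = 113536630 bytes.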

....Roy
Received on Thursday, 4 October 2012 07:21:26 UTC