Re: Ambiguity in the Range header

On Thu, Oct 4, 2012 at 2:21 AM, Roy T. Fielding <fielding@gbiv.com> wrote:
> On Oct 3, 2012, at 11:54 PM, Zhong Yu wrote:
>> On Thu, Oct 4, 2012 at 1:31 AM, Roy T. Fielding <fielding@gbiv.com> wrote:
>>> On Oct 3, 2012, at 10:04 PM, Zhong Yu wrote:
>>>
>>>> When a request contains a Range header, it specifies a (byte) range of
>>>> the representation body. However, the server doesn't know which
>>>> representation the client is talking about.
>>>
>>> The selected representation.
>>>
>>>> Here is an example of Firefox failing to resume the download of a gzip-ed body:
>>>>
>>>> request 1
>>>>
>>>> GET / HTTP/1.1
>>>> Accept-Encoding: gzip, deflate
>>>>
>>>> response 1
>>>>
>>>> HTTP/1.1 200 OK
>>>> Accept-Ranges: bytes
>>>> Content-Encoding: gzip
>>>> ETag: "135e962713f.gz"
>>>> Last-Modified: Tue, 06 Mar 2012 19:00:37 GMT
>>>> Content-Length: 182,249,279
>>>>
>>>> Firefox decompresses the body on the fly, and saves the decompressed
>>>> content to disk.
>>>>
>>>> Now pause the download; Firefox has 68,712,649 bytes of decompressed data on disk.
>>>>
>>>> Now resume the download; Firefox tries to request range [68,712,649-]
>>>> of the uncompressed body
>>>
>>> That would be a bug in Firefox.  Are you sure it does that?
>>
>> In what way is this a bug? How should Firefox behave?
>
> As I explained, it should be caching the original message and
> making range requests based on that -- not based on arbitrary
> decompressed disk files.
>
>>> Please tell me you just made up these examples -- there are no commas
>>> allowed in Content-Length and range specifiers.
>>>
>>>> request 2
>>>> GET / HTTP/1.1
>>>> Accept-Encoding: gzip, deflate
>>>> Range: bytes=68,712,649-
>>>> If-Match: "135e962713f.gz"
>>>> If-Unmodified-Since: Tue, 06 Mar 2012 19:00:37 GMT
>>>>
>>>> response 2
>>>>
>>>> HTTP/1.1 206 Partial Content
>>>> Accept-Ranges: bytes
>>>> Content-Range: bytes 68,712,649-182,249,278/182,249,279
>>>> Content-Encoding: gzip
>>>> ETag: "135e962713f.gz"
>>>> Last-Modified: Tue, 06 Mar 2012 19:00:37 GMT
>>>> Content-Length: 113,536,630
>>>>
>>>> Unfortunately the server has no idea that the range is for the
>>>> uncompressed body. It returns the range of the gzip-ed body, which
>>>> seems to be the best choice. Firefox then fails, since it expects
>>>> the uncompressed body.
>>>>
>>>> Is the server at fault here? Is there an understanding that Range is
>>>> always for the "plain" body without any Content-Encoding?
>>>
>>> The server is correct.  The UA would be broken.
>>>
>>> Range is defined in terms of the entity-body (RFC2616) and the
>>> representation body (p2, p5).  In both cases, the spec is clear
>>> that Content-Encoding is part of that body, though we could add
>>> more text to p5 to make that relationship clearer.
>>>
>>> Transfer-Encoding is applied after the body.  That is, in fact,
>>> the main reason Transfer-Encoding was defined -- C-E doesn't
>>> work well for on-the-fly operations.  A UA cannot combine
>>> on-the-fly decompression of C-E with range requests unless it
>>> is retaining the original message in cache.
>>
>> At least Firefox doesn't send a "TE" header. Any idea how many UAs
>> support a "Transfer-Encoding: gzip" response?
>
> Opera and a few command-line clients, that I know of.  It has
> always been a chicken and egg problem to get T-E deployed.
>
>> Another confusion:  if Content-Type=multipart/byteranges,
>> Content-Encoding=gzip, what is gzip-ed exactly?  Is the message body
>>
>>    gzip( multipart ( range ( plain_body ) ) )
>>
>> or
>>
>>    multipart ( range ( gzip (plain_body ) ) )
>>
>> or something else?
>
> The second one.
>
> As in RFC2616 (I'd quote from p2, but we are just about to push
> a new draft), ranges are applied to the entity-body that would be
> sent in a normal GET, which in turn consists of:
>
> 7.2.1 Type
>
>    When an entity-body is included with a message, the data type of that
>    body is determined via the header fields Content-Type and Content-
>    Encoding. These define a two-layer, ordered encoding model:
>
>        entity-body := Content-Encoding( Content-Type( data ) )

But a 206 with Content-Type=multipart/byteranges violates this pattern;
that should be specially noted in httpbis p2 section 3.
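
To make the exception concrete, here is a rough Python sketch of the
second layering above, multipart ( range ( gzip ( plain_body ) ) ). The
helper names, the boundary string and the sample body are made up for
illustration, and the multipart framing is simplified (no per-part
Content-Type), so treat it as a sketch rather than a conforming server.

```python
import gzip

# Made-up sample data standing in for the "plain" body.
plain_body = b"".join(b"line %d\n" % i for i in range(2000))

# The selected representation: the content coding is part of it.
representation = gzip.compress(plain_body)


def byte_range(data: bytes, first: int, last: int) -> bytes:
    """Bytes first..last inclusive, as in 'Range: bytes=first-last'."""
    return data[first:last + 1]


def multipart_byteranges(data: bytes, ranges, boundary: str = "SEP") -> bytes:
    """Hypothetical, simplified multipart/byteranges framing."""
    total = len(data)
    parts = []
    for first, last in ranges:
        header = f"--{boundary}\r\nContent-Range: bytes {first}-{last}/{total}\r\n\r\n"
        parts.append(header.encode("ascii") + byte_range(data, first, last) + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("ascii"))
    return b"".join(parts)


# The ranges are cut out of the gzip-ed bytes; the multipart wrapper
# itself is not gzip-ed, even though Content-Encoding: gzip is present.
payload = multipart_byteranges(representation, [(0, 9), (20, 29)])
```

So in this reading the outermost layer of the 206 payload is the
multipart framing, and Content-Encoding describes each part's source
bytes rather than the payload as a whole.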

>
>    Content-Type specifies the media type of the underlying data.
>    Content-Encoding may be used to indicate any additional content
>    codings applied to the data, usually for the purpose of data
>    compression, that are a property of the requested resource. There is
>    no default encoding.
>
> This is easier to describe in httpbis p2, right now, because
> we separated entity into two distinct things: payload (what is in
> a message) and representation (the content on which the message
> payload is based).  The Range header field in p5 is still a bit
> opaque on the topic.
>
> ....Roy
>
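
For the archive, here is how I now understand the resume case, as a
small Python sketch. The sample body, the halfway interruption point
and the serve_range helper are my own assumptions; the point is only
that the resume offset has to be counted in encoded bytes, because the
server applies Range to the gzip-ed representation.

```python
import gzip
import zlib

# Made-up sample data; the server sends it with Content-Encoding: gzip.
plain_body = b"".join(b"line %d\n" % i for i in range(5000))
representation = gzip.compress(plain_body)

# The download is interrupted after `received` bytes of the encoded body.
received = len(representation) // 2
cached = representation[:received]

# A client decoding on the fly ends up with some decoded bytes on disk.
decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: gzip framing
decoded_so_far = decoder.decompress(cached)

wrong_offset = len(decoded_so_far)  # counted in decoded bytes (the failing case)
right_offset = len(cached)          # counted in encoded bytes


def serve_range(first: int) -> bytes:
    """Hypothetical server: 'Range: bytes=first-' over the representation."""
    return representation[first:]


# Resuming from the encoded offset reassembles the original message body,
# which can then be decoded as a whole.
resumed = cached + serve_range(right_offset)
recovered = gzip.decompress(resumed)
```

With wrong_offset the server still answers over the gzip-ed bytes, so
what comes back is not the tail of plain_body, which matches the
failure in the example at the top of the thread.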

Received on Friday, 5 October 2012 16:47:26 UTC