- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Thu, 4 Oct 2012 00:21:06 -0700
- To: Zhong Yu <zhong.j.yu@gmail.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
On Oct 3, 2012, at 11:54 PM, Zhong Yu wrote:
> On Thu, Oct 4, 2012 at 1:31 AM, Roy T. Fielding <fielding@gbiv.com> wrote:
>> On Oct 3, 2012, at 10:04 PM, Zhong Yu wrote:
>>
>>> When a request contains a Range header, it specifies a (byte) range of
>>> the representation body. However, the server doesn't know which
>>> representation the client is talking about.
>>
>> The selected representation.
>>
>>> Here is an example of Firefox failing to resume the download of a gzip-ed body:
>>>
>>> request 1
>>>
>>> GET / HTTP/1.1
>>> Accept-Encoding: gzip, deflate
>>>
>>> response 1
>>>
>>> HTTP/1.1 200 OK
>>> Accept-Ranges: bytes
>>> Content-Encoding: gzip
>>> ETag: "135e962713f.gz"
>>> Last-Modified: Tue, 06 Mar 2012 19:00:37 GMT
>>> Content-Length: 182,249,279
>>>
>>> Firefox decompresses the body on the fly and saves the decompressed
>>> content to disk.
>>>
>>> Now pause the download; Firefox has 68,712,649 bytes of decompressed data on disk.
>>>
>>> Now resume the download; Firefox tries to request range [68,712,649-]
>>> of the uncompressed body
>>
>> That would be a bug in Firefox. Are you sure it does that?
>
> In what way is this a bug? How should Firefox behave?
As I explained, it should be caching the original message and
making range requests based on that -- not based on arbitrary
decompressed disk files.
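
A minimal sketch of that behavior, in Python, assuming the client keeps the
bytes exactly as they arrived (still gzip-encoded) and counts those when it
resumes. The function name resume_range_header and the sample payload are
hypothetical, chosen only for illustration; this is not how any particular
browser is implemented.

```python
import gzip


def resume_range_header(encoded_bytes_received: int) -> str:
    """Build a Range value for resuming a download.

    The offset counts bytes of the representation as transferred
    (after Content-Encoding), not bytes of decompressed output.
    """
    if encoded_bytes_received < 0:
        raise ValueError("byte count must be non-negative")
    return f"bytes={encoded_bytes_received}-"


# Hypothetical payload standing in for the resource's plain data.
plain = b"example payload " * 4096
encoded = gzip.compress(plain)

# The download is paused halfway through the *encoded* body.
received = encoded[: len(encoded) // 2]
header = resume_range_header(len(received))

# The server applies the range to the encoded body, as in response 2.
start = int(header[len("bytes="):-1])
rest = encoded[start:]

# Only the concatenation of encoded pieces can be decompressed.
restored = gzip.decompress(received + rest)
```

Counting decompressed bytes instead would produce an offset that points at
the wrong place in the gzip stream, which is the failure described above.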
>> Please tell me you just made up these examples -- there are no commas
>> allowed in Content-Length and range specifiers.
>>
>>> request 2
>>> GET / HTTP/1.1
>>> Accept-Encoding: gzip, deflate
>>> Range: bytes=68,712,649-
>>> If-Match: "135e962713f.gz"
>>> If-Unmodified-Since: Tue, 06 Mar 2012 19:00:37 GMT
>>>
>>> response 2
>>>
>>> HTTP/1.1 206 Partial Content
>>> Accept-Ranges: bytes
>>> Content-Range: bytes 68,712,649-182,249,278/182,249,279
>>> Content-Encoding: gzip
>>> ETag: "135e962713f.gz"
>>> Last-Modified: Tue, 06 Mar 2012 19:00:37 GMT
>>> Content-Length: 113,536,630
>>>
>>> Unfortunately the server has no idea that the range is for the
>>> uncompressed body. It returns the range of the gzip-ed body, which
>>> seems to be the best choice. Firefox then fails since it expects
>>> the uncompressed body.
>>>
>>> Is the server at fault here? Is there an understanding that Range is
>>> always for the "plain" body without any Content-Encoding?
>>
>> The server is correct. The UA would be broken.
>>
>> Range is defined in terms of the entity-body (RFC2616) and the
>> representation body (p2, p5). In both cases, the spec is clear
>> that Content-Encoding is part of that body, though we could add
>> more text to p5 to make that relationship clearer.
>>
>> Transfer-Encoding is applied after the body. That is, in fact,
>> the main reason Transfer-Encoding was defined -- C-E doesn't
>> work well for on-the-fly operations. A UA cannot combine
>> on-the-fly decompression of C-E with range requests unless it
>> is retaining the original message in cache.
>
> At least Firefox doesn't send a "TE" header. Any idea how many UAs
> support the response "Transfer-Encoding: gzip"?
Opera and a few command-line clients, that I know of. It has
always been a chicken and egg problem to get T-E deployed.
> Another confusion: if Content-Type=multipart/byteranges,
> Content-Encoding=gzip, what is gzip-ed exactly? Is the message body
>
> gzip( multipart ( range ( plain_body ) ) )
>
> or
>
> multipart ( range ( gzip (plain_body ) ) )
>
> or something else?
The second one.
As in RFC2616 (I'd quote from p2, but we are just about to push
a new draft), ranges are applied to the entity-body that would be
sent in a normal GET, which in turn consists of:

   7.2.1 Type

   When an entity-body is included with a message, the data type of that
   body is determined via the header fields Content-Type and
   Content-Encoding. These define a two-layer, ordered encoding model:

       entity-body := Content-Encoding( Content-Type( data ) )

   Content-Type specifies the media type of the underlying data.
   Content-Encoding may be used to indicate any additional content
   codings applied to the data, usually for the purpose of data
   compression, that are a property of the requested resource. There is
   no default encoding.

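
The ordering can be sketched in Python as follows. This is an illustration
only: the helper names entity_body, byte_range and content_range are
hypothetical, gzip stands in for whatever content coding is in use, and the
multipart/byteranges framing is left out.

```python
import gzip


def entity_body(data: bytes) -> bytes:
    """entity-body := Content-Encoding( Content-Type( data ) ), with gzip."""
    return gzip.compress(data)


def byte_range(body: bytes, first: int, last: int) -> bytes:
    """Select an inclusive byte range first-last from the entity-body."""
    return body[first:last + 1]


def content_range(first: int, last: int, total: int) -> str:
    """Format a Content-Range value; total is the encoded length."""
    return f"bytes {first}-{last}/{total}"


# Hypothetical plain data for the resource.
plain_body = b"<html>hello</html>" * 1000

# The content coding is applied first ...
body = entity_body(plain_body)

# ... and the range is then taken from the encoded bytes.
part = byte_range(body, 0, 9)
header = content_range(0, 9, len(body))
```

Here part begins with the gzip magic bytes rather than with "<html>",
which corresponds to range ( gzip ( plain_body ) ).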
This is easier to describe in httpbis p2, right now, because
we separated entity into two distinct things: payload (what is in
a message) and representation (the content on which the message
payload is based). The Range header field in p5 is still a bit
opaque on the topic.
....Roy
Received on Thursday, 4 October 2012 07:21:26 UTC