- From: Roy T. Fielding <fielding@gbiv.com>
- Date: Thu, 4 Oct 2012 00:21:06 -0700
- To: Zhong Yu <zhong.j.yu@gmail.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
On Oct 3, 2012, at 11:54 PM, Zhong Yu wrote:

> On Thu, Oct 4, 2012 at 1:31 AM, Roy T. Fielding <fielding@gbiv.com> wrote:
>> On Oct 3, 2012, at 10:04 PM, Zhong Yu wrote:
>>
>>> When a request contains a Range header, it specifies a (byte) range of
>>> the representation body. However, the server doesn't know which
>>> representation the client is talking about.
>>
>> The selected representation.
>>
>>> Here is an example of Firefox failing to resume downloading a gzip-ed body:
>>>
>>> request 1
>>>
>>> GET / HTTP/1.1
>>> Accept-Encoding: gzip, deflate
>>>
>>> response 1
>>>
>>> HTTP/1.1 200 OK
>>> Accept-Ranges: bytes
>>> Content-Encoding: gzip
>>> ETag: "135e962713f.gz"
>>> Last-Modified: Tue, 06 Mar 2012 19:00:37 GMT
>>> Content-Length: 182,249,279
>>>
>>> Firefox decompresses the body on the fly, and saves the decompressed
>>> content to disk.
>>>
>>> Now pause the download; Firefox has 68,712,649 bytes of decompressed
>>> data on disk.
>>>
>>> Now resume the download; Firefox tries to request range [68,712,649-]
>>> of the uncompressed body
>>
>> That would be a bug in Firefox. Are you sure it does that?
>
> In which way is this a bug? How should Firefox behave?

As I explained, it should be caching the original message and
making range requests based on that -- not based on arbitrary
decompressed disk files.

>> Please tell me you just made up these examples -- there are no commas
>> allowed in Content-Length and range specifiers.
>>
>>> request 2
>>>
>>> GET / HTTP/1.1
>>> Accept-Encoding: gzip, deflate
>>> Range: bytes=68,712,649-
>>> If-Match: "135e962713f.gz"
>>> If-Unmodified-Since: Tue, 06 Mar 2012 19:00:37 GMT
>>>
>>> response 2
>>>
>>> HTTP/1.1 206 Partial Content
>>> Accept-Ranges: bytes
>>> Content-Range: bytes 68,712,649-182,249,278/182,249,279
>>> Content-Encoding: gzip
>>> ETag: "135e962713f.gz"
>>> Last-Modified: Tue, 06 Mar 2012 19:00:37 GMT
>>> Content-Length: 113,536,630
>>>
>>> Unfortunately the server has no idea that the range is for the
>>> uncompressed body. It returns the range of the gzip-ed body, which
>>> seems to be the best choice. Then Firefox fails since it expects
>>> an uncompressed body.
>>>
>>> Is the server at fault here? Is there an understanding that Range is
>>> always for the "plain" body without any Content-Encoding?
>>
>> The server is correct. The UA would be broken.
>>
>> Range is defined in terms of the entity-body (RFC2616) and the
>> representation body (p2, p5). In both cases, the spec is clear
>> that Content-Encoding is part of that body, though we could add
>> more text to p5 to make that relationship clearer.
>>
>> Transfer-Encoding is applied after the body. That is, in fact,
>> the main reason Transfer-Encoding was defined -- C-E doesn't
>> work well for on-the-fly operations. A UA cannot combine
>> on-the-fly decompression of C-E with range requests unless it
>> is retaining the original message in cache.
>
> At least Firefox doesn't send a "TE" header. Any idea how many UAs
> support response "Transfer-Encoding: gzip"?

Opera and a few command-line clients, that I know of. It has
always been a chicken and egg problem to get T-E deployed.

> Another confusion: if Content-Type=multipart/byteranges,
> Content-Encoding=gzip, what is gzip-ed exactly? Is the message body
>
>    gzip( multipart ( range ( plain_body ) ) )
>
> or
>
>    multipart ( range ( gzip (plain_body ) ) )
>
> or something else?

The second one. As in RFC2616 (I'd quote from p2, but we are
just about to push a new draft), ranges are applied to the
entity-body that would be sent in a normal GET, which in turn
consists of:

   7.2.1 Type

   When an entity-body is included with a message, the data type of
   that body is determined via the header fields Content-Type and
   Content-Encoding. These define a two-layer, ordered encoding model:

       entity-body := Content-Encoding( Content-Type( data ) )

   Content-Type specifies the media type of the underlying data.
   Content-Encoding may be used to indicate any additional content
   codings applied to the data, usually for the purpose of data
   compression, that are a property of the requested resource. There
   is no default encoding.

This is easier to describe in httpbis p2, right now, because we
separated entity into two distinct things: payload (what is in a
message) and representation (the content on which the message
payload is based). The Range header field in p5 is still a bit
opaque on the topic.

....Roy
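
To make the layering concrete, here is a minimal sketch (not part of the original message) of a resume that behaves the way the thread describes: the byte range is taken over the gzip-coded entity-body, and the client keeps the coded bytes it already received instead of its decompressed output. It assumes Python's standard gzip module; the names entity_body and serve_range are hypothetical and stand in for a real server and cache.

```python
import gzip


def entity_body(data: bytes) -> bytes:
    """entity-body := Content-Encoding(Content-Type(data)), with gzip as the coding."""
    return gzip.compress(data)


def serve_range(body: bytes, first: int) -> tuple[bytes, str]:
    """Hypothetical server side of 'Range: bytes=<first>-'.

    Offsets index the coded (gzip-ed) body, not the plain data.
    """
    last = len(body) - 1
    return body[first:], f"bytes {first}-{last}/{len(body)}"


plain = b"hello range requests " * 1000
body = entity_body(plain)

# The download is interrupted halfway; the client retains the coded bytes.
cut = len(body) // 2
saved = body[:cut]

# Resume using the count of coded bytes received, not decompressed bytes.
part, content_range = serve_range(body, cut)

# Only the reassembled coded body can be decompressed.
resumed = gzip.decompress(saved + part)
```

Resuming from the number of decompressed bytes on disk would instead index into the wrong place in the coded body, which is the failure reported for Firefox above.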
Received on Thursday, 4 October 2012 07:21:26 UTC