- From: Jeffrey Mogul <mogul@pa.dec.com>
- Date: Wed, 19 Feb 97 16:34:02 PST
- To: jg@zorch.w3.org
- Cc: http-wg@cuckoo.hpl.hp.com
There seems to be some confusion surrounding the issue of end-to-end data compression, probably created in part because Jim forwarded a response to a message I sent, and the response didn't quote my entire original message. I'll try to clarify what the problems are. [Warning: long message follows.]

Jim and Henrik have argued that there is compelling evidence that end-to-end data compression is sufficiently useful that we should not wait for HTTP/2.0. I believe a good case can be made for this as well; I'd prefer to discuss that offline with anyone who doesn't buy Jim and Henrik's argument. I also believe that HTTP/1.1 can offer exactly what is needed, once a few *minor* problems are resolved.

First, for concreteness, suppose that we discover that a new compression algorithm, say "zipflate", is better than either gzip or compress. There are three problems with RFC 2068 that would prevent the most efficient use of this compression algorithm, and that might result in presenting users with bogus results:

(1) The current specification of Accept-Encoding *requires* (at SHOULD-level, not MUST-level) a server to return an error response in a situation where this is probably not optimal. This might lead to many extra round-trips, and might also lead to the destruction of otherwise useful proxy-cache entries.

(2) The current specification of Accept-Encoding *allows* a server to send a response using an encoding that the client software might not only fail to understand, but might improperly render to an unwitting user.

(3) The current design allows an HTTP/1.0 cache to return an encoded response to an HTTP/1.0 client in such a way as to cause the client to render garbage to an unwitting user.

The first two problems can be solved by a change to the specification of Accept-Encoding. The last problem can be solved by introducing a new status code, analogous to the one used for Partial Content (e.g., byte-range) responses. I'll elaborate below on each of these points.

The current wording in section 14.3 (Accept-Encoding) says:

    If no Accept-Encoding header is present in a request, the server
    MAY assume that the client will accept any content coding. If an
    Accept-Encoding header is present, and if the server cannot send a
    response which is acceptable according to the Accept-Encoding
    header, then the server SHOULD send an error response with the 406
    (Not Acceptable) status code.

Here are some scenarios where this specification causes trouble:

Scenario #1:

    HTTP/1.0 client sends no Accept-Encoding header.
    Server sends
        Content-Encoding: zipflate
        Content-Type: text/html
    Client renders garbage.

Henrik's experiments apparently confirm this behavior.

Scenario #2:

    HTTP/1.1 client sends
        Accept-Encoding: zipflate
    HTTP/1.1 server without support for zipflate sends
        HTTP/1.1 406 Not Acceptable

In this case, it would almost certainly be more efficient for the server simply to send the unencoded (identity) response, if it is available, rather than forcing the client to try again. (See below for a proposal that allows the client to explicitly say "send me nothing if you can't send me what I want".)

Jim's proposal is:

    If an Accept-Encoding header is present, and if the server cannot
    send a response which is acceptable according to the
    Accept-Encoding header, then the server SHOULD send a response
    using the default (identity) encoding; if the identity encoding is
    not available, then the server SHOULD send an error response with
    the 406 (Not Acceptable) status code.
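To make the difference concrete, here is a minimal sketch (Python, with made-up names such as SUPPORTED and choose_coding; this is not spec text or part of Jim's wording) of the server-side choice under Jim's proposal, which falls back to identity rather than answering 406:

    # Illustrative only: SUPPORTED and choose_coding are hypothetical names.
    SUPPORTED = {"identity", "gzip"}     # codings this server can produce

    def choose_coding(accept_encoding):
        """accept_encoding is the parsed list of codings from the request,
        or None if no Accept-Encoding header was present."""
        if accept_encoding is None:
            # RFC 2068 lets the server assume anything is acceptable here;
            # that assumption is what breaks scenario #1 for HTTP/1.0 clients.
            return "identity", 200
        acceptable = [c for c in accept_encoding if c in SUPPORTED]
        if acceptable:
            return acceptable[0], 200    # e.g. "gzip"
        if "identity" in SUPPORTED:
            # Jim's fallback: send the unencoded entity instead of forcing
            # the client into another round-trip with a 406.
            return "identity", 200
        return None, 406                 # nothing acceptable can be sent

    # choose_coding(["zipflate"]) -> ("identity", 200) on a server without
    # zipflate support, rather than an immediate 406 (Not Acceptable).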
That solves the problem with scenario #2, but not with scenario #1.

I have three different proposals to solve these two problems, in order of increasing distance from current practice (and, I think, in order of increasing precision).

The simplest change would be to say:

    If no Accept-Encoding header is present in the request, then the
    server SHOULD respond using one of

        o the default (identity) content-coding; or
        o the "compress" content-coding; or
        o the "gzip" content-coding.

    It MUST NOT respond using any other content-coding. If none of
    these content-codings is available, the server SHOULD send an
    error response with the 406 (Not Acceptable) status code.

        Note: the use of unsolicited compressed encodings may lead to
        confusing errors in rendering the response, and should be done
        with caution.

    If an Accept-Encoding header is present, and if the server cannot
    send a response which is acceptable according to the
    Accept-Encoding header, then the server SHOULD send a response
    using the default (identity) content-coding; it MUST NOT send a
    non-identity content-coding not listed in the Accept-Encoding
    header. If, in this case, the identity content-coding is not
    available, then the server SHOULD send an error response with the
    406 (Not Acceptable) status code.

Actually, because the HTTP/1.1 spec does not explicitly require a client to support any of the non-identity content-codings, it seems smarter to use something like the following wording instead:

    If no Accept-Encoding header is present in the request, then the
    server SHOULD respond using the default (identity) content-coding.
    It MUST NOT respond using any other content-coding. If the
    identity content-coding is not available, the server SHOULD send
    an error response with the 406 (Not Acceptable) status code.

    If an Accept-Encoding header is present, and if the server cannot
    send a response which is acceptable according to the
    Accept-Encoding header, then the server SHOULD send a response
    using the default (identity) content-coding; it MUST NOT send a
    non-identity content-coding not listed in the Accept-Encoding
    header. If, in this case, the identity content-coding is not
    available, then the server SHOULD send an error response with the
    406 (Not Acceptable) status code.

And, if we want to make it possible for a client to say "send me a compressed encoding or send me nothing", then I'd propose this pair of changes:

(1) In section 3.5 (Content Codings), add this after the item for "deflate":

    identity
        The default (identity) encoding; the use of no transformation
        whatsoever. This content-coding is used only in the
        Accept-Encoding header, and SHOULD NOT be used in the
        Content-Encoding header.

    An HTTP/1.1 client or server MAY support any of these
    content-codings, but SHOULD NOT assume (without explicit evidence)
    that any other client or server supports any content-coding
    besides "identity".

(2) The wording in section 14.3 would become:

    If no Accept-Encoding header is present in the request, then the
    server SHOULD respond using the default (identity) content-coding.
    It MUST NOT respond using any other content-coding. If the
    identity content-coding is not available, the server SHOULD send
    an error response with the 406 (Not Acceptable) status code.

    If an Accept-Encoding header is present, and if the server cannot
    send a response which is acceptable according to the
    Accept-Encoding header, then the server SHOULD send an error
    response with the 406 (Not Acceptable) status code.
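As a rough illustration of this stricter third proposal (a sketch only, with assumed names; not proposed spec text), the server never sends a coding the client did not list, and answers 406 when it cannot comply:

    # Illustrative sketch; parse_accept_encoding and respond are made-up names.
    SUPPORTED = {"identity", "gzip", "compress"}   # assumed server abilities

    def parse_accept_encoding(header_value):
        """Turn 'identity,gzip' into ['identity', 'gzip']; None if the
        header was absent (q-values ignored for simplicity)."""
        if header_value is None:
            return None
        return [t.strip() for t in header_value.split(",") if t.strip()]

    def respond(accept_encoding_header):
        codings = parse_accept_encoding(accept_encoding_header)
        if codings is None:
            # No Accept-Encoding: only the identity coding may be used.
            return ("identity", 200) if "identity" in SUPPORTED else (None, 406)
        for coding in codings:           # first listed coding we support wins
            if coding in SUPPORTED:
                return coding, 200
        # The client said, in effect, "what I listed or nothing": answer 406
        # rather than sending an unsolicited identity response.
        return None, 406

    # respond("identity,gzip") -> ("identity", 200)
    # respond("zipflate")      -> (None, 406)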
Note: a client willing to accept either a compressed or uncompressed response should send, for example,

        Accept-Encoding: identity,gzip

to allow the server to generate a response without wasting a round-trip.

This should solve problems #1 and #2.

====================

Now, on to problem #3. Suppose one has this configuration:

                                               |--- HTTP/1.1 client A
                                               |
    HTTP/1.1 server S ---- HTTP/1.0 proxy P ---|
                           with cache          |
                                               |--- HTTP/1.0 client B

Now suppose that client A does

        GET http://S/foo.html HTTP/1.1
        Host: S
        Accept-Encoding: zipflate

via proxy P, which forwards it to server S, which responds with

        HTTP/1.1 200 OK
        Content-Encoding: zipflate
        Content-Type: text/html
        Last-Modified: .....
        Expires: .....
        Cache-Control: .....

Proxy P caches this response and forwards it to client A. So far, so good.

Soon thereafter (before the Expires time), client B decides to issue its own request for the same URL:

        GET http://S/foo.html HTTP/1.0

Since HTTP/1.0 proxy P doesn't understand "Accept-Encoding", as far as I can tell it's likely to return the cached response to B. But client B's HTTP/1.0 browser won't know how to render it. If that software is smart, it might re-issue the request with a "Pragma: no-cache". But I doubt that any existing browsers are this smart, with the result that B's user (e.g., my mom) would be faced with a mysterious error message (or a screen full of garbage).

I suppose one could hope that HTTP/1.0 caches don't store responses with a Content-Encoding header, but I looked at the sources for the CERN httpd, and it doesn't seem to pay any attention. The HTTP/1.0 "specification" defines the "gzip" and "compress" content-codings, but does not define "deflate", so it is reasonable to assume that many (if not all) HTTP/1.0 clients and proxies do not understand the full set of content-codings already specified in HTTP/1.1, let alone anything new that comes along.

So I suspect that we will need to fix HTTP/1.1 to make it safe to use "deflate", or other new content-codings, with HTTP/1.0 caches before any widespread deployment of these compression algorithms could be contemplated.

I propose that we add a new status code, analogous to 206 (Partial Content), to be used on all HTTP/1.1 responses with a non-identity content-coding: for example, 207 (Encoded Content). This would allow HTTP/1.0 caches to forward, but not to cache, the response; it would allow HTTP/1.1 implementations to do whatever is appropriate. (I.e., an HTTP/1.1 cache would have to check the Content-Encoding against the Accept-Encoding of a subsequent request.)

Here's some proposed wording:

    10.2.8 207 Encoded Content

    The server has used a non-identity content-coding for the response.
    The request SHOULD (MUST?) have included an Accept-Encoding field
    listing the name of the content-coding used. The response MUST
    include a Content-Encoding header specifying the content-coding
    used.

    A cache that does not support the Accept-Encoding header MUST NOT
    cache a 207 (Encoded Content) response, except if the cache is
    able to convert it to the identity content-coding before using it
    in response to a subsequent request, and then only if the response
    does not contain the "no-transform" Cache-Control directive.

This would prevent some HTTP/1.0 caches from storing "gzip"ed or "compress"ed results, but it's not clear that there is much of this happening today. (Does anyone have proxy statistics that show the fraction of cacheable responses that carry Content-Encoding headers?)

-Jeff
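To make the proposed caching rule concrete, here is a minimal sketch (Python, hypothetical names; not the proposed spec text) of the check an HTTP/1.1 cache would make before reusing a stored 207 (Encoded Content) response:

    # Illustrative only: may_serve_from_cache is a made-up name.
    def may_serve_from_cache(cached_status, cached_content_encoding,
                             request_accept_encoding):
        """request_accept_encoding is the list of codings in the new
        request, or None if it sent no Accept-Encoding header."""
        if cached_status != 207:
            return True                  # ordinary caching rules apply
        if request_accept_encoding is None:
            # We cannot assume the new client understands the stored coding;
            # go back to the origin server (or transform to identity, if the
            # response did not carry the "no-transform" directive).
            return False
        return cached_content_encoding in request_accept_encoding

    # A response stored with "Content-Encoding: zipflate" is reused only for
    # requests that themselves listed "zipflate" in Accept-Encoding.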