From: Jeffrey Mogul <mogul@pa.dec.com>
Date: Wed, 19 Feb 97 16:34:02 PST
To: jg@zorch.w3.org
Cc: http-wg@cuckoo.hpl.hp.com
There seems to be some confusion surrounding the issue of end-to-end
data compression, which was probably partly created because Jim
forwarded a response to a message I sent, and the response didn't quote
my entire original message. I'll try to clarify what the problems
are. [Warning: long message follows.]
Jim and Henrik have argued that there is compelling evidence that
end-to-end data compression is sufficiently useful that we should not
wait for HTTP/2.0. I believe a good case can be made for this; I'd
prefer to discuss it offline with anyone who doesn't buy Jim and
Henrik's argument.
I also believe that HTTP/1.1 can offer exactly what is needed, once a
few *minor* problems are resolved.
First, for concreteness, suppose that we discover that a new
compression algorithm, say "zipflate", is better than either gzip or
compress.
There are three problems with RFC2068 that would prevent the most
efficient use of this compression algorithm, and that might result
in presenting users with bogus results:
(1) The current specification of Accept-encoding *requires*
(SHOULD-level, not MUST-level) a server to return an
error response in a situation where this is probably
not optimal. This might lead to many extra round-trips,
and might also lead to the destruction of otherwise
useful proxy-cache entries.
(2) The current specification of Accept-encoding *allows*
a server to send a response using an encoding that the
client software might not only not understand, but which
it might improperly render to an unwitting user.
(3) The current design allows an HTTP/1.0 cache to return
an encoded response to an HTTP/1.0 client, in such a way
as to cause the client to render garbage to an unwitting user.
The first two problems can be solved by a change to the specification
of Accept-Encoding. The last problem can be solved by introducing a
new status code, analogous to the one used for Partial-content (e.g.,
byte-range) responses. I'll elaborate below on each of these points.
The current wording in section 14.3 (Accept-Encoding) says:
If no Accept-Encoding header is present in a request, the server MAY
assume that the client will accept any content coding. If an Accept-
Encoding header is present, and if the server cannot send a response
which is acceptable according to the Accept-Encoding header, then the
server SHOULD send an error response with the 406 (Not Acceptable)
status code.
Here are some scenarios where this specification causes trouble:
Scenario #1:
HTTP/1.0 Client sends no Accept-Encoding header.
Server sends
Content-encoding: zipflate
Content-type: text/html
Client renders garbage
Henrik's experiments apparently confirm that.
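Purely for illustration, here is a rough sketch (Python; hypothetical
names, with zlib standing in for "zipflate") of why such a client renders
garbage: it checks Content-Type but never looks at Content-Encoding.

import zlib

def naive_http10_render(headers, body):
    # Hypothetical HTTP/1.0 client: it checks Content-Type but never looks
    # at Content-Encoding, so the compressed octets reach the renderer as-is.
    if headers.get("Content-Type", "").startswith("text/html"):
        return body.decode("latin-1")
    return None

# The server chose a coding the client never asked for (zlib stands in
# for "zipflate" here, purely for illustration).
body = zlib.compress(b"<html><body>hello</body></html>")
headers = {"Content-Type": "text/html", "Content-Encoding": "zipflate"}
print(naive_http10_render(headers, body))   # garbage, not the page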
Scenario #2:
HTTP/1.1 Client sends
Accept-encoding: zipflate
HTTP/1.1 Server without support for zipflate sends
HTTP/1.1 406 Not Acceptable
In this case, it would almost certainly be more efficient for
the server to simply send the unencoded (identity) response,
if this is available, rather than forcing the client to try
again. (See below for a proposal that allows the client to
explicitly say "send me nothing if you can't send me what I want".)
Jim's proposal is:
If an Accept-Encoding header is present, and if the server cannot
send a response which is acceptable according to the
Accept-Encoding header, then the server SHOULD send a response
using the default (identity) encoding; if the identity encoding
is not available, then the server SHOULD send an error response
with the 406 (Not Acceptable) status code.
That solves the problem with scenario #2, but not with scenario #1.
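Purely for concreteness, here is a rough sketch (Python; the names and
the parsed-header representation are mine, not anything in the spec) of
the server-side decision Jim's wording implies:

def choose_coding_jim(accept_encoding, available_codings):
    # accept_encoding: the codings listed in the request header, or None if
    # the header is absent; available_codings: whatever codings this server
    # can produce for the resource (both representations are hypothetical).
    if accept_encoding is None:
        # Unchanged from RFC 2068: the server MAY assume the client accepts
        # any coding, which is exactly why scenario #1 is still broken.
        return next(iter(available_codings))
    for coding in accept_encoding:
        if coding in available_codings:
            return coding                    # acceptable per the header
    if "identity" in available_codings:
        return "identity"                    # fall back rather than error
    return None                              # caller sends 406 (Not Acceptable)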
I have three different proposals to solve these two problems,
in order of increasing distance from current practice (and in
order of increasing precision, I think).
The simplest change would be to say:
If no Accept-Encoding header is present in the request, then
the server SHOULD respond using one of
o the default (identity) content-coding; or
o the "compress" content-coding; or
o the "gzip" content-coding
It MUST NOT respond using any other content-coding. If none
of these content-codings is available, the server SHOULD send
an error response with the 406 (Not Acceptable) status code.
Note: the use of unsolicited compressed encodings may
lead to confusing errors in rendering the response, and
should be done with caution.
If an Accept-encoding header is present, and if the server cannot
send a response which is acceptable according to the
Accept-Encoding header, then the server SHOULD send a response
using the default (identity) content-coding; it MUST NOT send a
non-identity content-coding not listed in the Accept-encoding
header. If, in this case, the identity content-coding is not
available, then the server SHOULD send an error response with the
406 (Not Acceptable) status code.
Actually, because the HTTP/1.1 spec does not explicitly require
a client to support any of the non-identity content-codings, it
seems smarter to use something like the following wording instead:
If no Accept-Encoding header is present in the request, then
the server SHOULD respond using the default (identity) content-coding.
It MUST NOT respond using any other content-coding. If the identity
content-coding is not available, the server SHOULD send
an error response with the 406 (Not Acceptable) status code.
If an Accept-encoding header is present, and if the server cannot
send a response which is acceptable according to the
Accept-Encoding header, then the server SHOULD send a response
using the default (identity) content-coding; it MUST NOT send a
non-identity content-coding not listed in the Accept-encoding
header. If, in this case, the identity content-coding is not
available, then the server SHOULD send an error response with the
406 (Not Acceptable) status code.
And, if we want to make it possible for a client to say "send me
a compressed encoding or send me nothing", then I'd propose this
pair of changes:
(1) in section 3.5 (Content Codings), add this after the item
for "deflate"
identity The default (identity) encoding; the use
of no transformation whatsoever. This
content-coding is used only in the
Accept-encoding header, and SHOULD NOT
be used in the Content-Encoding header.
An HTTP/1.1 client or server MAY support any of these
content-codings, but SHOULD NOT assume (without explicit
evidence) that any other client or server supports any
content-coding besides "identity".
(2) the wording in section 14.3 would become:
If no Accept-Encoding header is present in the request, then
the server SHOULD respond using the default (identity) content-coding.
It MUST NOT respond using any other content-coding. If the identity
content-coding is not available, the server SHOULD send
an error response with the 406 (Not Acceptable) status code.
If an Accept-encoding header is present, and if the server cannot
send a response which is acceptable according to the
Accept-Encoding header, then the server SHOULD send an error
response with the 406 (Not Acceptable) status code.
Note: a client willing to accept either a compressed
or uncompressed response should send, for example,
Accept-encoding: identity,gzip
to allow the server to generate a response without
wasting a round-trip.
This should solve problems #1 and #2.
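Purely as an illustration of the combined effect, here is a rough sketch
(Python; hypothetical names, and no preference rule among acceptable
codings is intended) of the selection rule under this last pair of changes:

def select_coding(accept_encoding, available_codings):
    # accept_encoding: None if the header is absent, else the listed codings.
    if accept_encoding is None:
        # No header: identity only (this is what fixes problem #1).
        return "identity" if "identity" in available_codings else 406
    for coding in accept_encoding:
        if coding in available_codings:
            return coding                    # first acceptable coding listed
    # Nothing acceptable, and the client did not list "identity", so it has
    # asked for "what I want or nothing" (this is what fixes problem #2).
    return 406

# Illustrative calls:
select_coding(None, {"identity", "gzip"})                  # "identity"
select_coding(["zipflate"], {"identity", "gzip"})          # 406
select_coding(["identity", "gzip"], {"identity", "gzip"})  # "identity" (first listed)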
====================
Now, on to problem #3.
Suppose one has this configuration:
                                            |--- HTTP/1.1 client A
                                            |
HTTP/1.1 server S ---- HTTP/1.0 proxy P ----|
                       with cache           |
                                            |--- HTTP/1.0 client B
Now suppose that client A does
GET http://S/foo.html HTTP/1.1
Host: S
Accept-Encoding: zipflate
via proxy P, which forwards it to server S, which responds with
HTTP/1.1 200 OK
Content-Encoding: zipflate
Content-type: text/html
Last-Modified: .....
Expires: .....
Cache-control: .....
Proxy P caches this response and forwards it to client A. So far,
so good.
Soon thereafter (before the Expires time), client B decides to issue its
own request for the same URL:
GET http://S/foo.html HTTP/1.0
Since HTTP/1.0 proxy P doesn't understand "Accept-Encoding", as far as
I can tell, it's likely to return the cached response to B. But client
B's HTTP/1.0 browser won't know how to render it. If that software
is smart, it might re-issue the request with a "Pragma: no-cache".
But I doubt that any existing browsers are this smart, with the
result that B's user (e.g., my mom) would be faced with a mysterious
error message (or a screen full of garbage).
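Here is a rough sketch of why (Python; hypothetical names, modeling
nothing more than a cache keyed on the request-URI alone):

import time

cache = {}   # request-URI -> cached response, Content-Encoding ignored

def http10_proxy(url, request, fetch_from_origin):
    # Sketch of an HTTP/1.0 proxy cache: it neither understands
    # Accept-Encoding on the request nor re-checks Content-Encoding on the
    # stored entry, so client B is handed the zipflate entity cached for A.
    # (Responses are assumed to be dicts carrying an "expires" timestamp.)
    entry = cache.get(url)
    if entry is not None and entry["expires"] > time.time():
        return entry                         # hit: served as-is to any client
    response = fetch_from_origin(url, request)
    cache[url] = response                    # stored even if Content-Encoding: zipflate
    return response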
I suppose one could hope that HTTP/1.0 caches don't store responses
with a Content-encoding header, but I looked at the sources for
the CERN httpd, and it doesn't seem to pay any attention to that header.
The HTTP/1.0 "specification" defines the "gzip" and "compress"
content-codings, but does not define "deflate", so it is reasonable
to assume that many (if not all) HTTP/1.0 clients and proxies do
not understand the full set of content-codings already specified
in HTTP/1.1, let alone anything new that comes along.
So I suspect that we will need to fix HTTP/1.1 to make it safe
to use "default", or other new content-codings, with HTTP/1.0
caches before any widespread deployment of these compression
algorithms could be contemplated.
I propose that we add a new status code, analogous to 206 (Partial
Content), to be used on all HTTP/1.1 responses with a non-identity
Content-coding. For example, 207 (Encoded Content). This would allow
HTTP/1.0 caches to forward, but not to cache, the response; it would
allow HTTP/1.1 implementations to do whatever is appropriate. (I.e.,
an HTTP/1.1 cache would have to check the Content-Encoding against the
Accept-Encoding of a subsequent request.)
Here's some proposed wording:
10.2.8 207 Encoded Content
The server has used a non-identity content-coding for the response.
The request SHOULD (MUST?) have included an Accept-encoding field
including the name of the content-coding used. The response MUST include
a Content-Encoding header specifying the content-coding used.
A cache that does not support the Accept-encoding header
MUST NOT cache a 207 (Encoded Content) response, except if the
cache is able to convert it to the identity content-coding before
using it in response to a subsequent request, and then only if
the response does not contain the "no-transform" Cache-control
directive.
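For an HTTP/1.1 cache, the check that this wording implies might look
roughly like the following sketch (Python; hypothetical dict-based
message representation, ignoring the transcoding escape clause):

def may_reuse_207(cached, request):
    # Sketch of the rule above for an HTTP/1.1 cache: a stored 207 (Encoded
    # Content) response may answer a later request only if that request's
    # Accept-encoding lists the stored coding. (Dict-based messages are a
    # hypothetical representation, not anything a real cache uses.)
    coding = cached["headers"]["Content-Encoding"]        # a 207 MUST carry this
    accepted = request["headers"].get("Accept-Encoding", "")
    tokens = [t.strip() for t in accepted.split(",") if t.strip()]
    return coding in tokens

# An HTTP/1.0 cache does not recognize 207 at all, so, as assumed above, it
# would forward such a response without storing it.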
This would prevent some HTTP/1.0 caches from storing "gzip"ed
or "compressed" results, but it's not clear that there is much
of this happening today. (Does anyone have proxy statistics that
show the fraction of cacheable responses that have Content-Encoding
headers?)
-Jeff