Re: Content encoding problem... from Roy T. Fielding on 1997-02-21 (ietf-http-wg@w3.org from January to March 1997)

From: Roy T. Fielding <fielding@kiwi.ICS.UCI.EDU>
Date: Thu, 20 Feb 1997 18:56:27 -0800
To: Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-wg@cuckoo.hpl.hp.com
Message-Id: <9702201856.aa16513@paris.ics.uci.edu>

I think I understand now what Henrik was describing, and I agree that
the description of Accept-Encoding needs to be fixed.  However, not
all of it is broken.

Jeffrey Mogul writes:
>There are three problems with RFC2068 that would prevent the most
>efficient use of this compression algorithm, and that might result
>in presenting users with bogus results:
>
>	(1) The current specification of Accept-encoding *requires*
>	(SHOULD-level, not MUST-level) a server to return an
>	error response in a situation where this is probably
>	not optimal.  This might lead to many extra round-trips,
>	and might also lead to the destruction of otherwise
>	useful proxy-cache entries.

We should fix that in the spec.

>	(2) The current specification of Accept-encoding *allows*
>	a server to send a response using an encoding that the
>	client software might not only not understand, but which
>	it might improperly render to an unwitting user.

Yes.  It does so because there may exist some external mechanism,
such as a choice list provided in HTML, whereby an older user agent
might GET on a particular URL known to the human user to be in a
specific encoding, even though the user agent has no knowledge of
that encoding.  Keep in mind that not all GET requests are done for
the purpose of rendering on a browser, and therefore the protocol
must not create artificial requirements that restrict the content
of the payload of a response.

For example, I have a save-URL-to-file program called lwpget.  It never
sends Accept-Encoding, because there has never been any requirement
that it should, and no HTTP/1.0 server ever needed it.  Should lwpget
be prevented from working because of the *possibility* that it might
be a rendering engine that doesn't understand the encoding?

>	(3) The current design allows an HTTP/1.0 cache to return
>	an encoded response to an HTTP/1.0 client, in such a way
>	as to cause the client to render garbage to an unwitting user.

Yes.  It is the responsibility of the origin server to prevent this
from happening by accident.  It is not possible to prevent it from
happening on purpose, because attempting to do so breaks my (2).

>Jim's proposal is:
>
>    If an Accept-Encoding header is present, and if the server cannot
>    send a response which is acceptable according to the
>    Accept-Encoding header, then the server SHOULD send a response
>    using the default (identity) encoding; if the identity encoding
>    is not available, then the server SHOULD send an error response 
>    with the 406 (Not Acceptable) status code.
>
>That solves the problem with scenario #2, but not with scenario #1.

Yes, that works because it is prefixed with "If an Accept-Encoding
header is present ..."

>I have three different proposals to solve these two problems,
>in order of increasing distance from current practice (and in
>order of increasing precision, I think).
>
>The simplest change would be to say:
>
>    If no Accept-Encoding header is present in the request, then
>    the server SHOULD respond using one of
>	o the default (identity) content-coding; or
>	o the "compress" content-coding; or
>	o the "gzip" content-coding
>    It MUST not respond using any other content-coding.  If none
>    of these content-codings is available, the server SHOULD send
>    an error response with the 406 (Not Acceptable) status code.

That would break my application.

>	Note: the use of unsolicited compressed encodings may
>	lead to confusing errors in rendering the response, and
>	should be done with caution.
>
>    If an Accept-encoding header is present, and if the server cannot
>    send a response which is acceptable according to the
>    Accept-Encoding header, then the server SHOULD send a response
>    using the default (identity) content-coding; it MUST NOT send a
>    non-identity content-coding not listed in the Accept-encoding
>    header.  If, in this case, the identity content-coding is not
>    available, then the server SHOULD send an error response with the
>    406 (Not Acceptable) status code.
>
>Actually, because the HTTP/1.1 spec does not explicitly require
>a client to support any of the non-identity content-codings, it
>seems smarter to use something like the following wording instead:
>
>    If no Accept-Encoding header is present in the request, then
>    the server SHOULD respond using the default (identity) content-coding.
>    It MUST not respond using any other content-coding.  If none
>    of these content-codings is available, the server SHOULD send
>    an error response with the 406 (Not Acceptable) status code.

That would break my application.

>    If an Accept-encoding header is present, and if the server cannot
>    send a response which is acceptable according to the
>    Accept-Encoding header, then the server SHOULD send a response
>    using the default (identity) content-coding; it MUST NOT send a
>    non-identity content-coding not listed in the Accept-encoding
>    header.  If, in this case, the identity content-coding is not
>    available, then the server SHOULD send an error response with the
>    406 (Not Acceptable) status code.
>
>And, if we want to make it possible for a client to say "send me
>a compressed encoding or send me nothing", then I'd propose this
>pair of changes
>
>(1) in section 3.5 (Content Codings), add this after the item
>for "deflate"
>
>	identity	The default (identity) encoding; the use
>			of no transformation whatsoever.  This
>			content-coding is used only in the
>			Accept-encoding header, and SHOULD NOT
>			be used in the Content-Encoding header.

That would be fine, though an Accept-Encoding with no value was
originally intended to mean "I only accept the identity encoding".

>====================
>
>Now, on to problem #3.
>
>Suppose one has this configuration:
>
>
>                                           |--- HTTP/1.1 client A
>                                           |
>HTTP/1.1 server S ---- HTTP/1.0 proxy P ----
>                              with cache   |
>                                           |--- HTTP/1.0 client B
>
>Now suppose that client A does
>	GET http://S/foo.html HTTP/1.1
>	Host: S
>	Accept-Encoding: zipflate
>
>via proxy P, which forwards it to server S, which responds with
>
>	HTTP/1.1 200 OK
>	Content-Encoding: zipflate
>	Content-type: text/html
>	Last-Modified: .....
>	Expires: .....
>	Cache-control: .....

You forgot to add

        Vary: Accept-Encoding

Yes, I know that an HTTP/1.0 proxy cache will probably ignore it,
but there is only so much we can do without breaking the protocol.
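
To illustrate what Vary: Accept-Encoding buys you with a cache that does
honor it, here is a minimal sketch in Python.  The class and method names
are hypothetical, header values are compared as exact strings, and
"Vary: *" is not handled; it is an illustration of the idea, not of any
particular proxy's implementation.

```python
class VaryAwareCache:
    """Toy cache that reuses an entry only if the selecting headers match."""

    def __init__(self):
        # url -> list of (selecting request headers, cached body)
        self._entries = {}

    @staticmethod
    def _lower(headers):
        # Header field names are case-insensitive, so normalize them.
        return {name.lower(): value for name, value in headers.items()}

    def store(self, url, request_headers, response_headers, body):
        resp = self._lower(response_headers)
        req = self._lower(request_headers)
        # Names listed in Vary identify the request headers used for selection.
        vary = [n.strip().lower() for n in resp.get("vary", "").split(",") if n.strip()]
        selecting = {name: req.get(name) for name in vary}
        self._entries.setdefault(url, []).append((selecting, body))

    def lookup(self, url, request_headers):
        req = self._lower(request_headers)
        for selecting, body in self._entries.get(url, []):
            # A missing header only matches an entry stored without it.
            if all(req.get(name) == value for name, value in selecting.items()):
                return body
        return None  # no usable entry; forward the request to the origin


cache = VaryAwareCache()
cache.store(
    "http://S/foo.html",
    {"Host": "S", "Accept-Encoding": "zipflate"},
    {"Content-Encoding": "zipflate", "Vary": "Accept-Encoding"},
    b"<zipflate-encoded bytes>",
)
# Client B sends no Accept-Encoding, so the encoded entry is not reused.
miss = cache.lookup("http://S/foo.html", {})
hit = cache.lookup("http://S/foo.html", {"Accept-Encoding": "zipflate"})
```

With a cache like this, client B's request falls through to the origin
server instead of being answered with the zipflate-encoded entry.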

>Proxy P caches this response and forwards it to client A.  So far,
>so good.
>
>Soon thereafter (before the Expires time), client B decides to issue its
>own request for the same URL:
>	GET http://S/foo.html HTTP/1.0
>
>Since HTTP/1.0 proxy P doesn't understand "Accept-Encoding", as far as
>I can tell, it's likely to return the cached response to B.  But client
>B's HTTP/1.0 browser won't know how to render it.  If that software
>is smart, it might re-issue the request with a "Pragma: no-cache".
>But I doubt that any existing browsers are this smart, with the
>result that B's user (e.g., my mom) would be faced with a mysterious
>error message (or a screen full of garbage).

The answer is: get a better proxy cache.  Seriously, there comes a point
when we must recognize the limitations of older technology and move on.
Barring fatal problems (this is not one of them), it is appropriate that
users of inadequate technology receive inadequate results.

Keep in mind, however, that this particular scenario only occurs if
the URL in question has negotiated responses based on Accept-Encoding.
It is quite reasonable for the origin server to modify its negotiation
algorithm based on the capabilities of the user agent, or even the
fact that it was passed through a particular cache; I even described
that in section 12.1.

>I suppose one could hope that HTTP/1.0 caches don't store responses
>with a Content-encoding header, but I looked at the sources for
>the CERN httpd, and it doesn't seem to pay any attention.
>
>The HTTP/1.0 "specification" defines the "gzip" and "compress"
>content-codings, but does not define "deflate", so it is reasonable
>to assume that many (if not all) HTTP/1.0 clients and proxies do
>not understand the full set of content-codings already specified
>in HTTP/1.1, let alone anything new that comes along.

No, that is not a reasonable assumption.  While browsers may not understand
those encodings for the purpose of in-line rendering, not all browser
requests are for the purpose of rendering, and not all clients are browsers.

>I propose that we add a new status code, analogous to 206 (Partial
>Content), to be used on all HTTP/1.1 responses with a non-identity
>Content-coding.  For example, 207 (Encoded Content).  This would allow
>HTTP/1.0 caches to forward, but not to cache, the response; it would
>allow HTTP/1.1 implementations to do whatever is appropriate.  (I.e.,
>an HTTP/1.1 cache would have to check the Content-Encoding against the
>Accept-Encoding of a subsequent request.)

No, I don't even consider that an option.  206 works because the user
agent will only receive it if it asks for a Range, and therefore we
know that it won't puke.  Besides, it breaks the distinction between
the response status and the payload content, which would be extremely
depressing for the future evolution of HTTP.

I suggest the following instead:

     If no Accept-Encoding field is present in a request, the server MAY
     assume that the client will accept any content coding.  However, if
     the response content is negotiated on the basis of Accept-Encoding,
     then the origin server SHOULD select a representation without any
     Content-Encoding if one is available; if all available
     representations use a non-identity content-coding, then
     preference should be given to those content-coding(s) commonly
     understood by older user agents, or known to be understood by the
     particular user agent that initiated the request.

     If an Accept-Encoding field is present, and if the server cannot
     send a response which is acceptable according to the
     Accept-Encoding field, then the server SHOULD send a response
     using the default (identity) content-coding; it MUST NOT send a
     non-identity content-coding not listed in the Accept-Encoding
     field.  If, in this case, the identity content-coding is not
     available, then the server SHOULD send an error response with the
     406 (Not Acceptable) status code.
 
I think that will accomplish the desired effect without preventing
existing applications (mirrors, Save As dialogs, etc.) from working.
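
As a rough sketch of how an origin server might apply the two paragraphs
suggested above, consider the following Python.  The function name, its
arguments, and the "legacy" preference order (gzip, then compress) are
assumptions made for illustration; "identity" stands for a representation
with no Content-Encoding, and a return value of None stands for a 406
response.

```python
def select_coding(accept_encoding, available, legacy=("gzip", "compress")):
    """Choose a content-coding for a response.

    accept_encoding: None if the request had no Accept-Encoding field,
                     otherwise the list of codings it named (may be empty).
    available:       set of codings the server has representations for.
    """
    if accept_encoding is None:
        # No field: any coding MAY be assumed acceptable, but prefer a
        # representation without any Content-Encoding if there is one.
        if "identity" in available:
            return "identity"
        # Otherwise prefer codings commonly understood by older agents.
        for coding in legacy:
            if coding in available:
                return coding
        # Fall back to whatever exists (alphabetical, for determinism).
        return min(available, default=None)

    # Field present: use a listed coding if the server has one.
    for coding in accept_encoding:
        if coding in available:
            return coding
    # Nothing acceptable: fall back to identity, never an unlisted coding.
    if "identity" in available:
        return "identity"
    return None  # 406 (Not Acceptable)
```

Note that an Accept-Encoding field with no value arrives here as an empty
list and so yields only the identity coding, while a client such as a
mirror or Save As dialog that sends no field at all still gets a response
even when only encoded representations exist.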

Cheers,

 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92697-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/
Received on Thursday, 20 February 1997 19:07:53 UTC