Re: Content encoding problem... from Henrik Frystyk Nielsen on 1997-02-21 (ietf-http-wg@w3.org from January to March 1997)

From: Henrik Frystyk Nielsen <frystyk@w3.org>
Date: Thu, 20 Feb 1997 23:04:19 -0500
To: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>, Jeffrey Mogul <mogul@pa.dec.com>
Cc: http-wg@cuckoo.hpl.hp.com
Message-Id: <3.0.1.32.19970220230419.009a5960@pop.w3.org>
At 06:56 PM 2/20/97 -0800, Roy T. Fielding wrote:

>Yes.  It does so because there may exist some external mechanism,
>such as a choice list provided in HTML, whereby an older user agent
>might GET on a particular URL known to the human user to be in a
>specific encoding, even though the user agent has no knowledge of
>that encoding.  Keep in mind that not all GET requests are done for
>the purpose of rendering on a browser, and therefore the protocol
>must not create artificial requirements that restrict the content
>of the payload of a response.

I agree with Roy that an unknown content-encoding need not be a bad thing
if the client treats it like an unknown content-type and saves the body to
disk in the hope that the receiver has other means of decoding it. Handling
unknown encodings gracefully (both transport and content) should be part of
a solid HTTP client implementation.
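
As a minimal sketch of what "gracefully" could mean on the client side,
assuming a client that only knows how to undo gzip: decode what it
recognises and hand everything else to the user as raw bytes to save. The
names DECODERS and handle_body are hypothetical, and a Content-Encoding
that lists several codings is simply treated as unknown here.

```python
import gzip

# Hypothetical table of the content-codings this client can undo.
DECODERS = {
    "identity": lambda body: body,
    "gzip": gzip.decompress,
    "x-gzip": gzip.decompress,
}

def handle_body(content_encoding, body):
    """Return ("render", decoded_bytes) or ("save", raw_bytes)."""
    coding = (content_encoding or "identity").strip().lower()
    decoder = DECODERS.get(coding)
    if decoder is None:
        # Unknown coding: do not try to render it. Treat it like an
        # unknown content-type and let the user save the raw bytes.
        return ("save", body)
    return ("render", decoder(body))
```

With this table, a "compress" body is saved untouched rather than rendered
as garbage, while a "gzip" body is decoded before rendering.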

>>Since HTTP/1.0 proxy P doesn't understand "Accept-Encoding", as far as
>>I can tell, it's likely to return the cached response to B.  But client
>>B's HTTP/1.0 browser won't know how to render it.  If that software
>>is smart, it might re-issue the request with a "Pragma: no-cache".
>>But I doubt that any existing browsers are this smart, with the
>>result that B's user (e.g., my mom) would be faced with a mysterious
>>error message (or a screen full of garbage).

The problem is that HTTP/1.0 is already broken. HTTP/1.0 implementations
(for example the CERN server) happily send out Content-Encoding, and they
do not check what the client can accept before returning a cached object
with any content-encoding.
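
For illustration, the missing check could look roughly like the sketch
below. The function names are hypothetical. Two things are assumptions of
mine rather than anything agreed in this thread: a request without
Accept-Encoding is answered only with an unencoded entry, and any
parameters after ";" in Accept-Encoding are ignored.

```python
def parse_accept_encoding(value):
    """Split an Accept-Encoding value into a set of lower-cased names."""
    if value is None:
        return None
    return {
        item.split(";")[0].strip().lower()
        for item in value.split(",")
        if item.strip()
    }

def may_serve_cached(cached_content_encoding, request_accept_encoding):
    """Decide whether a cached entry may answer this request."""
    coding = (cached_content_encoding or "identity").strip().lower()
    if coding == "identity":
        # An unencoded entry is safe for every client.
        return True
    accepted = parse_accept_encoding(request_accept_encoding)
    if accepted is None:
        # Assumption: be conservative when the client said nothing.
        return False
    return coding in accepted
```

An HTTP/1.0 cache performs no such test, which is how a gzip'ed entry ends
up in front of a client that never asked for it.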

>>I propose that we add a new status code, analogous to 206 (Partial
>>Content), to be used on all HTTP/1.1 responses with a non-identity
>>Content-coding.  For example, 207 (Encoded Content).  This would allow
>>HTTP/1.0 caches to forward, but not to cache, the response; it would
>>allow HTTP/1.1 implementations to do whatever is appropriate.  (I.e.,
>>an HTTP/1.1 cache would have to check the Content-Encoding against the
>>Accept-Encoding of a subsequent request.)
>
>No, I don't even consider that an option.  206 works because the user
>agent will only receive it if it asks for a Range, and therefore we
>know that it won't puke.  Besides, it breaks the distinction between
>the response status and the payload content, which would be extremely
>depressing for the future evolution of HTTP.

Since there are already HTTP/1.0 messages flying around (and being cached)
with status code 200 and a "gzip" or "compress" content-encoding, an
HTTP/1.1 proxy should be capable of changing the status code when
forwarding an HTTP/1.0 message. This is a significant departure from
previous uses of proxies, and I don't know whether people are prepared to
accept it.

>I suggest the following instead:
>
>     If no Accept-Encoding field is present in a request, the server MAY
>     assume that the client will accept any content coding.  However, if
>     the response content is negotiated on the basis of Accept-Encoding,
>     then the origin server SHOULD select a representation without any
>     Content-Encoding if one is available; if all available
>     representations use a non-identity content-coding, then
>     preference should be given to those content-coding(s) commonly
>     understood by older user agents, or known to be understood by the
>     particular user agent that initiated the request.

To my mind this carries the bug over from HTTP/1.0 and involves too many
heuristics. I would add a specific note about how HTTP/1.0 clients react
to encodings and make the HTTP/1.1 specification independent of that by
treating the case as you describe in the next section:

"If, in this case, the identity content-coding is not available, then the
server SHOULD send an error response with the 406 (Not Acceptable) status
code"

The HTTP/1.0 note can then contain the heuristics.

>     If an Accept-Encoding field is present, and if the server cannot
>     send a response which is acceptable according to the
>     Accept-Encoding field, then the server SHOULD send a response
>     using the default (identity) content-coding; it MUST NOT send a
>     non-identity content-coding not listed in the Accept-Encoding
>     field.  If, in this case, the identity content-coding is not
>     available, then the server SHOULD send an error response with the
>     406 (Not Acceptable) status code.
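
To make the quoted rule concrete, here is a rough sketch of how a server
might apply it when an Accept-Encoding field is present. It is one reading
of the text, not a definitive one: select_coding is a hypothetical name,
the inputs are assumed to be already-parsed lists of coding names, and
picking the first listed coding in the server's own order is my choice.

```python
def select_coding(accepted, available):
    """Pick a content-coding for the response.

    accepted:  coding names listed in the request's Accept-Encoding
    available: codings the server can supply, in its preferred order
    Returns (status_code, coding), with coding None on an error.
    """
    accepted_set = {c.lower() for c in accepted}
    offered = [c.lower() for c in available]

    # Only a non-identity coding that the client listed may be sent.
    for coding in offered:
        if coding != "identity" and coding in accepted_set:
            return (200, coding)

    # Otherwise fall back to the default (identity) coding.
    if "identity" in offered:
        return (200, "identity")

    # Nothing acceptable and no unencoded form: 406 (Not Acceptable).
    return (406, None)
```

For example, a client that lists only "gzip" gets the identity form from a
server holding "compress" and "identity", and a 406 from a server holding
"compress" alone.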

Thanks,

Henrik

--
Henrik Frystyk Nielsen, <frystyk@w3.org>
World Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA
Received on Thursday, 20 February 1997 20:09:36 UTC