Re: Content encoding problem... from Henrik Frystyk Nielsen on 1997-02-19 (ietf-http-wg@w3.org from January to March 1997)

From: Henrik Frystyk Nielsen <frystyk@w3.org>
Date: Wed, 19 Feb 1997 14:42:19 -0500
To: Dave Kristol <dmk@bell-labs.com>, http-wg@cuckoo.hpl.hp.com
Message-Id: <3.0.1.32.19970219144219.00992250@pop.w3.org>
At 12:28 PM 2/19/97 -0500, Dave Kristol wrote:
>Roy T. Fielding wrote:
>> [...]
>> Then it sounds like they have regressed, because all of my tests were
>> with text/html content with both "x-gzip" and "x-compress" encoding.
>> The browsers would retrieve the content and decompress it before
>> rendering.  The only ones which did not do so were the Mac-based clients
>> which did not (at that time) have a library for gzip decompression.
>> This was two years ago (decades in web-years), but I am surprised that
>> things would change so much in an incompatible way.
>> [...]
>
>Roy's understanding matches mine.  I have to believe there's a miscommunication
>here between people, because I believe the software continues to work this way.
>If I'm right, then Roy is right that the only thing necessary to put end-to-end
>compression on the wire is for servers just to do it:  If the user agent sends an
>"Accept-Encoding: gzip" (for example), the server can gzip-compress the content,
>on the fly if necessary, and add a "Content-Encoding: gzip" header in the
>response.  The original Content-Type would still apply (not necessarily
>application/octet-stream).
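The "servers just do it" approach described above can be sketched in a few lines. This is a minimal illustration, not code from any real server: `encode_response` and `parse_accept_encoding` are hypothetical helpers, q-value parameters are ignored, and "x-gzip" is treated as an alias for "gzip" (the older name Roy's tests used). The Content-Type is left untouched either way.

```python
import gzip
from typing import Dict, Optional, Set, Tuple


def parse_accept_encoding(value: str) -> Set[str]:
    """Split an Accept-Encoding value into lower-cased coding names.

    Any ";q=..." style parameters are ignored in this sketch.
    """
    codings = set()
    for item in value.split(","):
        token = item.split(";", 1)[0].strip().lower()
        if token:
            codings.add(token)
    return codings


def encode_response(
    body: bytes, content_type: str, accept_encoding: Optional[str]
) -> Tuple[Dict[str, str], bytes]:
    """Hypothetical server helper: gzip the body on the fly if the client
    listed gzip (or its older alias x-gzip); otherwise send it unencoded.
    Content-Type is kept as-is in both cases."""
    headers = {"Content-Type": content_type}
    if accept_encoding is not None:
        requested = parse_accept_encoding(accept_encoding)
        for token in ("gzip", "x-gzip"):
            if token in requested:
                # Echo back the coding name the client actually used.
                headers["Content-Encoding"] = token
                return headers, gzip.compress(body)
    return headers, body
```

For example, `encode_response(b"<p>hello</p>", "text/html", "gzip")` returns headers with both `Content-Type: text/html` and `Content-Encoding: gzip`, plus the compressed body.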

Sorry for the confusion - let me try and rephrase what I mean:

The current HTTP/1.1 spec introduces the "deflate" encoding as a possible
content-encoding. In order to advertise that the libwww HTTP/1.1 client
understands this particular encoding, I include "Accept-Encoding: deflate"
in the request. Jeff Mogul pointed out that the current wording in the spec
says that:

    If an Accept-Encoding header is present, and if the server cannot
    send a response which is acceptable according to the
    Accept-Encoding header, then the server SHOULD send an error
    response with the 406 (Not Acceptable) status code.

and that this will lead servers to send back a "406 Not Acceptable" status
code. The only way to avoid an extra RTT would be not to include the
Accept-Encoding header, in which case the encoding would probably never get
used.
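The round-trip cost follows from one literal reading of the quoted text: a server that can only send the unencoded entity treats "Accept-Encoding: deflate" as excluding everything it has. The sketch below models that reading. `strict_negotiate` and `fetch` are hypothetical names, and the retry-without-the-header client is an assumption about how a client would recover from the 406.

```python
from typing import List, Optional, Set, Tuple


def parse_accept_encoding(value: str) -> Set[str]:
    """Lower-cased coding names from an Accept-Encoding value (parameters dropped)."""
    return {
        item.split(";", 1)[0].strip().lower()
        for item in value.split(",")
        if item.split(";", 1)[0].strip()
    }


def strict_negotiate(
    available: List[str], accept_encoding: Optional[str]
) -> Tuple[int, Optional[str]]:
    """Hypothetical server applying the quoted SHOULD literally.

    `available` lists the content-codings this server can produce; it can
    also send the unencoded entity. Returns (status, coding), where coding
    None means "no Content-Encoding".
    """
    if accept_encoding is None:
        # No header: the server is free to choose; this one sends it unencoded.
        return 200, None
    requested = parse_accept_encoding(accept_encoding)
    for coding in available:
        if coding in requested:
            return 200, coding
    # The unencoded entity is not among the listed codings, so this reading
    # of the draft text yields 406 (Not Acceptable).
    return 406, None


def fetch(
    available: List[str], accept_encoding: Optional[str]
) -> Tuple[int, Optional[str], int]:
    """Hypothetical client: on 406, retry without Accept-Encoding.

    Returns (status, coding, round_trips).
    """
    status, coding = strict_negotiate(available, accept_encoding)
    round_trips = 1
    if status == 406:
        status, coding = strict_negotiate(available, None)
        round_trips += 1
    return status, coding, round_trips
```

Against a server with no codings at all, `fetch([], "deflate")` returns `(200, None, 2)`: the client gets the same unencoded entity it would have received anyway, one RTT later.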

However, if there is _no_ "Accept-Encoding" header, then the spec allows
HTTP/1.1 servers to generate any content encoding they like. I tried this
by making Jigsaw generate a response with content-type "text/html" and
content-encoding "deflate", and the result is as I have described: clients
do _not_ handle it, and the end user sees the encoded data parsed as HTML.

As Jeff also points out, the problem can occur when going through HTTP/1.0
proxies as well, since they may serve cached objects with content-encodings
(otherwise ignored by the proxy) that the receiver doesn't understand.

The question is then how to avoid garbage ending up on the end user's
screen without a trial-and-error approach that wastes RTTs on
content-encoding. What I have advocated is to follow the same approach
used for the equivalent problem with content types, where the spec says:

"Note: HTTP/1.1 servers are allowed to return responses which are not
acceptable according to the accept headers sent in the request. In some
cases, this may even be preferable to sending a 406 response. User agents
are encouraged to inspect the headers of an incoming response to determine
if it is acceptable. If the response could be unacceptable, a user agent
SHOULD temporarily stop receipt of more data and query the user for a
decision on further actions."
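Applied to content-encodings, the quoted advice amounts to a user agent checking Content-Encoding before handing the body to the HTML parser. The sketch below is one way to do that, not libwww's actual logic. `DECODERS` and `check_and_decode` are hypothetical names, header names are assumed to be already normalised, and only a single applied coding is handled. "deflate" is decoded as the zlib format (RFC 1950) wrapping deflate data (RFC 1951), which is how HTTP/1.1 defines that coding. "compress" (LZW) has no Python standard-library decoder, so this sketch treats it as unknown.

```python
import gzip
import zlib
from typing import Callable, Dict, Mapping, Tuple

# Codings this hypothetical user agent can undo. zlib.decompress expects the
# zlib (RFC 1950) wrapper by default, matching HTTP/1.1 "deflate".
DECODERS: Dict[str, Callable[[bytes], bytes]] = {
    "gzip": gzip.decompress,
    "x-gzip": gzip.decompress,
    "deflate": zlib.decompress,
}


def check_and_decode(headers: Mapping[str, str], body: bytes) -> Tuple[bool, bytes]:
    """Inspect Content-Encoding before rendering.

    Returns (ok, data). ok is False when the coding is unknown; the caller
    should then stop receiving and ask the user instead of parsing `body`
    according to Content-Type.
    """
    coding = headers.get("Content-Encoding")
    if coding is None:
        return True, body
    decoder = DECODERS.get(coding.strip().lower())
    if decoder is None:
        return False, body
    return True, decoder(body)
```

With this check in place, the Jigsaw "deflate" response would be decoded before rendering, and a coding the agent does not know would stop the transfer instead of putting encoded bytes on the screen.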

In order to avoid the problem with HTTP/1.0 clients, the solution would be
_NOT_ to send any encodings other than gzip and compress unless explicitly
asked for.
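A server-side policy along these lines might look like the sketch below. This is one reading of the proposal: without Accept-Encoding, only gzip and compress are eligible; with it, only the listed codings are; and when nothing fits, the entity goes out unencoded instead of as a 406. `choose_coding`, `SAFE_DEFAULT_CODINGS`, and the "server's preference order" are illustrative assumptions.

```python
from typing import List, Optional, Set

# Codings that deployed clients are already expected to handle.
SAFE_DEFAULT_CODINGS = {"gzip", "compress"}


def parse_accept_encoding(value: str) -> Set[str]:
    """Lower-cased coding names from an Accept-Encoding value (parameters dropped)."""
    codings = set()
    for item in value.split(","):
        token = item.split(";", 1)[0].strip().lower()
        if token:
            codings.add(token)
    return codings


def choose_coding(
    server_codings: List[str], accept_encoding: Optional[str]
) -> Optional[str]:
    """Pick a content-coding in the server's own preference order.

    Without Accept-Encoding, only gzip/compress are eligible; anything else
    (e.g. deflate) must be explicitly requested. Returns None to send the
    entity unencoded rather than replying 406.
    """
    if accept_encoding is None:
        eligible = SAFE_DEFAULT_CODINGS
    else:
        eligible = parse_accept_encoding(accept_encoding)
    for coding in server_codings:
        if coding in eligible:
            return coding
    return None
```

For a server that prefers deflate, `choose_coding(["deflate", "gzip"], None)` falls back to `"gzip"`. Deflate is used only when the request carries something like "Accept-Encoding: deflate".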

Thanks,

Henrik
--
Henrik Frystyk Nielsen, <frystyk@w3.org>
World Wide Web Consortium, MIT/LCS NE43-346
545 Technology Square, Cambridge MA 02139, USA
Received on Wednesday, 19 February 1997 11:50:58 UTC