Re: Implementation Experience Content Encoding

Patrick McManus points out that the advice given in the
-08 draft of the HTTP/1.1 spec, for Accept-Encoding, is
(1) consistent with what many servers do, and (2)
inconsistent with what many browsers can handle.

(Note that I can't seem to find a copy of draft -08, so I'm
working from the text of the proposed solution from the Issues list.)

In particular, this Note

            Note: If the request does not include an Accept-Encoding
            field, and if the "identity" content-coding is unavailable,
            then preference should be given to content-codings commonly
            understood by HTTP/1.0 clients (i.e., "gzip" and
            "compress"); some older clients improperly display messages
            sent with other content-encodings.  The server may also make
            this decision based on information about the particular
            user-agent or client.

is misleading, because several important HTTP/1.0 clients apparently
do NOT understand gzip and compress.  We should probably fix this
implication; how about:

	    Note: If the request does not include an Accept-Encoding
	    field, and if the "identity" content-coding is unavailable,
	    then preference should be given to content-codings commonly
	    understood by some (but not all) HTTP/1.0 clients (i.e.,
	    "gzip" and "compress"); some older clients improperly
	    display messages sent with other content-encodings.  The
	    server may also make this decision based on information
	    about the particular user-agent or client.

However, there are really two different specification issues here:

(A) If the server has a choice of content-codings, including
"identity", and the client does not send an Accept-Encoding header,
and the server cannot figure things out from the User-Agent header,
should the spec insist that the server send "identity", or should
we leave it up to the good sense of the server?

(B) If the server does NOT have "identity" available, and it cannot
tell from the request what content-codings the client will accept
(i.e., no Accept-encoding and/or User-agent headers), what should
the server do?

For case (A), the draft says:
    If no Accept-Encoding field is present in a request, the server MAY
    assume that the client will accept any content coding.  In this
    case, if "identity" is one of the available content-codings, then
    the server SHOULD use the "identity" content-coding, unless
    it has additional information that a different content-coding
    is meaningful to the client.

I.e., we do insist that if "identity" is available, and the client
hasn't given solid information that it will accept something else,
a server SHOULD send "identity" (i.e., no Content-Encoding header).

Of course, we cannot make this apply retroactively to older
servers (especially HTTP/1.0 servers).  And we can't fix the
deployed base of browsers, either.  So, for the time being,
it's basically up to the good sense of the Web site maintainer
not to do anything too silly.

For case (B) (i.e., no "identity" content-coding is available, and the
client hasn't given solid information that it will accept something
else), we basically punt; we let the server do what it wants to do.
There was a long discussion of what would happen to existing "robot"
clients, which do not actually display or interpret what they
retrieve (e.g., automatic mirroring software). If we were to insist
that the server return an error in this case, then existing robot
software would break.

One could argue that it would be better to break robots than to
deliver garbage screenfuls to human users ... but that would
be a change to the status quo.

Bottom line:
(1) users ought to insist on browsers that don't display garbage
when they get a Content-coding that they don't understand, and
they ought to insist on browsers that send explicit Accept-encoding
headers when appropriate.

(2) Web site operators shouldn't send back compressed content-codings
unless either (a) the client explicitly asks for that, or (b) that's
the only thing available.  And Web site operators shouldn't let
themselves end up with "that's the only thing available" too often.

-Jeff

Received on Friday, 1 August 1997 15:49:30 UTC