Content encoding problem...

In our recent performance work (implementing deflate compression and
measuring the results, which, by the way, make a significant performance
difference), we came across a problem with the specification of
Content-Encoding and Accept-Encoding.

Here is some mail I filed on the topic, to prime general discussion
of how to fix this problem.

Our performance work makes it pretty clear we should straighten this
out somehow, as it can really help low-bandwidth users significantly (and
nothing else other than style sheets does as much).  Our tests showed
that the deflate side is very fast, and it would be a good optimization
if HTML documents were routinely sent in compressed form.  (We'll try
to get a number on how much better than modem compression it is soon;
it does save significantly even on regular lines, both in packets and in
total execution time.)
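
For concreteness, a rough sketch of the kind of saving involved, using
Python's zlib (which implements the deflate algorithm); the sample page is
made up, so treat the numbers as illustrative only:

```python
import zlib

# A hypothetical HTML page; markup-heavy text is highly redundant,
# which is exactly what deflate exploits.
html = (b"<html><head><title>Example</title></head><body>"
        + b"<p>Some repeated paragraph of body text.</p>" * 50
        + b"</body></html>")

# zlib.compress produces the zlib-wrapped deflate stream, which is
# what the HTTP "deflate" content-coding actually refers to.
compressed = zlib.compress(html)

print("original:  ", len(html), "bytes")
print("compressed:", len(compressed), "bytes")
```

On real pages the ratio varies, of course, but plain HTML routinely shrinks
by a large factor, which is where the packet-count savings come from.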

See: http://www.w3.org/pub/WWW/Protocols/HTTP/Performance/Pipeline.html
for details.

After some discussion, I expect we should put together an ID summarizing
the problem and proposed solution.
			- Jim

------- Forwarded Messages

Date:  Mon, 10 Feb 1997 16:06:57 -0500
To:  Jeffrey Mogul <mogul@pa.dec.com>, jg@zorch.w3.org, abaird@w3.org,
            eric@w3.org, howcome@w3.org, chris@w3.org, fielding@liege.ICS.UCI.EDU
From:  Henrik Frystyk Nielsen <frystyk@w3.org>
Subject:  Re: Network Performance Effects of HTTP/1.1, CSS1, and PNG 
Cc:  mogul@pa.dec.com

At 12:10 PM 2/10/97 PST, Jeffrey Mogul wrote:

>One way out of this would be to change that SHOULD to a "MAY",
>although I'd suggest thinking about changing the whole sentence
>to read something like:
>    If an Accept-Encoding header is present, and if the server cannot
>    send a response which is acceptable according to the
>    Accept-Encoding header, then the server SHOULD send a response
>    using the default (identity) encoding.
>
>I think this is probably the simplest solution.

Yes - I was thinking about this as well, and I definitely like your new
formulation better. The current formulation is overly restrictive and does
not fit well with the fact that a server can send any content-encoding
without any indication of acceptance from the client (which is more a
backwards-compatibility issue covering servers like the CERN server sending
"Content-Encoding: gzip").
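
Just to make the proposed behavior concrete, here is a rough sketch of the
server-side selection it implies (the function name and the ignoring of
q-values are mine, not anything from the spec):

```python
def select_encoding(accept_encoding, supported=("gzip", "deflate")):
    """Pick a content-coding under the proposed wording: if nothing in
    Accept-Encoding is supported, fall back to the identity encoding
    rather than failing with 406.  Quality values are ignored for
    brevity in this sketch."""
    if accept_encoding is None:
        # No header at all: the server may choose; identity is the
        # safe default for unknown clients.
        return "identity"
    requested = [tok.split(";")[0].strip()
                 for tok in accept_encoding.split(",")]
    for coding in requested:
        if coding in supported:
            return coding
    # Proposed SHOULD: send the default (identity) encoding, not 406.
    return "identity"
```

So select_encoding("gzip, compress") yields "gzip", while an unsupported
request like select_encoding("compress") simply falls back to "identity".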

>Since HTTP/1.0 proxy P doesn't understand "Accept-Encoding", as far as
>I can tell, it's likely to return the cached response to B.  But
>B's HTTP/1.0 browser won't know how to render it.  If that software
>is smart, it might re-issue the request with a "Pragma: no-cache".
>But I doubt that any existing browsers are this smart, with the
>result that B's user (e.g., my mom) would be faced with a mysterious
>error message (or a screen full of garbage).

I just checked the usual suspects: MSIE seems to handle content-encoding
(by rejecting it without saying why); Netscape and Lynx don't, and
present garbage. Linemode as well, I must admit :-( As far as I recall,
the CERN proxy actually understands encoding and uses it in its format
negotiation, but it doesn't refuse to send a document because of it. The
situation you describe already exists with "gzip" and "compress", even
though they are not normally used for on-the-fly compression. The only
reason browsers dump them to disk is that the content type is
"application/octet-stream".

>So I suspect that we will need to fix HTTP/1.1 to make it safe
>to use content-codings with HTTP/1.0 caches before any widespread
>deployment of compression could be contemplated.  Perhaps we
>need a new status code, e.g., "207 (Encoded Content)", analogous
>to 206 (Partial Content).  This would allow HTTP/1.0 caches to
>forward, but not to cache, the response; it would allow HTTP/1.1
>implementations to do whatever is appropriate.  (I.e., an HTTP/1.1
>cache would have to check the Content-Encoding against the
>Accept-Encoding of a subsequent request.)

What if we said that:

"HTTP/1.1 servers or proxies MUST NOT send any content-encodings other than
"gzip" and "compress" to an HTTP/1.0 client unless the client explicitly
accepts it using an "Accept-Encoding" header."

This adds a restriction to the current "a server MAY use any encoding", but
it may be worth it. The downside is that HTTP/1.1 proxies (and servers) must
support the corresponding "inflate" mechanism for every "deflate" mechanism
they handle.
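
Spelled out as code, the rule I'm proposing would amount to something like
this (all names are mine; this is just a sketch of the check, not spec
text):

```python
def may_send_coding(coding, http_version, accept_encoding=""):
    """Proposed rule: an HTTP/1.1 server or proxy may send only "gzip"
    or "compress" (or identity) to an HTTP/1.0 client, unless the
    client explicitly listed the coding in Accept-Encoding."""
    if http_version >= (1, 1):
        # No extra restriction proposed for HTTP/1.1 clients.
        return True
    if coding in ("identity", "gzip", "compress"):
        # Grandfathered codings that HTTP/1.0 software already knows.
        return True
    accepted = [tok.split(";")[0].strip()
                for tok in accept_encoding.split(",")]
    return coding in accepted
```

Under this check, "deflate" to a bare HTTP/1.0 client is refused, but the
same client sending "Accept-Encoding: deflate" gets it.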

Thanks

Henrik
--
Henrik Frystyk Nielsen, <frystyk@w3.org>
World Wide Web Consortium, MIT/LCS NE43-346
545 Technology Square, Cambridge MA 02139, USA

------- Message 2

To:  Jeffrey Mogul <mogul@pa.dec.com>
Cc:  jg@zorch.w3.org, frystyk@w3.org, abaird@w3.org, eric@w3.org,
            howcome@w3.org, chris@w3.org
Subject:  Re: Network Performance Effects of HTTP/1.1, CSS1, and PNG 
Date:  Mon, 10 Feb 1997 12:51:18 -0800
From:  "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>

>In the HTTP/1.1 spec, the specification for Accept-Encoding (14.3)
>says:
>    If an Accept-Encoding header is present, and if the server cannot
>    send a response which is acceptable according to the
>    Accept-Encoding header, then the server SHOULD send an error
>    response with the 406 (Not Acceptable) status code.
>
>The implication of this statement is that a client probably
>should not be sending an Accept-Encoding header unless it is
>sure that the server supports it, because this will likely
>lead to 406 errors.    

Yikes, it looks like that was changed between draft 01 and 02.
Before that, it said

   If no Accept-Encoding field is present in a request, the server may
   assume that the client will accept any content coding. If an
   Accept-Encoding field is present, but contains an empty field
   value, then the user agent is refusing to accept any content coding.

Does anyone remember why it was changed?  It won't work given the
wording in RFC 2068.

.....Roy

------- Message 3
To:  jg@zorch.w3.org
Cc:  Jeffrey Mogul <mogul@pa.dec.com>, frystyk@w3.org, abaird@w3.org,
            eric@w3.org, howcome@w3.org, chris@w3.org
Subject:  Re: Network Performance Effects of HTTP/1.1, CSS1, and PNG 
Date:  Mon, 10 Feb 1997 13:11:45 -0800
From:  "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>

>The best fix for caches is the obvious one: content coding is a hop-by-hop header
>that should be completely transparent to the next hop; if you store something in a cache
>compressed, you should still be obligated to uncompress it to provide it to the next
>hop unless the next hop is willing to accept it in the form indicated.

That is what Transfer-Encoding is for -- it is intended to be used for
hop-by-hop compression.  Content-Encoding is end-to-end because it screws
up integrity checks if it is removed by a hop.

.....Roy

------- Message 4

To:  "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
Cc:  Jeffrey Mogul    <mogul@pa.dec.com>, frystyk@w3.org, abaird@w3.org,
            eric@w3.org, howcome@w3.org, chris@w3.org, jg
Subject:  Re: Network Performance Effects of HTTP/1.1, CSS1, and PNG 
Date:  Mon, 10 Feb 97 16:35:12 -0500
From:  jg

Yeah, except the only transfer encoding defined is "chunked".  gzip, compress and
deflate are all defined as content codings, not transfer codings....
			- Jim

------- End of Forwarded Messages

Received on Friday, 14 February 1997 12:48:00 UTC