- From: Nicolai Langfeldt <janl@ifi.uio.no>
- Date: Sun, 15 Sep 1996 00:56:25 +0200
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Dear working group,

I am currently reading the HTTP/1.1 draft dated 12 Aug. 1996 and I have some comments regarding content encoding usage. My interest in this is as implementor of w3mir and the associated w3http.pm (http://www.ifi.uio.no/~janl/w3mir.html).

I am not on this mailing list, so comments meant for me must be directed to me, or Cc'ed to me.

It is not unlikely that the issues I raise are merely figments of my imagination, but I have attempted to read the relevant parts of the spec several times without being further enlightened.

* Firstly, the Accept-Encoding header. It is used for content negotiation. Its absence implies that any Content-Encoding is acceptable to the client. There seems to be no way to specify that the client wishes only unencoded documents returned. To me this seems to be a desirable feature. Thus the value 'none' might be acceptable, and result in a non-encoded document being sent.

* Secondly, about the usage of the Content-Encoding header. I have seen, in various places, that the correct Content-Encoding for a file named index.html.gz should be 'gzip'. At first glance this is reasonable. But in a content negotiation context it is confusing and results in needless complexity.

--- Scenario 1: Client requests file index.html.gz; this exists and is returned by the server, which reports that Content-Encoding is gzip. This behaviour is displayed by Apache 1.1.1 and CERN 3.0A (see below).

--- Scenario 2: Client requests file index.html, but since the server owner is too cheap to buy enough disk and bandwidth, all files on the server are gzipped (or, more likely, bandwidth is scarce and CPU plentiful and the documents are gzipped on the fly). So if allowed, the server will return index.html gzipped (index.html.gz).
In this case the Content-Encoding should be 'gzip' because the server has in fact served the document requested, but encoded; it needs to be decoded by the client before the document can be presented to the user in the usual manner for a document of the given type. From my understanding of content negotiation this is a correct response.

---

The problem is that the scenarios are mutually exclusive, IMHO. One should be right and one should be wrong. I believe scenario 2 is/should be correct, and scenario 1 is a useless and wrong application of the header.

My reasoning:

- If both scenarios are correct behaviours, then the correct behaviour in the client depends on which scenario you are in. Then you need to be able to determine which scenario you are in, and determining that requires filename (extension) rules in the client. I find this undesirable; it increases client complexity to no real end (as I will attempt to illustrate now).

- If you are in scenario 1 and the client wants to save the file to disk, it is faced with two correct courses of action: saving it as index.html.gz, encoded, or saving it as index.html, decoded. The latter requires filename transformation rules to be associated with the decoding process. The former will require a client browsing a mirrored document hierarchy directly (using the file: method) to be able to do content negotiation on its own based on directory listings (when it looks for index.html it must find and correctly decode index.html.gz). This seems like an unnecessary increase in complexity, and one which will probably not be made.

At the core of _my_ concern is the fact that relatively simple clients, like w3mir and probably others, will now need filename rules/MIME types to be able to do their work.
Until now the Content-type: header in server replies and the (forced) presence of the SGML tag '<!DOCTYPE HTML...>' in local copies of text/html docs have been all the file-type rules needed; that is simple and non-complex.

- Scenario 1 is nonsensical because no encoding was added by the server over what the client should expect when asking for a .gz file. And when the .gz file was requested, the server did in fact not do _any_ content encoding negotiation: there was only one choice, which did not deviate from what the client asked for directly.

To illustrate: if the document is index.html.gz then Content-Type is text/html and Content-Encoding is gzip. Then the client might pipe the document through gzip and then into an HTML viewer to be viewed with no knowledge of file extensions in the HTTP client, ... until it wants to save the document for the user, in which case it will either save it decoded or save it encoded as index.html.gz; only one is right. If it saves the doc encoded as index.html.gz it will do the Right Thing given scenario 1. But if it is in scenario 2 it will save it encoded as index.html. To always do the Right Thing requires filename extension -> MIME type mappings. And if the client has those, it does not need to be told the encoding in scenario 1; it can determine the encoding itself from the filename.

- In scenario 2 what you should do is always very clear and simple: you decode the encoded doc before handling it as usual, perhaps saving it to disk using the filename you specified to the server. No MIME types, no filename transformations.

Demonstration of present Content-Encoding usage:

Given the runtime directive 'AddEncoding x-gzip gz', Apache will report a file named foo.gz to be of content-encoding x-gzip:

    $ telnet www.math.uio.no 80
    ...
    HEAD /~janl/foo.gz HTTP/1.0

    HTTP/1.0 200 OK
    Date: Sat, 14 Sep 1996 21:09:25 GMT
    Server: Apache/1.1.1
    Content-type: text/plain
    Content-encoding: x-gzip
    Content-length: 0
    Last-modified: Sat, 14 Sep 1996 21:08:01 GMT

This usage is also present in W3C's own web server:

    $ telnet www.w3.org 80
    ...
    HEAD /pub/WWW/Protocols/HTTP/1.1/diff-v11-06to07.ps.gz HTTP/1.0

    HTTP/1.0 200 Document follows
    Server: CERN/3.0A
    Date: Sat, 14 Sep 1996 22:15:42 GMT
    Content-Encoding: gzip
    Content-Type: application/postscript
    Content-Length: 204470
    Last-Modified: Thu, 05 Sep 1996 19:34:17 GMT

Regards,
Nicolai Langfeldt
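
The scenario 2 handling argued for in the message (decode whatever Content-Encoding the server applied, then save under the name that was requested) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from w3mir or w3http.pm: the function names decode_body and local_name are hypothetical, and only the gzip and x-gzip tokens seen in the two server replies are handled.

```python
import gzip
import posixpath
from typing import Optional

# Encoding tokens this sketch can undo: the two values seen in the
# server replies quoted in the message.
GZIP_TOKENS = {"gzip", "x-gzip"}


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the Content-Encoding so the body matches its Content-Type.

    Hypothetical helper: no filename or extension is consulted.
    """
    if content_encoding is None or content_encoding.strip() == "":
        # No encoding was reported, so the body is already the document.
        return body
    token = content_encoding.strip().lower()
    if token in GZIP_TOKENS:
        return gzip.decompress(body)
    raise ValueError("unsupported Content-Encoding: " + token)


def local_name(request_path: str) -> str:
    """Save under the name that was requested, with no transformation."""
    return posixpath.basename(request_path)


if __name__ == "__main__":
    page = b"<!DOCTYPE HTML PUBLIC><html></html>"
    wire = gzip.compress(page)  # what a scenario 2 server would send
    print(local_name("/~janl/index.html"), decode_body(wire, "gzip") == page)
```

Under scenario 1 the same code would decode index.html.gz and still save it under the name index.html.gz, which is the mismatch the message describes: fixing it needs an extension table that this sketch deliberately does not have.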
Received on Saturday, 14 September 1996 16:13:32 UTC