- From: Nicolai Langfeldt <janl@ifi.uio.no>
- Date: Sun, 15 Sep 1996 22:31:21 +0200
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Koen Holtman <koen@win.tue.nl> keyed:
...
> An empty Accept-Encoding value indicates none are acceptable.
...

You are right.  The type of this paragraph is clearly too small for
me :-)

Larry Masinter <masinter@parc.xerox.com> keyed:
...
> > * Secondly about the usage of the Content-Encoding header.  I have
> > seen, in various places, that the correct Content-Encoding for a file
> > named, index.html.gz should be 'gzip'.  At first glance this is
> > reasonable.  But in a content negotiation context it's confusing and
> > results in needless complexity.
...
> Your two scenarios are not mutually exclusive.
...

Are both scenarios correct/intended usage of the Content-Encoding
header?

> If you have a file index.html.gz which is gzipped HTML, and you
> deliver the data without transformation, then the result should be
> labelled
>
>     content-type: text/html
>     content-encoding: gzip
>
> no matter whether you're returning the data as a result of requesting
> "index.html.gz" or "index.html". The correct behavior in the client is
> to interpret the data returned according to the content-labelling of
> the result.

The decisions of an information presentation client like Netscape are
in this case easy.  For a retrieve and store client the correct
decoding and parsing process is likewise easily determined.

> If a client wants to save the results to a disk file, the client might
> want to make up a convenient file name; however, the file name it
> makes up need not look like the URL that was requested. Presumably,
> though, "save to file" would save the contents AFTER it was unencoded.
> IF you have local conventions that text/html files are saved with
> ".html" at the end, THEN you might want to process the URL in order to
> generate that as a sample file name.

In the case of an automated retrieve and store client we now have a
_hard_ problem.  I will attempt to explain.

Automatic copying through HTTP (as w3mir does) has some legitimate
uses:

- Fast private reading off disk.
  This needs to preserve HTML link integrity.  The filenames and
  encodings used for this DO matter.
- Copying of entity hierarchies from one server to another.  Since
  servers use filenames to determine the content type and encoding of
  documents, this DOES matter.
- Priming cache servers.  There is no problem with that in this
  context.

It would be nice if clients made for these purposes were also served by
the HTTP specification, since I think this class of clients will become
more and more important.

Koen Holtman and Larry argue that knowledge of local filename
conventions should be used to determine what the Right Thing is.  This
is a complicating strategy, and IMHO a mess.  For a client like w3mir
you can manage to keep things simple, since it only needs special
knowledge about HTML files.  Whether something is HTML can easily be
determined, since such files start with <!DOCTYPE HTML...>, and if they
don't we can edit them after retrieval, before saving to disk.  So I
disagree with this, because it complicates things, and is useless.

Consider the scenarios again:

1.  > GET index.html.gz
    < Content-type: text/html
    < Content-encoding: gzip

This is _only_ useful for simple decoding in the client.  But it
requires knowledge of filename conventions if you want to determine, in
an automated manner, whether the server has done content negotiation or
not, and whether to save the document encoded or decoded to disk.

The only purpose of this functionality is to save disk space and to be
able to put <a href="index.html.gz"> in your docs.  But this usage
essentially means that content-encoding negotiation (and the associated
headers) is unneeded, and a redundant part of the spec; given the
existence of content-encoding negotiation this is a useless usage,
serving only to complicate clients.  I will try to justify this claim
in my discussion of scenario 2.

In the majority of cases a client will only request a .gz file when a
.gz file has been provided in the server namespace for fast transfer of
PostScript or other documents of nontrivial size, like the HTTP spec.
I have _never_ seen this used for HTML (not that I surf particularly
much).  So, easy decoding in this scenario would seem to be a
non-issue.

Additionally, the client _did_ ask for index.html.gz, and the server
did _not_ apply any encoding the client did not ask for implicitly.

In conclusion: Forbidding this use will _not_ break anything, and it
will simplify some of the less complex clients.

2.  > GET index.html
    < Content-type: text/html
    < Content-encoding: gzip

This makes sense.  Here the file index.html.gz might exist on disk and
be served, or the site might have plenty of CPU and little bandwidth,
and prefer to gzip documents if the client can handle it.

This means that there is no need to refer to index.html.gz as in
scenario 1, because the server can mix and match encodings as needed
based on CPU, disk space, bandwidth or whatever other considerations
apply.  The only remaining reason to refer to .gz files is files like
draft-ietf-http-v11-spec-07.ps.gz, which we in fact mostly want saved
on disk or printed anyway, not decoded and shown on screen right away,
since they're so large and impractical to read on a screen.

Here the server _did_ apply an encoding not asked for implicitly, and
the Content-Encoding _does_ make 100% sense.  It does not complicate
automatic retrieve and save clients either, because there is no need
for knowledge of local filename conventions.  You just decode and save
as the basename of the requested file.

Furthermore, allowing both scenarios complicates things too, because
determining whether you are faced with scenario 1 or 2 requires
knowledge of filename conventions and suitable heuristics.  It is true,
though, that this is the same knowledge needed to determine whether
it's correct to save index.html.gz encoded or decoded in scenario 1.
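
To make the difference concrete, here is a rough Python sketch of what a
retrieve and store client ends up doing when both scenarios are allowed.
It is only an illustration under stated assumptions, not w3mir's actual
code: the names ENCODING_SUFFIXES, url_implies_encoding and plan_save,
the contents of the suffix table, and the "index.html" fallback name are
all made up for this example.

```python
import gzip
import posixpath
from urllib.parse import urlsplit

# Assumed table of local filename conventions (hypothetical). This is
# exactly the kind of knowledge scenario 1 forces on the client.
ENCODING_SUFFIXES = {".gz": "gzip", ".Z": "compress"}


def url_implies_encoding(url, content_encoding):
    """Heuristic: does the requested name already carry the suffix
    conventionally used for this Content-Encoding?"""
    suffix = posixpath.splitext(urlsplit(url).path)[1]
    expected = ENCODING_SUFFIXES.get(suffix)
    return expected is not None and expected == content_encoding


def plan_save(url, content_encoding, body):
    """Return (filename, bytes_to_write) for one retrieved entity."""
    # Fallback name for URLs ending in "/" is an assumption of this sketch.
    name = posixpath.basename(urlsplit(url).path) or "index.html"
    if content_encoding is None:
        return name, body
    if url_implies_encoding(url, content_encoding):
        # Scenario 1: the encoded file was requested by name, so the
        # bytes are stored as received under that name.
        return name, body
    if content_encoding == "gzip":
        # Scenario 2: the server chose the encoding, so decode and
        # save under the basename of the requested file.
        return name, gzip.decompress(body)
    raise ValueError("unsupported Content-Encoding: " + content_encoding)
```

If only scenario 2 were allowed, the suffix table and
url_implies_encoding could be dropped, and plan_save would shrink to
"decode if Content-Encoding is present, then save under the requested
basename".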

So, to conclude, I think that:

- Scenario 1 serves no purpose and requires higher complexity for
  correct decisions in automated retrieve and store clients.
- Scenario 2 is the Right Thing, and should be the only allowed
  scenario.  Content-Encoding should _only_ be used when
  content-encoding negotiation has been done.

Regards,
  Nicolai Langfeldt
Received on Sunday, 15 September 1996 13:35:21 UTC