HTTP/1.1 draft 12 aug 1996 and content encodings from Nicolai Langfeldt on 1996-09-14 (ietf-http-wg@w3.org from July to September 1996)

From: Nicolai Langfeldt <janl@ifi.uio.no>
Date: Sun, 15 Sep 1996 00:56:25 +0200
To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <199609142256.AAA25793@ifi.uio.no>
Dear working group,

I am currently reading the HTTP/1.1 draft dated 12 Aug. 1996 and I
have some comments regarding content encoding usage.  My interest in
this is as the implementor of w3mir and the associated w3http.pm
(http://www.ifi.uio.no/~janl/w3mir.html).  I am not on this mailing
list, so comments meant for me must be sent directly to me, or Cc'ed
to me.

It is not unlikely that the issues I raise are merely figments of my
imagination, but I have attempted to read the relevant parts of the
spec several times without being further enlightened.

* Firstly, the Accept-Encoding header.  It is used for content
negotiation.  Its absence implies that any Content-Encoding is
acceptable to the client.  There seems to be no way to specify that
the client wishes only unencoded documents returned.  To me this seems
to be a desirable feature.  Thus the value 'none' might be
acceptable, and result in a non-encoded document being sent.
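
A minimal sketch of how a server might honour such a value.  The
'none' token is only the proposal made above, not something defined by
the draft, and the function names are hypothetical.  The header is
treated as a plain comma-separated list of content-codings.

```python
def acceptable_encodings(accept_encoding):
    """Return the set of codings the client accepts, or None for 'any'.

    accept_encoding is the raw header value, or None if the header
    was not sent at all.
    """
    if accept_encoding is None:
        # Header absent: any Content-Encoding is acceptable.
        return None
    tokens = {t.strip().lower() for t in accept_encoding.split(",") if t.strip()}
    if "none" in tokens:
        # Hypothetical token proposed in this message: unencoded only.
        return set()
    return tokens


def choose_encoding(accept_encoding, available):
    """Pick the first coding the server offers that the client accepts.

    Returns None when the document should be sent unencoded.
    """
    allowed = acceptable_encodings(accept_encoding)
    for coding in available:
        if allowed is None or coding in allowed:
            return coding
    return None
```

With this rule a client sending 'Accept-Encoding: none' always gets the
plain document, while a client sending no header may get any encoding
the server offers.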

* Secondly, about the usage of the Content-Encoding header.  I have
seen, in various places, that the correct Content-Encoding for a file
named index.html.gz should be 'gzip'.  At first glance this is
reasonable.  But in a content negotiation context it's confusing and
results in needless complexity.

---

Scenario 1:

Client requests file index.html.gz.  This exists and is returned by
the server, which reports that Content-Encoding is gzip.  This
behaviour is displayed by Apache 1.1.1 and CERN 3.0A (see below).

---

Scenario 2:

Client requests file index.html, but since the server owner is too
cheap to buy enough disk and bandwidth all files on the server are
gzipped (or, more likely, bandwidth is scarce and CPU plentiful and
the documents are gzipped on-the-fly).  So if allowed the server will
return index.html gzipped (index.html.gz).  In this case the
Content-Encoding should be 'gzip' because the server has in fact
served the document requested, but encoded.  It needs to be decoded by
the client before the document can be presented to the user in the
usual manner for a document of the given type.  From my understanding
of content negotiation this is a correct response.
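
A minimal sketch of the server side of scenario 2, assuming documents
are held in memory and gzipped on-the-fly.  The names respond and
documents are hypothetical, and the Content-Type is hard-coded for the
sake of the example.

```python
import gzip


def respond(path, documents, client_accepts_gzip):
    """Serve the requested document, gzipped on-the-fly when allowed.

    documents maps request paths to unencoded bodies (bytes).
    Returns (headers, body).
    """
    body = documents[path]
    # The type describes the document that was asked for, not the encoding.
    headers = {"Content-Type": "text/html"}
    if client_accepts_gzip:
        body = gzip.compress(body)
        # The encoding was added by the server on top of the requested
        # document, so it is reported to the client.
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body
```

The client asked for index.html and gets index.html; Content-Encoding
only says what must be undone before the document is used.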

---

The problem is that the scenarios are mutually exclusive IMHO.  One
should be right and one should be wrong.  I believe scenario 2
is/should be correct, and scenario 1 to be a useless and wrong
application of the header.  My reasoning:

- If both scenarios are correct behaviours then the correct behaviour
  in the client depends on what scenario you are in.  Then you need to
  be able to determine what scenario you are in.  Determining what
  scenario you are in requires filename (extension) rules in the
  client.  I find that this is undesirable and increases the client
  complexity to no real end (as I will attempt to illustrate now).

- If you are in scenario 1 and the client wants to save the file to
  disk it is faced with two correct courses of action:
  saving it as index.html.gz encoded, or saving it as index.html
  decoded.  The latter requires filename transformation rules to be
  associated with the decoding process.  The former will require a
  client browsing a mirrored document hierarchy directly (using the
  file: method) to be able to do content negotiation on its own based
  on directory listings (when it looks for index.html it must find
  and correctly decode index.html.gz).  This seems like an unnecessary
  increase in complexity, and one which will probably not be made.  At
  the core of _my_ concern is the fact that relatively simple
  clients, like w3mir and probably others, will now need filename
  rules/mime types to be able to do their work.  Until now the
  Content-type: header in server replies and the (forced) presence of
  the SGML tag '<!DOCTYPE HTML...>' in local copies of text/html docs
  have been all the file-type rules needed, and that is simple.

- Scenario 1 is nonsensical because no encoding was added by the
  server over what the client should expect when asking for a .gz
  file.  And when the .gz file was requested the server did in fact
  not do _any_ content encoding negotiation: there was only one
  choice, which did not deviate from what the client asked for
  directly.  To illustrate: if the document is index.html.gz then
  Content-Type is text/html and Content-Encoding is gzip.  The client
  might then pipe the document through gzip and into an HTML viewer,
  with no knowledge of file extensions in the HTTP client, ... until
  it wants to save the document for the user, in which case it will
  save it either decoded or encoded as index.html.gz; only one is
  right.  If it saves the doc encoded as index.html.gz it will do the
  Right Thing given scenario 1.  But if it's in scenario 2 the same
  rule will save it encoded as index.html, which is wrong.  To always
  do the Right Thing requires filename extension -> mime mappings.
  And if the client has those it does not need to be told the encoding
  in scenario 1; it can determine the encoding itself from the
  filename.

- In scenario 2 what you should do is always very clear and simple:
  you decode the encoded doc before handling it as usual, perhaps
  saving it to disk using the filename you specified to the server.
  No mime types, no filename transformations.
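
The scenario 2 rule in the last point can be sketched as follows.
This is only an illustration, not w3mir code: the function name
local_copy is hypothetical, and only the gzip and x-gzip codings seen
in the examples below are handled.

```python
import gzip
import posixpath

# Codings this sketch knows how to undo.
GZIP_CODINGS = ("gzip", "x-gzip")


def local_copy(request_path, headers, body):
    """Return (filename, data) for saving a fetched document.

    The body is decoded according to Content-Encoding and the filename
    is taken unchanged from the request: no mime types, no filename
    transformations.  Request paths ending in '/' are not handled here.
    """
    # Header names are compared case-insensitively.
    fields = {name.lower(): value for name, value in headers.items()}
    coding = fields.get("content-encoding", "").strip().lower()
    if coding in GZIP_CODINGS:
        body = gzip.decompress(body)
    elif coding:
        raise ValueError("unsupported Content-Encoding: " + coding)
    filename = posixpath.basename(request_path)
    return filename, body
```

Under scenario 1 the same function would turn a request for
index.html.gz into a decoded file still named index.html.gz, which is
exactly where the filename rules would have to come in.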

Demonstration of present Content-Encoding usage:

Given the runtime directive 'AddEncoding x-gzip gz', Apache will
report a file named foo.gz to be of content-encoding x-gzip:

$ telnet www.math.uio.no 80
...
HEAD /~janl/foo.gz HTTP/1.0

HTTP/1.0 200 OK
Date: Sat, 14 Sep 1996 21:09:25 GMT
Server: Apache/1.1.1
Content-type: text/plain
Content-encoding: x-gzip
Content-length: 0
Last-modified: Sat, 14 Sep 1996 21:08:01 GMT

This usage is also present in W3C's own web server:

$ telnet www.w3.org 80
...
HEAD /pub/WWW/Protocols/HTTP/1.1/diff-v11-06to07.ps.gz HTTP/1.0

HTTP/1.0 200 Document follows
Server: CERN/3.0A
Date: Sat, 14 Sep 1996 22:15:42 GMT
Content-Encoding: gzip
Content-Type: application/postscript
Content-Length: 204470
Last-Modified: Thu, 05 Sep 1996 19:34:17 GMT


Regards,
  Nicolai Langfeldt
Received on Saturday, 14 September 1996 16:13:32 UTC