Content-Encoding confusion

There are a lot of problems with downloading gzip'ed files.

If I request:

	http://www.w3.org/TR/html4/html40.pdf

the server replies:

	Content-Encoding: gzip
	Content-Length: 962329
	Content-Location: html40.pdf.gz
	Content-Type: application/pdf; qs=0.001

Now, when I ask for:

	http://www.w3.org/TR/html4/html40.pdf.gz

the server is going to reply:

	Content-Encoding: gzip
	Content-Length: 962329
	Content-Type: application/pdf; qs=0.001

This is the configuration on www.w3.org, and I believe that it is
Apache's default behavior.

When I request html40.pdf, it is obvious that my browser needs to decode
(i.e. uncompress) the file on the fly, and save it under html40.pdf.
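To make the client side concrete, here is a minimal Python sketch of that behaviour, assuming a client that removes a gzip Content-Encoding and names the saved file after the last path segment of the URL it requested. The function name `save_name_and_body` is hypothetical; only the standard `gzip` and `urllib.parse` modules are used. Treating "x-gzip" like "gzip" follows RFC 2616, section 3.5.

```python
import gzip
import posixpath
from urllib.parse import urlsplit


def save_name_and_body(url: str, headers: dict, body: bytes) -> tuple[str, bytes]:
    """Hypothetical client logic: undo a gzip content-coding on the fly and
    save the result under the name that was requested."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}
    if h.get("content-encoding", "").strip().lower() in ("gzip", "x-gzip"):
        body = gzip.decompress(body)
    # The saved name is simply the last segment of the requested URL path.
    name = posixpath.basename(urlsplit(url).path)
    return name, body
```

For the negotiated request, this yields the name "html40.pdf" together with the decompressed PDF bytes.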

What about html40.pdf.gz? The HTTP headers are the same, so I guess that's
why browsers usually uncompress the file. And since they asked for
html40.pdf.gz, they save it under this name (there's no evidence telling
them to get rid of the .gz extension). Under Windows, you get a PDF file
with a .gz extension, which PDF readers don't like.
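One way a client could avoid the misleading name is sketched below. It is a hypothetical heuristic and not something any particular browser is known to do. After removing a gzip Content-Encoding, it drops a trailing ".gz" from the saved name, but only if the decoded bytes no longer start with the gzip magic number. That check keeps the name intact when a server has applied gzip on top of a file that was already gzipped. The name `choose_download` is illustrative.

```python
import gzip
import posixpath
from urllib.parse import urlsplit

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def choose_download(url: str, headers: dict, body: bytes) -> tuple[str, bytes]:
    """Hypothetical client heuristic; not taken from any real browser."""
    h = {k.lower(): v for k, v in headers.items()}
    encoding = h.get("content-encoding", "").strip().lower()
    name = posixpath.basename(urlsplit(url).path)
    if encoding in ("gzip", "x-gzip"):
        body = gzip.decompress(body)
        # If the decoded bytes are no longer gzip, the ".gz" suffix described
        # the coding we just removed, so drop it from the saved name.
        if name.endswith(".gz") and not body.startswith(GZIP_MAGIC):
            name = name[: -len(".gz")]
    return name, body
```

With the html40.pdf.gz reply shown earlier, this saves a file named "html40.pdf". The trade-off is that the client is now second-guessing the URL. That is arguably a workaround for ambiguous headers rather than a fix.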

I think that the reply to http://www.w3.org/TR/html4/html40.pdf.gz
should not specify any content encoding, that its content type should be
application/x-gzip or something similar, and that application/pdf should
only be used when the resource is negotiated.
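The server behaviour proposed above could be sketched as follows. This is a hypothetical illustration and not Apache's actual logic. `headers_for` receives the requested path and the file chosen to serve it. If the .gz file was requested by name, it is labelled as an opaque gzip file with no Content-Encoding. If it was reached through negotiation, the underlying media type is sent along with a gzip Content-Encoding and a Content-Location. The `MEDIA_TYPES` table is an assumption, and the `qs` parameter from the replies above is left out.

```python
import posixpath

# Hypothetical extension-to-media-type table, for illustration only.
MEDIA_TYPES = {".pdf": "application/pdf", ".html": "text/html"}


def headers_for(request_path: str, variant_file: str) -> dict:
    """Headers under the proposal: gzip is a content-coding only when the
    .gz file was selected by negotiation."""
    if posixpath.basename(request_path) == variant_file:
        # The client asked for the .gz file itself: it *is* a gzip file.
        return {"Content-Type": "application/x-gzip"}
    # Negotiated (e.g. html40.pdf -> html40.pdf.gz): describe the underlying
    # data and list gzip as an additional coding on top of it.
    headers = {}
    base, ext = posixpath.splitext(variant_file)
    if ext == ".gz":
        headers["Content-Encoding"] = "gzip"
        ext = posixpath.splitext(base)[1]
    headers["Content-Type"] = MEDIA_TYPES.get(ext, "application/octet-stream")
    headers["Content-Location"] = variant_file
    return headers
```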

RFC2616 says (section 7.2.1):

   Content-Type specifies the media type of the underlying data.
   Content-Encoding may be used to indicate any additional content
   codings applied to the data, usually for the purpose of data
   compression, that are a property of the requested resource. There is
   no default encoding.

My understanding of that is that it is an encoding of the transfer, not
of the data.
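Read literally, the quoted passage describes a layering: Content-Type names the media type of the underlying data, and Content-Encoding lists codings applied on top of it. RFC 2616 (section 14.11) also says multiple codings are listed in the order they were applied. The following minimal sketch recovers the underlying data under that reading. `underlying_data` and the `DECODERS` table are illustrative names that cover only a few codings, and unknown codings raise `KeyError`.

```python
import gzip
import zlib

# Decoders for a few content-codings named in RFC 2616, section 3.5.
DECODERS = {
    "gzip": gzip.decompress,
    "x-gzip": gzip.decompress,   # RFC 2616 asks to treat x-gzip as gzip
    "deflate": zlib.decompress,  # HTTP "deflate" is the zlib (RFC 1950) format
}


def underlying_data(headers: dict, body: bytes) -> tuple[str, bytes]:
    """Strip every listed content-coding; return (media type, decoded bytes)."""
    h = {k.lower(): v for k, v in headers.items()}
    media_type = h.get("content-type", "application/octet-stream").split(";")[0].strip()
    codings = [c.strip().lower() for c in h.get("content-encoding", "").split(",") if c.strip()]
    # Codings are listed in the order applied, so undo them in reverse order.
    for coding in reversed(codings):
        body = DECODERS[coding](body)
    return media_type, body
```

Applied to either reply above, this returns "application/pdf" and the PDF bytes. That is exactly why the two responses look identical to a client.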

What is the correct behavior (of both clients and servers)?

-- 
Hugo Haas, Webmaster, Systems Team - W3C/MIT
mailto:hugo@w3.org - tel:+1-617-452-2092

Received on Tuesday, 2 May 2000 15:56:59 UTC