
RE: UA support for Content-Disposition header (filename parameter)

From: Brian Smith <brian@briansmith.org>
Date: Tue, 18 Mar 2008 09:01:24 -0700
To: "'HTML WG'" <public-html@w3.org>
Message-ID: <000701c88911$51d545c0$4001a8c0@T60>

Julian Reschke wrote:
> Brian Smith wrote:
> > The IE encoding is a lot better. In order to support
> > clients using it in requests, I have to be able to parse the
> > filename, and the IE syntax is much, much easier to parse
> > than the 2231-based syntax. Why not file a bug report against
> > IE so that it works all the time?
> 
> The IE encoding is an ad-hoc solution. It doesn't work 
> interoperably (it depends on IE's locale), while RFC2231 
> works out of the box.
> 
> Why would anybody (except Microsoft) want to standardize the 
> IE solution?

Using Content-Disposition in HTTP is an ad-hoc solution; it isn't standardized anywhere. The IE encoding (percent-encoded UTF-8) is not locale-sensitive; in fact, RFC 2231-based encoding is more sensitive to locale because it allows arbitrary (non-Unicode) encodings.

Consider a filename that is 8 letters long, in Thai or any African or Asian language. The 2231-based encoding is something like this:

Content-Disposition: attachment;
 filename*0==?UTF-8?Q?=1a=1b=1c=2a=2b=2c=3a=3b=3c=4a=4b=4c=5a=5b=5c=6a=6b=6c=7a=7b=7c=?=
 filename*1==?UTF-8?Q?8a=8b=8c?=

Notice that the RFC 2231 encoding *requires* the header to be split into multiple lines (which many implementations do not handle well). Also notice that it requires two parameters, "filename*0" and "filename*1", to be combined to get the actual "filename" parameter.

The Internet Explorer encoding is this:

Content-Disposition: attachment; filename="%1A%1B%1C%2A%2B%2C%3A%3B%3C%4A%4B%4C%5A%5B%5C%6A%6B%6C%7A%7B%7C%8A%8B%8C"

The header is more compact, the header can be kept on one line, there is no header-combining magic going on, and there is no need to deal with any encodings other than UTF-8.
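
To make the comparison concrete, here is a minimal sketch in Python of the percent-encoded UTF-8 approach, assuming the standard urllib.parse module. The function names encode_filename and decode_filename and the sample Thai filename are made up for illustration; this is not Internet Explorer's actual code.

```python
from urllib.parse import quote, unquote


def encode_filename(name: str) -> str:
    """Percent-encode every non-unreserved UTF-8 byte of the filename."""
    # safe="" means even "/" is escaped; ASCII letters, digits and "_.-~" stay as-is.
    return quote(name, safe="", encoding="utf-8")


def decode_filename(value: str) -> str:
    """Reverse the encoding; UTF-8 is the only charset that has to be handled."""
    return unquote(value, encoding="utf-8", errors="strict")


# Hypothetical filename: eight Thai letters, three UTF-8 bytes each.
name = "\u0e01\u0e02\u0e04\u0e07\u0e08\u0e09\u0e0a\u0e0b"
encoded = encode_filename(name)
header = f'Content-Disposition: attachment; filename="{encoded}"'
```

Each of the 24 UTF-8 bytes becomes a three-character "%XX" escape, so the encoded value is 72 characters long and fits in a single parameter on a single line.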

Also, consider this:

Content-Disposition: attachment;
 filename*1==?UTF-8?Q?8a=8b=8c?=
 filename*0==?UTF-8?Q?=1a=1b=1c=2a=2b=2c=3a=3b=3c=4a=4b=4c=5a=5b=5c=6a=6b=6c=7a=7b=7c=?=

This is valid according to RFC 2231 but Firefox and Thunderbird do *NOT* parse it correctly; they assume the parts of the filename are listed in order. So, there are no fully conforming HTTP+Content-Disposition+RFC2231 implementations.
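
For comparison, a receiver that does not assume the segments arrive in order could be sketched roughly like this in Python. The function name join_continuations and the input shape (a list of already-split name/value pairs) are assumptions for illustration; the sketch only reorders and joins the numbered segments and skips the charset decoding step entirely.

```python
import re

# Matches numbered segments such as "filename*0" or "filename*1*".
_SEGMENT = re.compile(r"^(?P<name>[^*]+)\*(?P<index>\d+)\*?$")


def join_continuations(params):
    """Reassemble numbered segments by index, whatever order they arrive in.

    `params` is a list of (name, value) pairs already split out of the header.
    """
    segments = {}
    result = {}
    for key, value in params:
        match = _SEGMENT.match(key)
        if match:
            parts = segments.setdefault(match.group("name"), {})
            parts[int(match.group("index"))] = value
        else:
            result[key] = value
    for name, parts in segments.items():
        # Sort by the numeric index, not by position in the header.
        result[name] = "".join(parts[i] for i in sorted(parts))
    return result


# Segments listed out of order, as in the header above.
combined = join_continuations([("filename*1", "def"), ("filename*0", "abc")])
```

Even in this stripped-down form, the receiver has to collect every parameter before it can produce the filename, which is the extra work the single-parameter encoding avoids.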

> Well, Microsoft hasn't implemented RFC2231. What makes you 
> think that they would implement another RFC, when history 
> tells that they just ignore it?

They already implemented the Internet Explorer mechanism in Internet Explorer. It doesn't work in all configurations.

(Also, look at how unfair both mechanisms are to users of non-Latin alphabets: it takes 72 bytes in the Internet Explorer encoding and 113 bytes in the RFC 2231 encoding just to encode 8 letters in UTF-8.)

- Brian
Received on Tuesday, 18 March 2008 16:02:02 UTC
