Re: UA support for Content-Disposition header (filename parameter) from Julian Reschke on 2008-03-18 (public-html@w3.org from March 2008)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Tue, 18 Mar 2008 17:58:24 +0100
To: Brian Smith <brian@briansmith.org>
CC: 'HTML WG' <public-html@w3.org>
Message-ID: <47DFF4B0.6080803@gmx.de>

Brian Smith wrote:
> Using Content-Disposition in HTTP is an ad-hoc solution; it isn't standardized anywhere. The IE encoding (percent-encoded UTF-8) is not locale-sensitive; in fact, RFC 2231-based encoding is more sensitive to locale because it allows arbitrary (non-Unicode) encodings.

But RFC2231 is part of Content-Disposition; see RFC2183, which requires 
RFC2184, which was later obsoleted by RFC2231.

Furthermore, the IE encoding *is* locale-sensitive; if you send 
percent-encoded UTF-8 to a client that isn't configured for UTF-8 
encoded URIs, it doesn't work. At least it didn't when I had to deal 
with unhappy customers in Asia, and opened a support case.

Finally, using percent-escaped UTF-8 breaks all other clients that do 
not expect any kind of escaping in this place.

> Consider a filename that is 8 letters long, in Thai or any African or Asian language. The 2231-based encoding is something like this:
> 
> Content-Disposition: attachment;
>  filename*0==?UTF-8?Q?=1a=1b=1c=2a=2b=2c=3a=3b=3c=4a=4b=4c=5a=5b=5c=6a=6b=6c=7a=7b=7c=?=
>  filename*1==?UTF-8?Q?8a=8b=8c?=

No, it would be

  Content-Disposition: attachment; filename*=utf-8''%1A%1B%1C%2A%2B%2C%3A%3B%3C%4A%4B%4C%5A%5B%5C%6A%6B%6C%7A%7B%7C%8A%8B%8C

> Notice that the RFC 2231 encoding *requires* the header to be split into multiple lines (which many implementations do not handle well). Also notice that it requires two parameters "filename*1" and "filename*2" to be combined together to get the actual "filename" parameter. 

There is no requirement to fold long lines in HTTP headers, after all 
it's not MIME.

The right thing to do here would be to mandate just the encoding part of 
RFC2231; not the line splitting functionality.
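
To illustrate the encoding part on its own, here is a minimal Python sketch of how a server might emit the parameter on a single line. It assumes the charset is fixed to utf-8 and the language tag is left empty; `encode_filename_param` is a hypothetical helper name, not part of any library.

```python
from urllib.parse import quote


def encode_filename_param(filename: str) -> str:
    """Build a single-line filename* parameter (hypothetical helper).

    Only the encoding part of RFC2231 is used: charset, empty language
    tag, then the percent-encoded octets. No continuations, no folding.
    """
    # safe="" makes quote() escape everything except letters, digits
    # and "_.-~", so spaces, quotes and separators are all escaped.
    encoded = quote(filename, safe="", encoding="utf-8")
    return "filename*=utf-8''" + encoded


header = "Content-Disposition: attachment; " + encode_filename_param("naïve file.txt")
print(header)
```

The whole header stays on one line regardless of how long the filename is.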

> The Internet Explorer encoding is this:
> 
> Content-Disposition: attachment; filename="%1A%1B%1C%2A%2B%2C%3A%3B%3C%4A%4B%4C%5A%5B%5C%6A%6B%6C%7A%7B%7C%8A%8B%8C"
> 
> The header is more compact, the header can be kept on one line, there is no header-combining magic going on, and there is no need to deal with any encodings other than UTF-8.

- there is no need to wrap the filename under RFC2231 either, we're not 
using MIME

- furthermore, yes, a single encoding is good, so I would recommend 
specifying exactly that

- the example you give does not work in any browser except IE, and only 
if it is configured for UTF-8 encoded URIs (which was not the default 
setting around the world a few years ago).

> Also, consider this:
> 
> Content-Disposition: attachment;
>  filename*1==?UTF-8?Q?8a=8b=8c?=
>  filename*0==?UTF-8?Q?=1a=1b=1c=2a=2b=2c=3a=3b=3c=4a=4b=4c=5a=5b=5c=6a=6b=6c=7a=7b=7c=?=

That is RFC2047-style encoding mixed with RFC2231 line folding -- I 
didn't recommend that. It may even be illegal.

> This is valid according to RFC 2231 but Firefox and Thunderbird do *NOT* parse it correctly; they assume the parts of the filename are listed in order. So, there are no fully conforming HTTP+Content-Disposition+RFC2231 implementations.

That is probably true, thus it would make sense to specify the profile 
that UAs are expected to implement, and this is exactly the reason why I 
came here with this issue.

The profile would be:

- no line folding (continuations)
- use the encoding from <http://greenbytes.de/tech/webdav/rfc2231.html> 
with the encoding being hardwired to "utf-8".
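
On the receiving side, a UA following this profile would only need to handle one form of the value. The following Python sketch shows what that could look like, assuming the parameter value has already been extracted from the header; `decode_ext_value` is a hypothetical name, and rejecting other charsets is my reading of "hardwired", not something any current UA does.

```python
from urllib.parse import unquote


def decode_ext_value(value: str) -> str:
    """Decode charset'language'percent-encoded-octets under the profile.

    Hypothetical helper: accepts only utf-8 and ignores the language tag.
    """
    charset, sep1, rest = value.partition("'")
    language, sep2, encoded = rest.partition("'")
    if not (sep1 and sep2):
        raise ValueError("expected charset'language'value")
    if charset.lower() != "utf-8":
        raise ValueError("profile allows only utf-8")
    # errors="strict" rejects octet sequences that are not valid UTF-8
    return unquote(encoded, encoding="utf-8", errors="strict")


print(decode_ext_value("utf-8''na%C3%AFve%20file.txt"))
```

Because there are no continuations in the profile, there is nothing to reassemble and no ordering question to get wrong.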

>> Well, Microsoft hasn't implemented RFC2231. What makes you 
>> think that they would implement another RFC, when history 
>> tells that they just ignore it?
> 
> They already implemented the Internet Explorer mechanism in Internet Explorer. It doesn't work in all configurations.

See. How is this a solution when it works only for a subset of the IE 
installations?

> (Also, look at how unfair that both mechanisms are to users of non-Latin alphabets. It takes 72 bytes for the Internet Explorer encoding and 113 bytes for the RFC 2231 encoding, just to encode 8 letters in UTF-8.)

That's only true if you insist on line folding.

Otherwise the overhead is exactly 8 characters compared to what IE 
allows (not that the users would be really interested in that).
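
The count can be checked with a few lines of Python. This sketch assumes the comparison is against an unquoted `filename=` value, so the extra characters are the `*` plus `utf-8''`; with the double quotes used in the IE example above, the difference would be 6 instead.

```python
# The percent-encoded octets are the same in both forms (from the example above).
value = (
    "%1A%1B%1C%2A%2B%2C%3A%3B%3C%4A%4B%4C"
    "%5A%5B%5C%6A%6B%6C%7A%7B%7C%8A%8B%8C"
)

ie_style = "filename=" + value               # unquoted, an assumption
rfc2231_style = "filename*=utf-8''" + value  # single line, no folding

overhead = len(rfc2231_style) - len(ie_style)
print(len(value), overhead)
```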

BR, Julian
Received on Tuesday, 18 March 2008 16:59:08 UTC