RE: Problem in downloading a pdf file having Japanese characters in the name of the file from Jungshik Shin on 2003-10-31 (www-international@w3.org from October to December 2003)

From: Jungshik Shin <jshin@i18nl10n.com>
Date: Fri, 31 Oct 2003 22:34:10 +0900 (KST)
To: Steve Billings <billings@global360.com>
Cc: "souravm (by way of Martin Duerst <duerst@w3.org>)" <souravm@infosys.com>, www-international@w3.org
Message-ID: <Pine.LNX.4.58.0310312222590.29249@jshin.net>

On Thu, 30 Oct 2003, Steve Billings wrote:

> I wrestled with this problem earlier this year, and unfortunately found no
> good solutions. As far as I can tell (and I hope someone can prove me
> wrong), it's a yet-to-be solved problem in the internet infrastructure. I
> was using recent versions of IE and Netscape browsers, and a not-so-new
> version of Tomcat (3.something, I think).
>
> The approach that came closest to working was to encode the filename using
> URLEncoder
> (http://java.sun.com/j2se/1.4.1/docs/api/java/net/URLEncoder.html) with
> UTF-8, and set the Content-Disposition according to RFC 2047 as follows:

  You're not supposed to use RFC 2047 encoding for _parameters_
of a header field (such as 'filename' in the Content-Disposition
header); RFC 2231 is what has to be used there. It's regrettable that
this fact is buried deep inside RFC 822/STD 11, RFC 2047, RFC 2184
and RFC 2231.
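  For illustration, here is a minimal Java sketch (Java because the
thread is about a Tomcat servlet) of building an RFC 2231 'filename*'
parameter. The class and method names are hypothetical. The language
tag is left empty, the set of characters left unescaped is the
conservative attr-char set that RFC 5987 later spelled out for HTTP,
and long values are not split into RFC 2231 continuations.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds an RFC 2231 extended parameter value.
public class Rfc2231Filename {

    // Punctuation allowed unescaped (conservative attr-char set, cf. RFC 5987).
    private static final String ATTR_CHAR_PUNCT = "!#$&+-.^_`|~";

    private static boolean isAttrChar(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                || (b >= '0' && b <= '9') || ATTR_CHAR_PUNCT.indexOf(b) >= 0;
    }

    /** Returns charset'language'value with the UTF-8 bytes percent-encoded. */
    static String encodeValue(String filename) {
        StringBuilder sb = new StringBuilder("UTF-8''"); // empty language tag
        for (byte b : filename.getBytes(StandardCharsets.UTF_8)) {
            int u = b & 0xFF; // treat the byte as unsigned
            if (isAttrChar(u)) {
                sb.append((char) u);
            } else {
                sb.append('%').append(String.format("%02X", u));
            }
        }
        return sb.toString();
    }

    /** Content-Disposition value using the RFC 2231 filename* parameter. */
    static String contentDisposition(String filename) {
        return "attachment; filename*=" + encodeValue(filename);
    }

    public static void main(String[] args) {
        // Unicode escapes for the Japanese filename, so the source file
        // compiles the same way regardless of the platform's default encoding.
        System.out.println(contentDisposition("\u65E5\u672C\u8A9E.pdf"));
        // prints: attachment; filename*=UTF-8''%E6%97%A5%E6%9C%AC%E8%AA%9E.pdf
    }
}
```

In a servlet the result would go to response.setHeader("Content-Disposition", ...).
Note that java.net.URLEncoder is not a drop-in replacement here: it turns
a space into '+', which an RFC 2231 decoder keeps as a literal plus, and
it leaves '*' unescaped although '*' is not allowed in an RFC 2231
attribute value.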

> With this approach, if the Japanese filename is short, when you save the
> file from the browser, everything looks fine. If you open it without saving
> it, Notepad gets the encoded name (bad). Another problem is that this
> approach can only handle filenames up to about 17 Japanese characters.
>
> I tried using other standards (RFC 2184, RFC 2231) with no success.

  Mozilla 1.5 and later do support RFC 2231 (see
<http://bugzilla.mozilla.org/show_bug.cgi?id=162765>
and <http://i18nl10n.com/moztest/download.html>).
It's unfortunate that MS IE does not understand RFC 2231 in the
Content-Disposition header of HTTP. As a fallback, Mozilla also accepts
RFC 2047 encoded-words, 'raw' UTF-8, and 'raw' non-ASCII strings in the
same character encoding as the 'containing' document.
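  An RFC 2231 'filename*' value has the form
charset'language'percent-encoded-bytes, and a user agent that supports
it has to undo that encoding. The Java sketch below shows roughly what
that involves; it is not Mozilla's actual code, the names are
hypothetical, the language tag is ignored, and continuations
(filename*0*=, filename*1*=, ...) are not handled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical helper: decodes a single RFC 2231 extended value.
public class Rfc2231Decode {

    static String decodeValue(String extValue) {
        int first = extValue.indexOf('\'');
        int second = first < 0 ? -1 : extValue.indexOf('\'', first + 1);
        if (second < 0) {
            throw new IllegalArgumentException("not an RFC 2231 extended value: " + extValue);
        }
        // The charset named before the first quote tells us how to read the bytes.
        Charset charset = Charset.forName(extValue.substring(0, first));
        String encoded = extValue.substring(second + 1); // language tag skipped

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                if (i + 2 >= encoded.length()) {
                    throw new IllegalArgumentException("truncated %-escape");
                }
                bytes.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                // Unlike URLDecoder, '+' is kept as a literal plus sign.
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        System.out.println(decodeValue("iso-8859-1'en'%A3%20rates"));
    }
}
```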

 Jungshik

Received on Friday, 31 October 2003 08:34:18 UTC