RE: Problem in downloading a pdf file having Japanese characters in the name of the file

Steve is correct.  There does not seem to be any standard for encoding
the "filename" value in the Content-Disposition header.

Martin: is this something the W3C could take up?

As far as I can tell, the standards concerning the Content-Disposition header
omit any mention of how to handle characters outside the ASCII range.  As a
result, servers, user agents, and applications have each done their own thing.

We also found the 17-character limit in IE for Japanese characters.  This may
sound like a strange limit, but it makes more sense once one realizes that
converting Japanese characters to UTF-8 and then %HH-encoding them causes a
9x expansion (a typical Japanese character takes 3 bytes in UTF-8, and each
byte becomes a 3-character %HH escape).  Still, it seemed to us that this was
a bug in IE, and Microsoft agreed.  Microsoft issued a patch (Q816868), which
has since been superseded by other patches (Q818506).
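The 9x figure is easy to check. A minimal sketch, assuming Java 10 or later (for the Charset overload of URLEncoder.encode); the class name is illustrative only:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: why a Japanese filename grows about 9x when it is converted to
// UTF-8 and then %HH-encoded. A typical kana/kanji is 3 bytes in UTF-8, and
// each byte becomes a 3-character escape such as "%E6".
final class EncodedLength {

    // %HH-encode the UTF-8 bytes of a filename. URLEncoder leaves ASCII
    // letters, digits and '.' unchanged.
    static String encode(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "nihongo" (Japanese) in kanji, written as Unicode escapes so the
        // source file stays ASCII.
        String nihongo = "\u65E5\u672C\u8A9E";
        String encoded = encode(nihongo);
        System.out.println(encoded);          // %E6%97%A5%E6%9C%AC%E8%AA%9E
        System.out.println(encoded.length()); // 27, i.e. 3 characters x 9

        // A 17-character name (one kanji repeated, purely for illustration).
        StringBuilder seventeen = new StringBuilder();
        for (int i = 0; i < 17; i++) {
            seventeen.append('\u65E5');
        }
        System.out.println(encode(seventeen.toString()).length()); // 153
    }
}
```

At 17 characters the encoded name is already 153 characters long; the sketch says nothing about exactly where IE's cutoff lies.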

We also found that the various Japanese versions of IE behave differently
from the US English version, and that IE 5, 5.5, and 6.x also differ from one
another.  It seems that some versions of Japanese IE want the filename to be
%HH-encoded in Shift_JIS.  In our experience, Netscape 7.x works best.  That
does not help much, since not many people use Netscape.
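To make the Shift_JIS point concrete, here is a minimal sketch (Java 10 or later; class and method names are illustrative) of the same name %HH-encoded from UTF-8 bytes and from Shift_JIS bytes. It does not try to detect which browser is on the other end; that per-browser choice is the special-casing described above.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: one filename, %HH-encoded from two different byte encodings.
// A browser that decodes %HH as UTF-8 needs the first form; one that
// expects Shift_JIS bytes (as some Japanese IE versions reportedly did)
// needs the second. Deciding which browser gets which is NOT shown here.
final class FilenameEncodings {

    // %HH-encode the name's bytes in the given charset.
    static String percentEncode(String name, Charset charset) {
        return URLEncoder.encode(name, charset);
    }

    public static void main(String[] args) {
        String nihongo = "\u65E5\u672C\u8A9E"; // "nihongo" (Japanese) in kanji
        Charset shiftJis = Charset.forName("Shift_JIS");
        System.out.println(percentEncode(nihongo, StandardCharsets.UTF_8)); // %E6%97%A5%E6%9C%AC%E8%AA%9E
        System.out.println(percentEncode(nihongo, shiftJis));               // %93%FA%96%7B%8C%EA
    }
}
```

The two strings decode to the same name only if the receiving side assumes the right charset, which is why one header value cannot satisfy both kinds of browser.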

Furthermore, we found that IE works differently depending on whether the
Content-Disposition header uses "inline" versus "attachment".

Finally, we also ran into difficulty with Excel, which has a 218-character
limit on filenames.  If you try to open a file over the web (without first
saving it), the filename is not decoded (that is, it is left in its UTF-8/%HH
form) and the file is put into a temporary folder.  If the filename is
Japanese, even a relatively short name can easily exceed 218 characters once
the temporary folder path is included, depending on the folder structure on
your system.  The result is that Excel cannot "locate" the file and will not
open it for you.  We have worked around this problem by simply truncating
filenames that seem too long.  Our thinking was that it is better to be able
to open and view the file (even with a truncated name) than to have the full
name but be unable to view the file.
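The truncation workaround might look something like the sketch below (Java 10 or later). The encoded-length budget is a hypothetical parameter and the names are illustrative; the code limits only the %HH-encoded name, not the full temporary-folder path, which the server cannot see.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the "truncate names that seem too long" workaround. The budget
// (maxEncodedLength) is a HYPOTHETICAL parameter; a real value would have to
// leave room for the client's temporary-folder path.
final class FilenameTruncator {

    // Length of the name after UTF-8 + %HH encoding.
    static int encodedLength(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8).length();
    }

    // Shortens the base name (keeping the extension, e.g. ".xls") until the
    // encoded form fits in maxEncodedLength. Whole code points are dropped,
    // so a character is never cut in half. If the budget is smaller than the
    // encoded extension, everything but the extension is dropped.
    static String truncate(String name, int maxEncodedLength) {
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";
        while (!base.isEmpty() && encodedLength(base + ext) > maxEncodedLength) {
            // Index where the last code point of base starts.
            int cut = base.offsetByCodePoints(base.length(), -1);
            base = base.substring(0, cut);
        }
        return base + ext;
    }

    public static void main(String[] args) {
        // "nihongo.pdf" in kanji; 31 characters once encoded.
        String name = "\u65E5\u672C\u8A9E.pdf";
        System.out.println(truncate(name, 22).equals("\u65E5\u672C.pdf")); // true
    }
}
```

Truncating the base name rather than the whole string keeps the extension, which the application presumably relies on to recognize the file type.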

In short, our server code is full of special cases for different browsers and
apps.  And we continually encounter more cases that don't work.

-Paul



-----Original Message-----
From: Steve Billings [mailto:billings@global360.com]
Sent: Thursday, October 30, 2003 9:23 AM
To: souravm (by way of Martin Duerst <duerst@w3.org>); www-international@w3.org
Subject: RE: Problem in downloading a pdf file having Japanese characters in the name of the file


I wrestled with this problem earlier this year, and unfortunately found no 
good solutions. As far as I can tell (and I hope someone can prove me 
wrong), it's a yet-to-be solved problem in the internet infrastructure. I 
was using recent versions of IE and Netscape browsers, and a not-so-new 
version of Tomcat (3.something, I think).

The approach that came closest to working was to encode the filename using
URLEncoder (http://java.sun.com/j2se/1.4.1/docs/api/java/net/URLEncoder.html)
with UTF-8, and set the Content-Disposition according to RFC 2047 as follows:

// %HH-encode the UTF-8 bytes of the filename (URLEncoder also turns
// spaces into "+")
String encoded_filename = URLEncoder.encode(filename, "UTF-8");
// wrap the whole disposition value in an RFC 2047-style =?UTF-8?Q?...?=
// marker (the payload is %HH, not RFC 2047 Q-encoding, which uses =HH)
String contentDisp = "=?UTF-8?Q?attachment; filename=" + encoded_filename + ";?=";
res.setHeader("Content-Disposition", contentDisp);

With this approach, if the Japanese filename is short, everything looks fine
when you save the file from the browser. If you open it without saving it,
Notepad gets the encoded name (bad). Another problem is that this approach can
only handle filenames up to about 17 Japanese characters.

I tried using other standards (RFC 2184, RFC 2231) with no success.
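For reference, the RFC 2231 form mentioned here carries the charset inside the parameter itself. A minimal sketch (Java 7 or later; class and method names are illustrative) of what such a header looks like. Per the report above it did not help with the browsers tested, so this shows the syntax, not that it works.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the RFC 2231 extended-parameter form,
//     filename*=charset'language'value
// where value is the name's bytes, %HH-escaped. Unlike URLEncoder, this
// escapes a space as %20 rather than "+".
final class Rfc2231Filename {

    // Escape every byte except a conservative set of ASCII characters
    // (letters, digits, '-', '.', '_', '~') that RFC 2231 allows as-is.
    static String encodeValue(String name) {
        StringBuilder out = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (safe) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    // Builds e.g. "attachment; filename*=UTF-8''%E6..." (empty language tag).
    static String contentDisposition(String type, String name) {
        return type + "; filename*=UTF-8''" + encodeValue(name);
    }

    public static void main(String[] args) {
        String name = "\u65E5\u672C\u8A9E report.pdf"; // kanji + space + ASCII
        System.out.println(contentDisposition("attachment", name));
        // attachment; filename*=UTF-8''%E6%97%A5%E6%9C%AC%E8%AA%9E%20report.pdf
    }
}
```

The empty field between the two apostrophes is the optional language tag.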

It wasn't available to me in Tomcat, but this looked like it might have some
promise:
http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/mail/internet/MimeUtility.html
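For comparison with the =?UTF-8?Q?...?= wrapper above, here is a minimal sketch of an RFC 2047 "B" encoded-word (=?UTF-8?B?base64?=), roughly the kind of string MimeUtility.encodeText produces. It is hand-rolled with java.util.Base64 (Java 8 or later) so it runs without JavaMail; the names are illustrative. Note that RFC 2047 itself says encoded-words must not be used in Content-Disposition parameters, so putting one in a filename relies on browser leniency rather than on the standard.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of an RFC 2047 "B" encoded-word: =?charset?B?base64-of-bytes?=.
// Unlike MimeUtility, this does not split long text into several
// encoded-words, so it is only suitable for short names.
final class EncodedWord {

    static String encodeB(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return "=?UTF-8?B?" + Base64.getEncoder().encodeToString(utf8) + "?=";
    }

    public static void main(String[] args) {
        // "nihongo" (Japanese) in kanji, via Unicode escapes.
        System.out.println(encodeB("\u65E5\u672C\u8A9E")); // =?UTF-8?B?5pel5pys6Kqe?=
    }
}
```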

I hope you find a solution. If you do, please share it!
Steve

Steve Billings
Global 360
Software Internationalization & Localization http://www.global360.com/
Office: 978-266-1604
Cell:    978-697-8201

-----Original Message-----
From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of souravm (by way of Martin Duerst <duerst@w3.org>)
Sent: Tuesday, October 28, 2003 9:30 PM
To: www-international@w3.org
Subject: Problem in downloading a pdf file having Japanese characters in the name of the file




Hi All,

I have a PDF file on a Solaris file server. The name of the file contains
Japanese characters.

I'm trying to serve this file for download from a servlet. For that purpose
I'm setting:

res.setContentType("application/pdf");
res.setHeader("Content-disposition", "inline; filename=" + fileName);

This filename is a Unicode string containing some Japanese characters.

The download does not happen in this case. However, if the filename contains
only English characters, it works fine.
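One plausible contributor, offered as an assumption rather than a diagnosis: if the servlet container writes header values as ISO-8859-1 bytes, Japanese characters have no byte to map to. The sketch below (class name illustrative) simulates that step with String.getBytes, which substitutes '?' for characters the charset cannot represent.

```java
import java.nio.charset.StandardCharsets;

// Sketch of one possible failure mode, ASSUMING the container writes header
// values as ISO-8859-1. Characters outside Latin-1 cannot be encoded, and
// String.getBytes replaces each of them with '?'.
final class HeaderBytes {

    // Round-trips a header value through ISO-8859-1, as it might look on the wire.
    static String asLatin1OnTheWire(String headerValue) {
        byte[] wire = headerValue.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(asLatin1OnTheWire("inline; filename=\u65E5\u672C\u8A9E.pdf"));
        // inline; filename=???.pdf
    }
}
```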

Could anyone please tell me what the problem is and how to solve it?

Thanks in advance.

Regards,
Sourav

Received on Monday, 3 November 2003 07:16:49 UTC