- From: Paul Deuter <PaulD@plumtree.com>
- Date: Mon, 03 Nov 2003 07:06:17 -0500
- To: www-international@w3.org
Steve is correct. There does not seem to be any standard for encoding the "filename" value in the Content-Disposition header. Martin: is this something the W3C could take up?

As far as I can tell, the standards concerning the Content-Disposition header omit any mention of how to handle characters outside the ASCII range. As a result, servers, user agents, and applications have all done their own thing.

We also found the 17-character limit in IE for Japanese characters. This may sound like a strange limit, but it makes more sense once one realizes that converting Japanese characters to UTF-8 and then %HH-encoding them causes a 9x expansion: each character becomes three UTF-8 bytes, and each byte becomes three characters in %HH form. Still, it seemed to us that this was a bug in IE, and Microsoft agreed. Microsoft issued a patch (Q816868), which has subsequently been superseded by other patches (Q818506).

We also found that different Japanese versions of IE work differently from the US English version, and that there are differences among IE 5, 5.5, and 6.x. It seems that some versions of Japanese IE want the filename to be %HH-encoded in Shift-JIS. In our experience, Netscape 7.x works best, but that does not help much, since not many people are using Netscape. Furthermore, we found that IE works differently depending on whether the Content-Disposition header uses "inline" or "attachment".

Finally, we also ran into difficulty with Excel, which has a limit of 218 characters for filenames. If you try to open a file over the web without first saving it, the filename is not decoded (that is, it is left in its UTF-8/%HH form) and the file is put into a temporary folder. If the filename is Japanese, a relatively short filename can easily run longer than 218 characters, depending on the folder structure on your system. The result is that Excel will not be able to "locate" the file and will not open it for you. We have worked around this problem by simply truncating filenames that seem too long.
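To make the 9x figure and the truncation workaround concrete, here is a minimal Java sketch. It is not the code from our server: the class name `FilenameBudget`, the helper `truncateToEncodedLength`, and the idea of keeping the extension while shortening the base name are all assumptions made for illustration. It only shows that one Japanese character becomes nine characters once it is converted to UTF-8 and %HH-encoded, and one possible way to shorten a name until its encoded form fits a caller-chosen budget.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FilenameBudget {

    // Converts the name to UTF-8 and writes each non-ASCII byte as %HH.
    static String percentEncode(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: drops characters from the end of the base name
    // (keeping the extension) until the encoded name fits in maxEncodedLength.
    // If the extension alone is too long, only the extension is returned.
    static String truncateToEncodedLength(String filename, int maxEncodedLength) {
        int dot = filename.lastIndexOf('.');
        String base = dot > 0 ? filename.substring(0, dot) : filename;
        String ext = dot > 0 ? filename.substring(dot) : "";
        while (base.length() > 0
                && percentEncode(base + ext).length() > maxEncodedLength) {
            // Step back one whole code point so a surrogate pair is never split.
            int end = base.offsetByCodePoints(base.length(), -1);
            base = base.substring(0, end);
        }
        return base + ext;
    }

    public static void main(String[] args) {
        // U+65E5 is a single Japanese character: 3 UTF-8 bytes, 9 encoded chars.
        System.out.println(percentEncode("\u65E5").length());

        // Seventeen such characters: 17 * 9 = 153 encoded characters.
        StringBuilder seventeen = new StringBuilder();
        for (int i = 0; i < 17; i++) {
            seventeen.append('\u65E5');
        }
        System.out.println(percentEncode(seventeen.toString()).length());
    }
}
```

The budget passed to `truncateToEncodedLength` would have to leave room for the temporary folder path as well, since the 218-character Excel limit applies to the whole path and not only to the filename.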
Our thinking was that it was better to be able to open and view the file (even with a truncated name) than to have the full name but be unable to view the file.

In short, our server code is full of special cases for different browsers and apps. And we continually encounter more cases that don't work.

-Paul

-----Original Message-----
From: Steve Billings [mailto:billings@global360.com]
Sent: Thursday, October 30, 2003 9:23 AM
To: souravm (by way of Martin Duerst <duerst@w3.org>); www-international@w3.org
Subject: RE: Problem in downloading a pdf file having Japanese characters in the name of the file

I wrestled with this problem earlier this year, and unfortunately found no good solutions. As far as I can tell (and I hope someone can prove me wrong), it's a yet-to-be-solved problem in the internet infrastructure. I was using recent versions of the IE and Netscape browsers, and a not-so-new version of Tomcat (3.something, I think).

The approach that came closest to working was to encode the filename using URLEncoder (http://java.sun.com/j2se/1.4.1/docs/api/java/net/URLEncoder.html) with UTF-8, and set the Content-Disposition according to RFC 2047 as follows:

    String encoded_filename = URLEncoder.encode(filename, "UTF-8");
    String contentDisp = "=?UTF-8?Q?attachment; filename=" + encoded_filename + ";?=";
    res.setHeader("Content-Disposition", contentDisp);

With this approach, if the Japanese filename is short, everything looks fine when you save the file from the browser. If you open it without saving it, Notepad gets the encoded name (bad). Another problem is that this approach can only handle filenames up to about 17 Japanese characters.

I tried using other standards (RFC 2184, RFC 2231) with no success.

It wasn't available to me in Tomcat, but this looked like it might have some promise:
http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/mail/internet/MimeUtility.html

I hope you find a solution. If you do, please share it!
Steve

Steve Billings
Global 360
Software Internationalization & Localization
http://www.global360.com/
Office: 978-266-1604
Cell: 978-697-8201

-----Original Message-----
From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of souravm (by way of Martin Duerst <duerst@w3.org>)
Sent: Tuesday, October 28, 2003 9:30 PM
To: www-international@w3.org
Subject: Problem in downloading a pdf file having Japanese characters in the name of the file

Hi All,

I have a PDF file available on a Solaris file server. The name of the file contains Japanese characters. I'm trying to download this file using a servlet. For that purpose I'm setting:

    res.setContentType("application/pdf");
    res.setHeader("Content-disposition", "inline; filename=" + fileName);

This filename is a Unicode string containing some Japanese characters. The download does not happen in this case. However, if the filename contains only English characters, it works fine.

Could anyone please let me know what the problem is and what the solution for it is?

Thanks in advance.

Regards,
Sourav
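For readers following the thread, here is a small sketch of the RFC 2231 parameter syntax that Steve mentions trying, applied to the header from the original question. It illustrates the syntax only. Steve reports that the browsers he tested did not honor it, so it should not be read as a fix. The class name `Rfc2231Disposition` and the method `build` are hypothetical. RFC 2231 writes an extended parameter as `name*=charset'language'percent-encoded-value`, and the language part is left empty here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class Rfc2231Disposition {

    // Builds a value such as:  attachment; filename*=UTF-8''%E6%97%A5.pdf
    static String build(String dispositionType, String filename) {
        String encoded;
        try {
            encoded = URLEncoder.encode(filename, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        // URLEncoder targets HTML form data: it writes a space as '+' and
        // leaves '*' unencoded. RFC 2231 needs both written as %HH. A literal
        // '+' in the name is already %2B here, so only spaces are affected.
        encoded = encoded.replace("+", "%20").replace("*", "%2A");
        return dispositionType + "; filename*=UTF-8''" + encoded;
    }

    public static void main(String[] args) {
        // U+65E5 followed by a space and an ASCII tail.
        System.out.println(build("attachment", "\u65E5 a.pdf"));
    }
}
```

In the servlet from the question, the result would be passed to the same call that is already there, for example `res.setHeader("Content-Disposition", Rfc2231Disposition.build("inline", fileName));`.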
Received on Monday, 3 November 2003 07:16:49 UTC