RE: Problem in downloading a pdf file having Japanese characters in the name of the file

From: Addison Phillips [wM] <aphillips@webmethods.com>
Date: Mon, 3 Nov 2003 08:10:18 -0800
To: "Paul Deuter (by way of Martin Duerst <duerst@w3.org>)" <PaulD@plumtree.com>, <www-international@w3.org>
Message-ID: <PNEHIBAMBMLHDMJDDFLHAEKMHCAA.aphillips@webmethods.com>

I think that an Internet-Draft with the IETF (that is, a new RFC) would be a
more likely alternative, since RFCs define the headers and their semantics.

It also seems that there are standards, but that they are not implemented
consistently. That might actually be the starting place.

Best Regards,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture.
It is not a feature.

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org]On Behalf Of Paul Deuter
> (by way of Martin Duerst <duerst@w3.org>)
> Sent: Monday, 3 November 2003 04:06
> To: www-international@w3.org
> Subject: RE: Problem in downloading a pdf file having Japanese
> characters in the name of the file
>
> Steve is correct.  There does not seem to be any standard for encoding
> the "filename" value in the Content-Disposition header.
>
> Martin: is this something the W3C could take up?
>
> As far as I can tell, the standards concerning the Content-Disposition
> header omit any mention of how to handle characters outside the ASCII
> range.  As a result, servers, user agents, and apps have all done their
> own thing.
>
> We also found the 17-character limit in IE for Japanese characters.  This
> may sound like a strange limit, but it makes more sense once one realizes
> that converting Japanese characters to UTF-8 and then %HH-encoding them
> causes a 9x expansion: each character typically becomes three bytes, and
> each byte becomes three characters.  Still, it seemed to us that this was
> a bug in IE, and Microsoft agreed.  Microsoft issued a patch (Q816868),
> which has subsequently been superseded by other patches (Q818506).
>
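The 9x expansion described above can be checked with a short Java sketch. The class name, helper method, and sample string are hypothetical and chosen only for illustration; java.net.URLEncoder is the same class used elsewhere in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class: measures how much a Japanese filename grows
// when it is converted to UTF-8 and then %HH-encoded.
final class ExpansionDemo {

    // Returns the %HH-encoded UTF-8 form of the given name.
    static String encode(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Three Japanese characters (sample value, an assumption).
        String name = "\u65E5\u672C\u8A9E";
        String encoded = encode(name);
        System.out.println(encoded);          // %E6%97%A5%E6%9C%AC%E8%AA%9E
        System.out.println(encoded.length()); // 27, that is 3 characters * 9
    }
}
```

By the same arithmetic, a 17-character Japanese filename becomes 153 characters once encoded.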
> We also found that different Japanese versions of IE work differently
> from the US English version.  There are also differences among IE 5,
> 5.5, and 6.x.  It seems that some versions of Japanese IE want the
> filename to be %HH-encoded in Shift_JIS.  In our experience, Netscape 7.x
> works best.  That does not help much, since not too many people are using
> Netscape.
>
> Furthermore, we found that IE works differently depending on whether the
> Content-Disposition header uses "inline" or "attachment".
>
> Finally, we also ran into difficulty with Excel, which has a limit of 218
> characters for filenames.  If you try to open a file over the web
> (without first saving it), the filename is not decoded (that is, it is
> left in its UTF-8/%HH form) and the file is put into a temporary folder.
> If the filename is Japanese, then a relatively short filename can easily
> run longer than 218 characters, depending on the folder structure on your
> system.  The result is that Excel will not be able to "locate" the file
> and will not open it for you.  We have worked around this problem by
> simply truncating filenames that seem too long.  Our thinking was that it
> is better to be able to open and view the file (even with a truncated
> name) than to have the full name but be unable to view the file.
>
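A truncation workaround of the kind described above might look roughly like the following Java sketch. It is not the code from the server in question: the class name, the method names, and the idea of passing a caller-chosen budget for the encoded length are all assumptions, and the right budget depends on how much of the 218 characters the temporary folder path consumes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: shortens a filename until its %HH-encoded UTF-8
// form fits within a caller-supplied budget, keeping the extension.
final class FilenameTruncator {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String truncate(String name, int maxEncodedLength) {
        // Split off the extension so that "report.xls" stays an .xls file.
        int dot = name.lastIndexOf('.');
        String stem = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";

        while (stem.length() > 0
                && encode(stem + ext).length() > maxEncodedLength) {
            // Drop one whole code point so a surrogate pair is never split.
            int end = stem.offsetByCodePoints(stem.length(), -1);
            stem = stem.substring(0, end);
        }
        return stem + ext;
    }
}
```

For example, with a budget of 40 encoded characters, a name made of six Japanese characters plus ".xls" (58 characters when encoded) is cut to four Japanese characters plus ".xls" (exactly 40 characters when encoded).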
> In short, our server code is full of special cases for different browsers
> and apps, and we continually encounter more cases that don't work.
>
> -Paul
>
>
>
> -----Original Message-----
> From: Steve Billings [mailto:billings@global360.com]
> Sent: Thursday, October 30, 2003 9:23 AM
> To: souravm (by way of Martin Duerst <duerst@w3.org>);
> www-international@w3.org
> Subject: RE: Problem in downloading a pdf file having Japanese characters
> in the name of the file
>
>
> I wrestled with this problem earlier this year, and unfortunately found
> no good solutions. As far as I can tell (and I hope someone can prove me
> wrong), it's a yet-to-be-solved problem in the Internet infrastructure. I
> was using recent versions of the IE and Netscape browsers, and a
> not-so-new version of Tomcat (3.something, I think).
>
> The approach that came closest to working was to encode the filename
> using URLEncoder
> (http://java.sun.com/j2se/1.4.1/docs/api/java/net/URLEncoder.html) with
> UTF-8, and set the Content-Disposition according to RFC 2047 as follows:
>
>     String encoded_filename = URLEncoder.encode(filename, "UTF-8");
>     String contentDisp = "=?UTF-8?Q?attachment; filename="
>         + encoded_filename + ";?=";
>     res.setHeader("Content-Disposition", contentDisp);
>
> With this approach, if the Japanese filename is short, when you save the
> file from the browser, everything looks fine. If you open it without
> saving it, Notepad gets the encoded name (bad). Another problem is that
> this approach can only handle filenames up to about 17 Japanese
> characters.
>
> I tried using other standards (RFC 2184, RFC 2231) with no success.
>
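For reference, the extended parameter syntax defined by RFC 2231 (charset, an optional language tag between two apostrophes, then the %HH-encoded value) can be built as in the following Java sketch. The class and method names are hypothetical, and the sketch shows only the header syntax; as reported above, it did not work in the browsers tested, so it makes no claim about user-agent support.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a Content-Disposition value that carries the
// filename in the RFC 2231 extended form, filename*=charset'language'value.
final class Rfc2231Filename {

    static String contentDisposition(String dispositionType, String filename) {
        String encoded;
        try {
            encoded = URLEncoder.encode(filename, "UTF-8")
                    // URLEncoder writes a space as "+", which RFC 2231 does
                    // not treat as a space, so use %20 instead.
                    .replace("+", "%20")
                    // URLEncoder leaves "*" as-is, but "*" is not allowed
                    // unencoded in an RFC 2231 extended value.
                    .replace("*", "%2A");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
        // The language tag between the two apostrophes is left empty.
        return dispositionType + "; filename*=UTF-8''" + encoded;
    }
}
```

With a servlet response named res, as in the code quoted above, the value would be set with res.setHeader("Content-Disposition", Rfc2231Filename.contentDisposition("attachment", filename)).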
> It wasn't available to me in Tomcat, but this looked like it might have
> some promise:
> http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/mail/internet/MimeUtility.html

I hope you find a solution. If you do, please share it!
Steve

Steve Billings
Global 360
Software Internationalization & Localization http://www.global360.com/
Office: 978-266-1604
Cell:    978-697-8201

-----Original Message-----
From: www-international-request@w3.org
[mailto:www-international-request@w3.org]On Behalf Of souravm (by way of
Martin Duerst <duerst@w3.org>)
Sent: Tuesday, October 28, 2003 9:30 PM
To: www-international@w3.org
Subject: Problem in downloading a pdf file having Japanese characters in
the name of the file




Hi All,

I have a PDF file on a Solaris file server. The name of the file
contains Japanese characters.

I'm trying to download this file using a servlet. For that purpose I'm
setting:

res.setContentType("application/pdf");
res.setHeader("Content-disposition", "inline; filename=" + fileName);

The filename is a Unicode string containing some Japanese characters.

The download does not happen in this case. However, if the filename
contains only English characters, it works fine.

Could anyone please let me know what the problem is and how to solve it?

Thanks in advance.

Regards,
Sourav
Received on Monday, 3 November 2003 11:12:24 GMT
