- From: Martin J. Duerst <duerst@w3.org>
- Date: Fri, 07 Jul 2000 15:52:48 +0900
- To: Chris Wendt <christw@MICROSOFT.com>, "'Vinod Balakrishnan'" <vinod@filemaker.com>, Lenny Turetsky <LTuretsky@salesforce.com>, "'www-international@w3c.org'" <www-international@w3c.org>, "'servlet-interest@java.sun.com'" <servlet-interest@java.sun.com>
At 00/07/06 15:10 -0700, Chris Wendt wrote:
>URL encoding encodes bytes, not characters. The character encoding is a
>separate, independent layer.
>
>Vinod is probably referring to the ECMAScript Escape() function which
>encodes every non-Latin1 character like %uxxxx where xxxx is the Unicode
>code point in hex characters.
>http://msdn.microsoft.com/scripting/JScript/doc/jsglobalescape.htm
>
>I don't consider the ECMAScript method a valid, recognized URL encoding and
>as far as I know, ECMAScript is the only service where this escaping method
>is implemented.

True. ECMAScript went into a direction that other things didn't.
An update of the ECMAScript standard contains a new function that
encodes all non-ASCII characters (plus some ASCII characters that are
not allowed in URIs) by first using UTF-8 and then encoding the
resulting bytes with %hh. Using UTF-8 is recommended for all new URI
schemes, for URIs in XML, and so on. Please see
http://www.w3.org/International/O-URL-and-ident.html.

>IE5 and later will submit characters that don't fit the form document
>charset like HTML numeric character references &#nnnnn;. The bytes with the
>us-ascii representations &, # and ; are URL reserved bytes so they will be
>URL escaped as %25, %23 and %3B resp.

If UTF-8 is used for the page, of course, there won't be any
such characters.

>Characters that do fit the form document charset undergo simple URL encoding
>per byte.

Does IE support the 'accept-charset' parameter on FORM?

Regards, Martin.

>-----Original Message-----
>From: Vinod Balakrishnan [mailto:vinod@filemaker.com]
>Sent: Thursday, July 06, 2000 1:52 PM
>To: Lenny Turetsky; 'www-international@w3c.org';
>'servlet-interest@java.sun.com'
>Subject: Re: URL-encode international characters in Java?
>
>
>You can encode Big-5 and other double byte script characters in UTF16. I
>have seen IE5 is encoding the URLs with "%u" prefix for UTF16. But in
>case of UTF8 we don't have any standard prefix for representing that yet.
>
>-Vinod
>
> >Hi all,
> >
> >Is there a standard way to URL-encode non-English characters in Java?
> >For example, I know that '?' is URL-encoded as '%3F', but I don't know
> >how or if Big-5 characters can be URL-encoded. I've experimented a bit,
> >and found that IE will encode things differently based on the charset
> >of the HTML doc which contains the form.
> >
> >Ideally, I'd like to use functionality available in Java Servlets, or
> >another Java code library, but any solutions would be much appreciated.
> >I've looked at Java's java.net.URLEncoder class, but it's encode()
> >method won't do it, as documented in the JDC's bug database (
> >http://developer.java.sun.com/developer/bugParade/bugs/4257115.html ).
> >
> >Is the only known solution to write my own encoder? If so, where can I
> >find a list of the character's that *don't* need to be encoded? Is it
> >just [A-Za-z0-9_]?
> >
> >Thanks,
> >Lenny Turetsky
> >
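
As a rough illustration of the scheme described above (convert to UTF-8 first, then escape the resulting bytes as %hh), here is a minimal Java sketch. It is not taken from the thread. The class name Utf8PercentEncoder and the set of characters left unescaped (ASCII letters, digits, and - _ . ~) are assumptions made for the example, so adjust the set to whatever the URI component in question actually allows.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library mentioned in this thread.
final class Utf8PercentEncoder {
    // Assumption: only these ASCII characters are passed through unescaped.
    private static final String UNESCAPED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";

    private Utf8PercentEncoder() {}

    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        // Step 1: characters -> bytes, always via UTF-8.
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v < 0x80 && UNESCAPED.indexOf((char) v) >= 0) {
                out.append((char) v);
            } else {
                // Step 2: every other byte becomes %hh (two uppercase hex digits).
                out.append('%');
                out.append(Character.toUpperCase(Character.forDigit(v >> 4, 16)));
                out.append(Character.toUpperCase(Character.forDigit(v & 0x0F, 16)));
            }
        }
        return out.toString();
    }
}
```

With this sketch, Utf8PercentEncoder.encode("?") gives "%3F", and the single character U+4E2D gives "%E4%B8%AD", one %hh triplet for each of its three UTF-8 bytes. The result does not depend on the charset of the page that contained the form, because the byte conversion is fixed to UTF-8.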
Received on Friday, 7 July 2000 03:27:52 UTC