Re: URL-encode international characters in Java?

From: Yung-Fong Tang <ftang@netscape.com>
Date: Thu, 06 Jul 2000 17:04:37 -0700
Message-ID: <39651E94.B14DCA3B@netscape.com>
To: Vinod Balakrishnan <vinod@filemaker.com>
CC: Lenny Turetsky <LTuretsky@salesforce.com>, "'www-international@w3c.org'" <www-international@w3c.org>, "'servlet-interest@java.sun.com'" <servlet-interest@java.sun.com>


Vinod Balakrishnan wrote:

> You can encode Big-5 and other double-byte script characters in UTF-16.

Or you can URL-encode Big5 as its own code points instead of UTF-16.
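
A minimal Java sketch of that idea, under stated assumptions: the class name PercentEncoder and its encode() method are hypothetical, and the set of characters left unescaped is the "unreserved" set from RFC 2396 rather than anything the servlet API defines. It converts the text to bytes in the named charset and writes every other byte as %XX.

```java
import java.io.UnsupportedEncodingException;

public class PercentEncoder {
    // Assumption: pass through only the RFC 2396 "unreserved" characters.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()";

    // Converts text to bytes in the given charset, then escapes each byte
    // that is not an unreserved ASCII character as %XX (uppercase hex).
    public static String encode(String text, String charsetName)
            throws UnsupportedEncodingException {
        byte[] bytes = text.getBytes(charsetName);
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v < 0x80 && UNRESERVED.indexOf((char) v) >= 0) {
                // Note: a trail byte of a double-byte character that happens
                // to fall in this set is also passed through; a decoder that
                // rebuilds the byte sequence still recovers the same bytes.
                out.append((char) v);
            } else {
                out.append('%');
                out.append(Character.toUpperCase(Character.forDigit(v >> 4, 16)));
                out.append(Character.toUpperCase(Character.forDigit(v & 0xF, 16)));
            }
        }
        return out.toString();
    }
}
```

With this sketch, PercentEncoder.encode("?", "US-ASCII") gives "%3F". Passing "Big5" as the charset name escapes the Big5 bytes directly, provided the Java runtime ships a Big5 converter. The receiving side has to know which charset was used, since nothing in the escaped string records it.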

> I have seen IE5 encoding URLs with a "%u" prefix for UTF-16.

It is in the spec, but I have not seen it used in practice.
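
For illustration only, here is a rough Java sketch of a "%u" style escape. It assumes the form meant is the one produced by the ECMAScript escape() function: UTF-16 code units below 0x100 become %XX, larger ones become %uXXXX, and the characters A-Z a-z 0-9 @*_+-./ are left alone. The class name PercentUEscaper is hypothetical.

```java
public class PercentUEscaper {
    // Assumption: the unescaped set used by ECMAScript's escape().
    private static final String UNESCAPED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./";

    // Escapes each UTF-16 code unit of the input string independently.
    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (UNESCAPED.indexOf(c) >= 0) {
                out.append(c);
            } else if (c < 0x100) {
                // Two hex digits for code units that fit in one byte.
                out.append(String.format("%%%02X", (int) c));
            } else {
                // "%u" followed by four hex digits for everything else.
                out.append(String.format("%%u%04X", (int) c));
            }
        }
        return out.toString();
    }
}
```

Under those assumptions, PercentUEscaper.escape("\u4e2d") gives "%u4E2D". A server has to unescape this form itself; it is not a %XX byte sequence.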

> But in the case of UTF-8 we don't have any standard prefix for
> representing that yet.
>
> -Vinod
>
> >Hi all,
> >
> >Is there a standard way to URL-encode non-English characters in Java? For
> >example, I know that '?' is URL-encoded as '%3F', but I don't know how or if
> >Big-5 characters can be URL-encoded. I've experimented a bit, and found that
> >IE will encode things differently based on the charset of the HTML doc which
> >contains the form.
> >
> >Ideally, I'd like to use functionality available in Java Servlets, or
> >another Java code library, but any solutions would be much appreciated. I've
> >looked at Java's java.net.URLEncoder class, but its encode() method won't
> >do it, as documented in the JDC's bug database
> >(http://developer.java.sun.com/developer/bugParade/bugs/4257115.html).
> >
> >Is the only known solution to write my own encoder? If so, where can I find
> >a list of the characters that *don't* need to be encoded? Is it just
> >[A-Za-z0-9_]?
> >
> >Thanks,
> >Lenny Turetsky
> >

Received on Thursday, 6 July 2000 20:06:35 GMT
