Re: URLs and double byte characters (unicode)

No, that is exactly how UTF-8 works.
I guess when you refer to Unicode, you mean UTF-16.
In UTF-8, every ASCII character still takes 1 byte.
You can still URL-decode the string into UTF-8 bytes with the old function, and
then convert UTF-8 to UTF-16, which is what you need.
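
A minimal sketch of those two steps in Java (the original poster mentions a
servlet), assuming the URL was escaped from UTF-8 bytes. The class and method
names are made up for illustration, and this is not meant as a complete URL
parser:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class UrlUtf8Demo {

    // Step 1: turn every %xx escape back into one raw byte.
    // Assumes the input is plain ASCII, as an escaped URL should be.
    public static byte[] percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape at index " + i);
                }
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Step 2: read the bytes as UTF-8. A Java String is UTF-16 internally,
    // so this is also the UTF-8 to UTF-16 conversion.
    public static String decode(String escaped) {
        return new String(percentDecode(escaped), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = decode("%20%3A%22");            // space, colon, double quote
        String chinese = decode("%E4%B8%AD%E6%96%87"); // two Chinese characters
        System.out.println(ascii.length());   // prints 3
        System.out.println(chinese.length()); // prints 2
    }
}
```

So %20%3A%22 is not ambiguous: every byte in it is below 0x80, which in UTF-8
always means one ASCII character per byte. Bytes that belong to a multi-byte
character are always 0x80 or above, so the decoder never has to guess where a
character starts.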




>From: André-John Mas <ajmas@sympatico.ca>
>To: www-talk@w3.org
>Subject: URLs and double byte characters (unicode)
>Date: Sun, 22 Dec 2002 10:12:05 -0500
>
>
>Hi,
>
>I have tried searching for documentation on URLs and double-byte
>characters, even searched this mailing-list, but could find
>nothing concrete.
>
>For me the issue has arisen because I am writing a servlet that
>allows for the browsing of a virtual directory structure that in
>certain cases has entries with Chinese names.
>
>I have looked for some algorithms, but while they worked in the
>majority of cases, they failed in a few special cases:
>
>   - %20%3A%22
>     -- is this a space followed by one double byte character, or
>     two single byte characters?
>
>   - %3A%20%22
>     -- single byte character, space, single byte character OR
>     double byte character, single byte character OR single
>     byte character, double byte character?
>
>Using Mozilla I find that it encodes UTF-8 URLs with a mixture
>of single byte and double byte characters. For example, a space will
>be represented as %20, any reserved ASCII character will use a
>single byte %xx value, but anything in Chinese will be defined
>using a double byte %xx%yy value. This makes it very difficult
>to parse a URL. I would say that the problem is with Mozilla,
>but for me the real problem is the lack of any documentation
>on the issue. An RFC would be nice, so at least I know I am
>dealing with the same solution with all modern web browsers.
>
>regards
>
>Andre



Received on Sunday, 22 December 2002 21:49:42 UTC