- From: xuefer tinys <xuefer@hotmail.com>
- Date: Mon, 23 Dec 2002 10:43:02 +0800
- To: ajmas@sympatico.ca, www-talk@w3.org
No, that is exactly how UTF-8 works. I guess that when you refer to Unicode, you mean UTF-16. In UTF-8, all ASCII characters still take 1 byte. You can still urldecode the escapes into UTF-8 bytes with the old function, and then convert the UTF-8 to UTF-16, which is what you need.

>From: André-John Mas <ajmas@sympatico.ca>
>To: www-talk@w3.org
>Subject: URLs and double byte characters (unicode)
>Date: Sun, 22 Dec 2002 10:12:05 -0500
>
>
>Hi,
>
>I have tried searching for documentation on URLs and double-byte
>characters, even searched this mailing-list, but could find
>nothing concrete.
>
>For me the issue has arisen because I am writing a servlet that
>allows for the browsing of a virtual directory structure that in
>certain cases has entries that have Chinese names.
>
>I have looked for some algorithms, but while they worked in the
>majority of cases, they failed in a few special cases:
>
> - %20%3A%22
> -- is this a space followed by one double byte character, or
> two single byte characters?
>
> - %3A%20%22
> -- single byte character, space, single byte character OR
> double byte character, single byte character OR single
> byte character, double byte character?
>
>Using Mozilla I find that it encodes its UTF-8 URLs with a mixture
>of single byte and double byte characters. For example, a space will
>be represented as %20, any reserved ASCII character will use a
>single byte %xx value, but anything in Chinese will be defined
>using a double byte %xx%yy value. This makes it very difficult
>to parse a URL. I would say that the problem is with Mozilla,
>but for me the real problem is the lack of any documentation
>on the issue. An RFC would be nice, so at least I know I am
>dealing with the same solution with all modern web browsers.
>
>regards
>
>Andre
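The decoding order suggested in the reply (percent-decode to bytes first, then interpret the bytes as UTF-8, then convert to UTF-16 if needed) can be sketched as follows. This is only an illustration in Python, not the servlet's own code; the helper names `decode_url_path`, `utf8_sequence_length` and `utf8_char_lengths` are made up for this example, and the Chinese sample string is an assumption. It also shows why the two quoted cases are not ambiguous: in UTF-8 the first byte of each character says how many bytes that character occupies, and every byte below 0x80 is a complete ASCII character on its own.

```python
from urllib.parse import unquote_to_bytes


def decode_url_path(escaped: str) -> str:
    """Percent-decode to raw bytes, then interpret those bytes as UTF-8."""
    raw = unquote_to_bytes(escaped)
    return raw.decode("utf-8")


def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 character occupies, given its first byte."""
    if lead < 0x80:
        return 1  # plain ASCII, always a single byte
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3  # most Chinese characters fall in this range
    if 0xF0 <= lead < 0xF8:
        return 4
    raise ValueError(f"0x{lead:02X} cannot start a UTF-8 character")


def utf8_char_lengths(raw: bytes) -> list[int]:
    """Walk the byte string and list the byte length of each character."""
    lengths = []
    i = 0
    while i < len(raw):
        n = utf8_sequence_length(raw[i])
        lengths.append(n)
        i += n
    return lengths


# The first case from the quoted message: all three bytes are below 0x80,
# so they are three single-byte ASCII characters (space, colon, quote).
ascii_case = decode_url_path("%20%3A%22")

# A hypothetical Chinese directory name ("中文"), as a UTF-8 percent-encoded URL.
chinese_escaped = "%E4%B8%AD%E6%96%87"
chinese_text = decode_url_path(chinese_escaped)

# Convert to UTF-16 only after decoding, if the application needs it.
chinese_utf16 = chinese_text.encode("utf-16-be")
```

Here `utf8_char_lengths(unquote_to_bytes("%20%3A%22"))` gives `[1, 1, 1]`, while the Chinese sample gives `[3, 3]`: each of those characters takes three bytes in UTF-8, not two, and becomes two bytes only after the conversion to UTF-16.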
Received on Sunday, 22 December 2002 21:49:42 UTC