- From: André-John Mas <ajmas@sympatico.ca>
- Date: Sun, 22 Dec 2002 10:12:05 -0500
- To: www-talk@w3.org
Hi,

I have tried searching for documentation on URLs and double-byte characters, and even searched this mailing list, but could find nothing concrete. The issue has arisen for me because I am writing a servlet that allows browsing of a virtual directory structure which, in certain cases, has entries with Chinese names.

I have looked at some algorithms, but while they worked in the majority of cases, they failed in a few special cases:

- %20%3A%22 -- is this a space followed by one double-byte character, or two single-byte characters?
- %3A%20%22 -- single-byte character, space, single-byte character OR double-byte character, single-byte character OR single-byte character, double-byte character?

Using Mozilla, I find that it encodes UTF-8 URLs with a mixture of single-byte and double-byte characters. For example, a space will be represented as %20, any reserved ASCII character will use a single-byte %xx value, but anything in Chinese will be defined using a double-byte %xx%yy value. This makes it very difficult to parse a URL.

I would say that the problem is with Mozilla, but for me the real problem is the lack of any documentation on the issue. An RFC would be nice, so that at least I know I am dealing with the same encoding in all modern web browsers.

regards

Andre
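
To make the two cases above concrete, here is a minimal Java sketch. It assumes the %xx escapes are the bytes of a UTF-8 string, which is what Mozilla appears to send but is not guaranteed for every browser. Under that assumption the first byte of each character says how many bytes belong to it, and any byte below 0x80 is always a single ASCII character. The class name `PercentDecodeSketch` and both method names are hypothetical, chosen only for illustration. Note that `java.net.URLDecoder` is meant for form data and also turns `+` into a space, which may not be wanted for path segments.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PercentDecodeSketch {

    // Decodes %xx escapes, treating the resulting bytes as UTF-8 (an assumption).
    // Caveat: URLDecoder also converts '+' into a space.
    static String decodeUtf8(String escaped) {
        try {
            return URLDecoder.decode(escaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Number of bytes in the UTF-8 sequence that starts with this lead byte,
    // or -1 if the byte cannot start a sequence (continuation or invalid byte).
    static int sequenceLength(int leadByte) {
        if ((leadByte & 0x80) == 0x00) return 1; // 0xxxxxxx: ASCII
        if ((leadByte & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((leadByte & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((leadByte & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1;
    }

    public static void main(String[] args) {
        // All three bytes are below 0x80, so this is three ASCII characters.
        System.out.println(decodeUtf8("%20%3A%22").length()); // prints 3

        // 0xE4 is a three-byte lead byte, so this is one character (U+4E2D).
        System.out.println(decodeUtf8("%E4%B8%AD").length()); // prints 1
    }
}
```

If the escapes were produced in some other encoding (a legacy double-byte Chinese charset, say), the lead-byte rule above would not apply and the charset name passed to `URLDecoder.decode` would have to match whatever the client actually used.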
Received on Sunday, 22 December 2002 11:27:16 UTC