- From: McDonald, Ira <imcdonald@sharplabs.com>
- Date: Sun, 27 Mar 2005 09:05:44 -0800
- To: "'James Cerra'" <jfcst24_public@yahoo.com>, Mike Brown <mike@skew.org>
- Cc: uri@w3.org
Hi James,

Please don't percent-encode UTF-16 in a URI. RFC 3986 "URI Generic Syntax" (which obsoletes RFC 2396) says on page 21:

"The reg-name syntax allows percent-encoded octets in order to represent non-ASCII registered names in a uniform way that is independent of the underlying name resolution technology. Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters. URI producing applications must not use percent-encoding in host unless it is used to represent a UTF-8 character sequence. When a non-ASCII registered name represents an internationalized domain name intended for resolution via the DNS, the name must be transformed to the IDNA encoding [RFC3490] prior to name lookup. URI producers should provide these registered names in the IDNA encoding, rather than a percent-encoding, if they wish to maximize interoperability with legacy URI resolvers."

Strictly, that passage applies only to the domain name (host) part of a URI, but the point is general: you can't mix two different 'native' encodings in a single URI, because the receiver has no way of knowing how to decode them.

Cheers,
- Ira

Ira McDonald (Musician / Software Architect)
Blue Roof Music / High North Inc
PO Box 221
Grand Marais, MI 49839
phone: +1-906-494-2434
email: imcdonald@sharplabs.com

-----Original Message-----
From: uri-request@w3.org [mailto:uri-request@w3.org] On Behalf Of James Cerra
Sent: Sunday, March 27, 2005 4:46 AM
To: Mike Brown
Cc: uri@w3.org
Subject: Re: Encoding URI From/To UTF-16 Questions

Mike,

Thanks for the explanation. It was incredibly enlightening, but I'm still a little confused.

<...snip...>

I think I understand. So say the program got the U+4F5B HAN IDEOGRAPH character, and the user wants to use UTF-16 as the character encoding for the bytes. Then the program should:

<...snip...>

I appreciate your help. Thanks!

--
Jimmy Cerra

P.S.
I rewrote this response several times as I came to understand your post. Please excuse (or point out) any incongruities.

[1] http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm
[2] RFC 3986, Sections 2.3 and 2.4.
[3] http://skew.org/xml/stylesheets/url-encode/
[4] http://www.w3.org/International/O-URL-code.html
[5] java.net.URLEncoder and java.net.URLDecoder
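To make the difference concrete for the U+4F5B character from the question, here is a minimal sketch using Python 3's standard library (urllib.parse.quote/unquote and the built-in "idna" codec, swapped in for illustration; it is not the java.net.URLEncoder API mentioned in [5], which does form encoding). It contrasts the UTF-8 percent-encoding RFC 3986 asks for with naively percent-encoded UTF-16 (big-endian, no BOM, an assumption), and shows the IDNA form preferred for DNS host names. The host "bücher.example" is a hypothetical example.

```python
from urllib.parse import quote, unquote

han = "\u4f5b"  # U+4F5B, the character from the question

# RFC 3986: encode non-ASCII text as UTF-8 first, then percent-encode each octet.
utf8_form = quote(han, safe="")  # '%E4%BD%9B'

# Percent-encoding UTF-16 octets instead (assumed big-endian, no BOM) is NOT what
# the RFC asks for: octet 0x4F is the ASCII letter 'O', so only 0x5B gets escaped.
utf16_form = quote(han, safe="", encoding="utf-16-be")  # 'O%5B'

# A receiver following the RFC decodes percent-encoded octets as UTF-8.
# It recovers the original character from the UTF-8 form, but from the UTF-16
# form it gets two unrelated ASCII characters.
decoded_utf8 = unquote(utf8_form)    # '\u4f5b'
decoded_utf16 = unquote(utf16_form)  # 'O['

# For a DNS host name the RFC prefers the IDNA (RFC 3490) form over
# percent-encoding. Python's built-in "idna" codec implements IDNA 2003.
host = "b\u00fccher.example"  # hypothetical host name
idna_host = host.encode("idna").decode("ascii")  # 'xn--bcher-kva.example'
pct_host = quote(host, safe="")                   # 'b%C3%BCcher.example'

if __name__ == "__main__":
    print("UTF-8 percent-encoded: ", utf8_form)
    print("UTF-16 percent-encoded:", utf16_form)
    print("decoded as UTF-8:      ", decoded_utf8, decoded_utf16)
    print("IDNA host:             ", idna_host)
    print("percent-encoded host:  ", pct_host)
```

The receiver only ever sees octets, so nothing in "O%5B" tells it that the sender meant UTF-16; that is why the RFC fixes UTF-8 as the one encoding for percent-encoded non-ASCII text.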
Received on Sunday, 27 March 2005 17:06:36 UTC