From: "Larry Masinter" <email@example.com>
To: "Martin J. Duerst" <firstname.lastname@example.org>, <email@example.com>, <uri@Bunyip.Com>
Date: Thu, 27 May 1999 14:28:06 PDT
Message-ID: <firstname.lastname@example.org>
In-Reply-To: <199905260654.PAA08644@sh.w3.mag.keio.ac.jp>
Subject: RE: Special characters in URIs

URL character escaping normally should only be done at the time
the URL is constructed from its component pieces, and normally
should only be undone (unescaped) when the URL is decomposed into
its internal pieces.

Your description of the process of either applying or removing
%XX escaping seems to be based on having the escapes applied or
removed when the URL is removed from or embedded in some context
such as XML.

In general, you cannot change an arbitrary %XX into the character
the XX byte sequence represents in ASCII without some risk of
changing the meaning of the URL, and so you should not recommend
this process at all.

Larry
--
http://www.parc.xerox.com/masinter

> The second case basically works by saying that if in these formats
> (e.g. HTML), an URI contains a non-ASCII character, this character
> is converted to a byte sequence using UTF-8 and then %-encoded to
> produce a legal URI.

I think "works" is ambitious. It "works" because most HTTP servers
are forgiving about this kind of transliteration and most URLs
are HTTP.
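
A minimal sketch of the escaping points above, not part of the original message. It assumes Python's standard urllib.parse module, and the example.com URL and the variable names are made up for illustration. It shows that unescaping a whole URL before decomposing it can change which pieces the URL has, while escaping at construction time and unescaping after decomposition round-trips. The last lines show the UTF-8-then-%-encode step described in the quoted text.

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical component pieces; the first one contains a literal "/" as data.
pieces = ["a/b", "c"]

# Escape each piece at construction time, then join with the "/" delimiter.
url = "http://example.com/" + "/".join(quote(p, safe="") for p in pieces)
# url is now "http://example.com/a%2Fb/c"

# Decompose first, then unescape each piece: the original pieces come back.
decomposed = [unquote(s) for s in urlsplit(url).path.split("/")[1:]]

# Unescape the whole URL first, then decompose: "%2F" has become a
# delimiter, so the URL now appears to have three pieces instead of two.
unescaped_early = urlsplit(unquote(url)).path.split("/")[1:]

# The quoted proposal: convert a non-ASCII character to UTF-8 bytes,
# then %-encode each byte.
encoded = quote("é", safe="", encoding="utf-8")

print(decomposed)
print(unescaped_early)
print(encoded)
```

Whether a server treats "a%2Fb" and "a/b" as the same resource is up to that server, which is why blanket unescaping carries the risk described in the message.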