Re: Special characters in URIs

Larry Masinter (masinter@parc.xerox.com)
Thu, 27 May 1999 14:28:06 PDT


From: "Larry Masinter" <masinter@parc.xerox.com>
To: "Martin J. Duerst" <duerst@w3.org>, <ietf-url@imc.org>, <uri@Bunyip.Com>
Date: Thu, 27 May 1999 14:28:06 PDT
Message-ID: <000201bea887$cf805ce0$aa66010d@copper.parc.xerox.com>
In-Reply-To: <199905260654.PAA08644@sh.w3.mag.keio.ac.jp>
Subject: RE: Special characters in URIs

URL character escaping should normally be done only at the
time the URL is constructed from its component pieces, and
should normally be undone (unescaped) only when the URL
is decomposed into those pieces.  Your description
of the process of either applying or removing %XX escaping
seems to be based on having the escapes applied or removed
when the URL is removed from or embedded in some context
such as XML. In general, you cannot change an arbitrary
%XX into the character that the byte XX represents in
ASCII without some risk of changing the meaning of the URL,
and so you should not recommend this process at all.
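
As a rough illustration of the point above, here is a minimal
Python sketch. The segment values are made up for the example,
and it uses the standard urllib.parse functions rather than
anything specific to this discussion. It escapes each component
when the URL path is built, then compares unescaping per
component against unescaping the whole path at once.

```python
from urllib.parse import quote, unquote

# Hypothetical path segments; the first contains a literal "/" as data.
segments = ["a/b", "c"]

# Escape each component at the time the path is constructed.
path = "/" + "/".join(quote(s, safe="") for s in segments)

# Decompose first, then unescape each piece: the original segments return.
decoded = [unquote(s) for s in path.split("/")[1:]]

# Unescape the whole path first, then decompose: the %2F has become a
# delimiter, so the path now appears to have three segments.
blind = [s for s in unquote(path).split("/")[1:]]

print(path)     # /a%2Fb/c
print(decoded)  # ['a/b', 'c']
print(blind)    # ['a', 'b', 'c']
```

The escaped "/" is data inside one segment; once it is turned
back into a bare "/" outside of decomposition, it can no longer
be told apart from the separators.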

Larry
-- 
http://www.parc.xerox.com/masinter


> The second case basically works by saying that if in these formats
> (e.g. HTML), an URI contains a non-ASCII character, this character
> is converted to a byte sequence using UTF-8 and then %-encoded to
> produce a legal URI.

I think "works" is ambitious. It "works" because most
HTTP servers are forgiving about this kind of transliteration
and most URLs are HTTP.
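
The conversion described in the quoted text can be sketched
roughly as follows, again in Python with urllib.parse. The
sample string is an assumption for illustration only. The last
step shows why the result depends on the receiver: the %XX
escapes carry bytes, not a character set label, so a receiver
assuming a different encoding reads different characters.

```python
from urllib.parse import quote, unquote

# Hypothetical component containing a non-ASCII character.
name = "caf\u00e9"  # "café"

# Convert to a UTF-8 byte sequence, then %-encode the non-ASCII bytes.
escaped = quote(name, safe="", encoding="utf-8")

# A receiver that also assumes UTF-8 recovers the original text.
as_utf8 = unquote(escaped, encoding="utf-8")

# A receiver that assumes ISO-8859-1 decodes the same two bytes as
# two separate characters.
as_latin1 = unquote(escaped, encoding="iso-8859-1")

print(escaped)    # caf%C3%A9
print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```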