- From: Chris Lilley <chris@w3.org>
- Date: Mon, 3 Feb 2003 22:51:10 +0100
- To: www-international@w3.org, Martin Duerst <duerst@w3.org>
- CC: "Ian B. Jacobs" <ij@w3.org>, www-tag@w3.org, Max Froumentin <mf@w3.org>, Michel Suignard <michelsu@microsoft.com>
On Monday, February 3, 2003, 8:54:32 PM, Martin wrote:

MD> At 20:20 03/01/27 -0500, Ian B. Jacobs wrote:
>>Minutes of the 27 Jan 2003 TAG teleconf available as
>>HTML [1] and as text below.

>> 2.3 IRIEverywhere-27
>> [25] http://www.w3.org/2001/tag/ilist#IRIEverywhere-27

>> [Zakim]
>> DanCon, you wanted to suggest the value of having %7E specified to be
>> equivalent to %7e is purely aesthetic, and not *nearly* worth
>> the cost.

OK, so let's look at the knock-on effects here. Dan was not, I claim, looking at these effects when he made his comment; in which case his position seems very reasonable. After all, these escapes are used very infrequently.

But it is very damaging. It would scupper IRIs.

Suppose there is some Unicode character FOO and it maps to %ab%cd%ef in UTF-8 (it won't map from those precise values, there is no such character, this is just an example).

It would be highly desirable for FOO used in an IRI and the hexified version of FOO used in a URI to compare the same when comparing two URIs. If this is not done, then IRI-URI is a one-way street.

For this to work in any sensible manner, it is clearly not enough for FOO to compare the same as %ab%cd%ef. It also has to compare the same as %AB%CD%EF and %Ab%cd%eF and ....

There are two ways to do this: one is to forbid one of the cases of hexified a..f, and the other is to define them as the same in a hex escape.

The third way, the way where %ab is not equal to %AB, means that we can just give up on making FOO compare equal to %ab%cd%ef, and thus we can just give up on any roundtripping from IRI to URI, and thus IRI becomes merely a theoretical possibility. It becomes something that exists in a spec while actual XML files contain a bunch of illegible hexified nonsense.

Thus, to get to this desirable goal, for URIs %ab and %AB and %Ab and %aB have to compare the same. This isn't "merely aesthetic"; it is what IRI needs to build on.
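To make the comparison above concrete, here is a minimal Python sketch of the second option (treating the two cases of a..f as the same inside a hex escape). It is not taken from any spec; the function names are hypothetical, and since FOO does not exist I have assumed é (U+00E9, which is %C3%A9 in UTF-8) as a stand-in.

```python
import re

# Matches one hex escape such as %7e or %7E.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(uri: str) -> str:
    """Uppercase the hex digits of every %XX escape; leave everything else alone."""
    return _ESCAPE.sub(lambda m: "%" + m.group(1).upper(), uri)


def iri_to_uri(iri: str) -> str:
    """Hexify each non-ASCII character as the %XX escapes of its UTF-8 bytes."""
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


def compare_equal(a: str, b: str) -> bool:
    """Compare two IRIs/URIs, ignoring the case of hex digits in escapes."""
    return normalize_escapes(iri_to_uri(a)) == normalize_escapes(iri_to_uri(b))


# Assumed example: é stands in for FOO.
iri = "http://example.org/caf\u00e9"
print(compare_equal(iri, "http://example.org/caf%c3%a9"))
print(compare_equal(iri, "http://example.org/caf%C3%A9"))
print(compare_equal(iri, "http://example.org/caf%c3%A9"))
# Plain string comparison, by contrast, fails on the lowercase form:
print(iri_to_uri(iri) == "http://example.org/caf%c3%a9")
```

Note that this sketch only folds the case of the hex digits. It deliberately does not decide whether '~' and '%7E' should compare equal, which is a separate question.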
MD> Currently, Namespaces in XML 1.1 (Candidate Rec) specifies that for
MD> purposes of namespace equivalence, '%7e', '%7E', and '~' are different
MD> (see http://www.w3.org/TR/xml-names11/#IRIComparison).

Yes. This should change.

--
Chris mailto:chris@w3.org
Received on Monday, 3 February 2003 16:51:25 UTC