IRIs, URIs, TAG issues 15 and 27

The TAG has two issues on its plate, (essentially: when 
are two URIs considered equivalent?) and (essentially: what 
should we do about IRIs?).

I, with lots of TAG input, drafted some text on the comparison issue, 
most of which has now made it into the latest draft of the RFC2396 
revision at

We've spun our wheels on this quite a bit and failed to record much 
progress.  However, I feel that we do have quite a bit of consensus 
lurking and we can move forward on these issues.

I suspect that we agree on the following:

1. In response to the basic question asked by Jonathan Marsh et al in 
Issue 27, the TAG answers, first of all, "Yes".  That is, we believe 
that it is important that Web identifiers be able to use non-ASCII 
characters natively and straightforwardly, and that the IRI work (see is sound and is making good 
progress.  That said, the draft is not yet stable or finalized, and we 
agree with the concern addressed by Marsh et al about the risk involved 
when referencing unstable standards.  As of now, both XML 1.* and XML 
Schema's "AnyURI" work define a construct where IRIs may be used, and 
the benefit seems to justify the risk.

2. The TAG notes that, with the blessing of the XML Namespaces 
recommendation, some software is observed to take decisions about URI 
equivalence on the basis of strcmp() or its equivalent.  This is 
widespread enough that it's not going to go away.

3. The TAG urges both spec and software authors to familiarize 
themselves with the issues around URI comparison as laid out in 
RFC2396bis, specifically

4. Because of the prevalence of simple string comparison of URIs, and 
because of the Web Architectural principle that consistency in naming is 
important, the TAG urges creators of URIs to create them in as canonical 
a form as possible.  Section 6.3 of the RFC2396bis draft provides rules 
for this that are applicable both to URIs and IRIs.  This will have the 
beneficial effect that strcmp() will be an accurate (and very cheap) 
equivalence test.


Following on from this, TimBL keeps raising the importance of coherent 
round-tripping IRI->URI->IRI, but I've not quite managed to grasp the 
core of that issue fully enough to tell whether we really have 
consensus; Tim, any chance of outlining that one in writing or have you 


We can't close our issue 15 until the RFC2396 redrafting is finished, 
but given the above, I think we can close #27.

Cheers, Tim Bray
         (ongoing fragmented essay:

Received on Monday, 14 April 2003 19:30:49 UTC