
RE: IRIs, URIs, TAG issues 15 and 27

From: Williams, Stuart <skw@hplb.hpl.hp.com>
Date: Fri, 25 Apr 2003 13:40:19 +0100
Message-ID: <5E13A1874524D411A876006008CD059F04A074CB@0-mail-1.hpl.hp.com>
To: "'Tim Bray'" <tbray@textuality.com>
Cc: uri@w3.org, WWW-Tag <www-tag@w3.org>


> -----Original Message-----
> From: Tim Bray [mailto:tbray@textuality.com]
> Sent: 15 April 2003 00:31
> To: WWW-Tag; uri@w3.org
> Subject: IRIs, URIs, TAG issues 15 and 27
> The TAG has two issues on its plate, 
> http://www.w3.org/2001/tag/ilist#URIEquivalence-15 (essentially: when 
> are two URIs considered equivalent?) and
> http://www.w3.org/2001/tag/ilist#IRIEverywhere-27 (essentially: what 
> should we do about IRIs?).
> I, with lots of TAG input, drafted some text on the comparison issue, 
> most of which has now made it into the latest draft of the RFC2396 
> revision at 
> http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html.
> We've spun our wheels on this quite a bit and failed to record much 
> progress.  However, I feel that we do have quite a bit of consensus 
> lurking and we can move forward on these issues.

> I suspect that we agree on the following:
> 1. In response to the basic question asked by Jonathan Marsh et al in 
> Issue 27, the TAG answers, first of all, "Yes".  That is, we believe 
> that it is important that Web identifiers be able to use non-ASCII 
> characters natively and straightforwardly, and that the IRI work (see 
> http://www.w3.org/International/iri-edit/) is sound and is making good 
> progress.  


> That said, the draft is not yet stable or finalized, and we 
> agree with the concern addressed by Marsh et al about the risk involved 
> when referencing unstable standards.


> As of now, both XML 1.* and XML 
> Schema's "AnyURI" work define a construct where IRIs may be used, and 
> the benefit seems to justify the risk.

"...define a construct where IRIs may be used *once they have become defined
in a normative specification*..."


In particular, I mostly like the way this has been handled in XML Namespaces
1.1 [1]:

"  9 Internationalized Resource Identifiers (IRIs)

   Work is currently in progress to produce an RFC defining Internationalized
   Resource Identifiers (IRIs). Since this work is not yet complete, in this
   section we give a syntactic definition of IRIs for the purposes of this 
   specification. We expect to issue an erratum replacing this section with 
   a reference to the RFC when it is published. Users defining namespaces are
   advised to restrict namespace names to URIs until software supporting IRIs
   is in common use. 

   For a more general definition and discussion of IRIs see [IRI draft] 
   (work in progress). 

   URI references are restricted to a subset of the ASCII characters; 
   IRI references allow some of the disallowed ASCII characters as 
   well as most Unicode characters from #xA0 onwards."

However, I would omit the two definitions that then follow, deferring those
to the IRI spec... the strongest statement being the last sentence of the
first of these paragraphs, advising the use of namespace names that meet the
constraints of URI syntax until IRI software is widely deployed (which I
also take to imply the existence of a normative IRI specification).

[1] http://www.w3.org/TR/2002/CR-xml-names11-20021218/#IRIs
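To make the "restrict namespace names to URIs" advice concrete, here is a minimal sketch of a check a namespace author might run. It assumes the character repertoire of RFC 3986 (the eventual RFC2396bis): unreserved characters, the reserved delimiters, and "%" followed by two hex digits. The function name `uses_only_uri_characters` is hypothetical, and it checks characters only, not the full URI grammar.

```python
import re

# Characters allowed literally in a URI reference (RFC 3986 assumption):
# unreserved, gen-delims and sub-delims. "%" is handled separately below.
_URI_CHARS = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    "-._~"          # unreserved punctuation
    ":/?#[]@"       # gen-delims
    "!$&'()*+,;="   # sub-delims
)
_PCT = re.compile(r"%[0-9A-Fa-f]{2}")


def uses_only_uri_characters(name: str) -> bool:
    """Return True if every character of `name` may appear in a URI reference.

    Only the character repertoire is checked, not the full URI syntax.
    """
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "%":
            # A percent sign must start a complete %XX escape.
            if not _PCT.match(name, i):
                return False
            i += 3
        elif ch in _URI_CHARS:
            i += 1
        else:
            # Non-ASCII or disallowed ASCII (e.g. space): IRI-only territory.
            return False
    return True
```

For example, `http://example.org/r%C3%A9sum%C3%A9` passes while `http://example.org/résumé` does not, which is exactly the distinction between a URI and an IRI that the namespace advice relies on.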

> 2. The TAG notes that, with the blessing of the XML Namespaces 
> recommendation, some software is observed to take decisions about URI 
> equivalence on the basis of strcmp() or its equivalent.  This is 
> widespread enough that it's not going to go away.

I'm not sure that I would note this with any blessing. It is certainly a
pragmatic thing to do, and it could be defined as the equivalence to be
applied for NS comparisons... although some purism in me would prefer a
comparison of canonical forms, which obviously then begs for the definition
of a canonical form.

I accept that strcmp is pragmatic and in widespread use... 

> 3. The TAG urges both spec and software authors to familiarize 
> themselves with the issues around URI comparison as laid out in 
> RFC2396bis, specifically 
> http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html#rfc.section.6

Yes... and raise any issues they may have with this narrative on uri@w3.org.

> 4. Because of the prevalence of simple string comparison of URIs, and 
> because of the Web Architectural principle that consistency in naming is 
> important, the TAG urges creators of URIs to create them in as canonical 
> a form as possible.  Section 6.3 of the RFC2396bis draft provides rules 
> for this that are applicable both to URIs and IRIs.  This will have the 
> beneficial effect that strcmp() will be an accurate (and very cheap) 
> equivalence test.

OK, yes, but this suggests to me the increasing importance of a canonical
form for URIs:

URI C14N!!
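As a rough illustration of why a canonical form helps strcmp(), the sketch below applies only a few syntax-based normalizations of the kind RFC2396bis describes (as later published in RFC 3986, section 6.2.2): lowercasing scheme and host, uppercasing the hex digits of percent-escapes, and decoding escapes of unreserved characters. The function name `normalize_for_comparison` is hypothetical; it is not a full canonicalization (no dot-segment removal, no scheme-specific rules such as default ports), and Python's `urlunsplit` drops an empty trailing "?", which a stricter implementation would preserve.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Unreserved characters (RFC 3986 section 2.3); escapes of these may be decoded.
_UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _normalize_escape(m):
    """Decode an escaped unreserved character; otherwise uppercase the hex."""
    ch = chr(int(m.group(1), 16))
    if ch in _UNRESERVED:
        return ch
    return "%" + m.group(1).upper()


def normalize_for_comparison(uri: str) -> str:
    """Apply a few syntax-based normalizations so that plain string
    equality (the strcmp() test) recognizes more equivalent URIs."""
    parts = urlsplit(uri)
    # Scheme and host are case-insensitive; userinfo is left untouched.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    rebuilt = urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )
    return _ESCAPE.sub(_normalize_escape, rebuilt)
```

With this, `HTTP://Example.COM/%7euser` and `http://example.com/~user` differ under strcmp() but both normalize to `http://example.com/~user`. If URI creators emitted that form in the first place, strcmp() alone would already agree.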

> =============================================================
> Following on from this, TimBL keeps raising the importance of coherent 
> round-tripping IRI->URI->IRI, but I've not quite managed to grasp the 
> core of that issue fully enough to tell whether we really have 
> consensus; Tim, any chance of outlining that one in writing or have you
> already?
> ==============================================================
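On IRI->URI->IRI round-tripping, a minimal sketch may help frame the question. It assumes the mapping in the IRI draft (later RFC 3987, sections 3.1 and 3.2): IRI->URI percent-encodes the UTF-8 octets of non-ASCII characters, and URI->IRI decodes escape sequences that form valid UTF-8. Both function names are hypothetical, and the sketch simplifies by treating every non-ASCII character alike.

```python
import re

# A run of percent-escapes whose octets have the high bit set (0x80-0xFF);
# only such runs can encode non-ASCII characters in UTF-8.
_HIGH_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding the UTF-8 octets of every
    non-ASCII character; ASCII characters are copied unchanged."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def uri_to_iri(uri: str) -> str:
    """Decode runs of escapes that form valid UTF-8 back into characters;
    ASCII escapes such as %20 or %2F are left alone."""
    def decode_run(m):
        raw = bytes.fromhex(m.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return m.group(0)  # not valid UTF-8: keep the escapes as they were
    return _HIGH_RUN.sub(decode_run, uri)
```

Here `http://example.org/résumé` maps to `http://example.org/r%C3%A9sum%C3%A9` and back again. However, an IRI that was written with the escapes already in place comes back as `.../résumé`, a different string. That is one way round-tripping fails to be an identity, and it interacts directly with strcmp()-based comparison.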
> We can't close our issue 15 until the RFC2396 redrafting is finished, 
> but given the above, I think we can close #27.

Yes, I think we can. 

I think we should say that WGs trying to progress specs. toward
recommendation through the W3C process should aim to make their specs. as
IRI-ready as possible; that they should warn against the use of identifiers
that don't conform to URI syntax in IRI-ready positions until such time as a
normative IRI spec exists and software supporting that spec is widely
deployed; and that they should signal the possibility of spec errata,
post-dating the publication of a normative IRI spec, that will address any
further issues raised by that publication.

> -- 
> Cheers, Tim Bray
>          (ongoing fragmented essay: http://www.tbray.org/ongoing/)


Received on Friday, 25 April 2003 08:40:51 UTC
