RE: IRIs, URIs, TAG issues 15 and 27

It is not enough to say "strcmp() is OK".  We must require 
the minimal level of C14N provided by XML, namely resolution 
of entities and numeric character references.  While this is 
obvious in a pure XML environment, I remain concerned that 
namespace URIs may be plucked out of XML documents and 
transplanted into other environments without the application 
of this minimal C14N.
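
To make the concern concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the example.org namespace name and the two helper functions are hypothetical, and the regular expression stands in for any tool that copies a namespace URI out of the document text without running it through an XML processor.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical documents: an XML processor sees the same namespace name in both,
# but one source spells "&" as an entity and the other as a numeric reference.
DOC_A = '<doc xmlns="http://example.org/ns?a=1&amp;b=2"/>'
DOC_B = '<doc xmlns="http://example.org/ns?a=1&#38;b=2"/>'

def raw_namespace(xml_text):
    """Pluck the xmlns value straight out of the source text, with no XML parsing."""
    return re.search(r'xmlns="([^"]*)"', xml_text).group(1)

def parsed_namespace(xml_text):
    """Let the XML parser resolve entities and character references first."""
    tag = ET.fromstring(xml_text).tag  # ElementTree reports tags as "{namespace}local"
    return tag[1:tag.index("}")]

# Plain string comparison on the unresolved text disagrees with the parser's view.
raw_equal = raw_namespace(DOC_A) == raw_namespace(DOC_B)
parsed_equal = parsed_namespace(DOC_A) == parsed_namespace(DOC_B)
print(raw_equal, parsed_equal)  # prints: False True
```

Once the references are resolved, a strcmp()-style test gives the expected answer; applied to the transplanted raw text, it does not.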

Misha


> -----Original Message-----
> From: Williams, Stuart [mailto:skw@hplb.hpl.hp.com] 
> Sent: 25 April 2003 13:40
> To: 'Tim Bray'
> Cc: uri@w3.org; WWW-Tag
> Subject: RE: IRIs, URIs, TAG issues 15 and 27
> 
> 
> 
> Tim,
> 
> 
> > -----Original Message-----
> > From: Tim Bray [mailto:tbray@textuality.com]
> > Sent: 15 April 2003 00:31
> > To: WWW-Tag; uri@w3.org
> > Subject: IRIs, URIs, TAG issues 15 and 27
> > 
> > 
> > 
> > The TAG has two issues on its plate,
> > http://www.w3.org/2001/tag/ilist#URIEquivalence-15 (essentially: when
> > are two URIs considered equivalent?) and
> > http://www.w3.org/2001/tag/ilist#IRIEverywhere-27 (essentially: what
> > should we do about IRIs?).
> > 
> > I, with lots of TAG input, drafted some text on the comparison issue,
> > most of which has now made it into the latest draft of the RFC2396
> > revision at
> > http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html.
> > 
> > We've spun our wheels on this quite a bit and failed to record much 
> > progress.  However, I feel that we do have quite a bit of consensus 
> > lurking and we can move forward on these issues.
> 
> Indeed...
> > 
> > I suspect that we agree on the following:
> > 
> > 1. In response to the basic question asked by Jonathan Marsh et al in
> > Issue 27, the TAG answers, first of all, "Yes".  That is, we believe
> > that it is important that Web identifiers be able to use non-ASCII
> > characters natively and straightforwardly, and that the IRI work (see
> > http://www.w3.org/International/iri-edit/) is sound and is making good
> > progress.
> 
> Yes.
> 
> > That said, the draft is not yet stable or finalized, and we
> > agree with the concern addressed by Marsh et al about the risk involved
> > when referencing unstable standards.
> 
> Yes.
> 
> > As of now, both XML 1.* and XML
> > Schema's "AnyURI" work define a construct where IRIs may be used, and
> > the benefit seems to justify the risk.
> 
> "...define a construct where IRIs may be used *once they have 
> become defined
> in a normative specification*..."
> 
> Yes.
> 
> In particular, I mostly like the way this has been handled in XML
> Namespaces 1.1 [1]:
> 
> "  9 Internationalized Resource Identifiers (IRIs)
> 
>    Work is currently in progress to produce an RFC defining
>    Internationalized Resource Identifiers (IRIs). Since this work is not
>    yet complete, in this section we give a syntactic definition of IRIs
>    for the purposes of this specification. We expect to issue an erratum
>    replacing this section with a reference to the RFC when it is
>    published. Users defining namespaces are advised to restrict namespace
>    names to URIs until software supporting IRIs is in common use.
> 
>    For a more general definition and discussion of IRIs see [IRI draft]
>    (work in progress).
> 
>    URI references are restricted to a subset of the ASCII characters; 
>    IRI references allow some of the disallowed ASCII characters as 
>    well as most Unicode characters from #xA0 onwards."
> 
> However, I would omit the two definitions that then follow, deferring
> those to the IRI spec... the strongest statement being the last sentence
> of the first of these paragraphs, advising the use of namespace names that
> meet the constraints of URI syntax until IRI software is widely deployed
> (which I also take to imply the existence of a normative IRI
> specification).
> 
> [1] http://www.w3.org/TR/2002/CR-xml-names11-20021218/#IRIs
> 
> > 2. The TAG notes that, with the blessing of the XML Namespaces
> > recommendation, some software is observed to take decisions about URI
> > equivalence on the basis of strcmp() or its equivalent.  This is
> > widespread enough that it's not going to go away.
> 
> I'm not sure that I note this with blessing. It is certainly a pragmatic
> thing to do - and could be defined as the equivalence to be applied for NS
> comparisons... although some purism in me would prefer the comparison of
> canonical forms - which obviously then begs the definition of a canonical
> form.
> 
> I accept that strcmp is pragmatic and in widespread use... 
> 
> > 3. The TAG urges both spec and software authors to familiarize
> > themselves with the issues around URI comparison as laid out in
> > RFC2396bis, specifically
> > http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html#rfc.section.6
> 
> Yes... and raise issues they may have on this narrative on uri@w3.org.
> 
> > 4. Because of the prevalence of simple string comparison of URIs, and
> > because of the Web Architectural principle that consistency in naming is
> > important, the TAG urges creators of URIs to create them in as canonical
> > a form as possible.  Section 6.3 of the RFC2396bis draft provides rules
> > for this that are applicable both to URIs and IRIs.  This will have the
> > beneficial effect that strcmp() will be an accurate (and very cheap)
> > equivalence test.
> 
> Ok, yes, but this suggests to me the increasing importance of a canonical
> form.
> 
> URI C14N!!
> 
> > =============================================================
> > 
> > Following on from this, TimBL keeps raising the importance of coherent
> > round-tripping IRI->URI->IRI, but I've not quite managed to grasp the
> > core of that issue fully enough to tell whether we really have
> > consensus; Tim, any chance of outlining that one in writing or have you
> > already?
> > 
> > ==============================================================
> > 
> > We can't close our issue 15 until the RFC2396 redrafting is finished,
> > but given the above, I think we can close #27.
> 
> Yes, I think we can. 
> 
> I think we should say that WGs trying to progress specs toward
> recommendation through the W3C process should aim to make their specs as
> IRI-ready as possible; that they should warn against the use of identifiers
> that don't conform to URI syntax in IRI-ready positions until such time as a
> normative IRI spec exists and there is widespread software deployed
> supporting that spec; and signal the possibility of spec errata that may
> postdate the publication of a normative IRI spec and that will address any
> further issues raised by that publication.
> 
> > -- 
> > Cheers, Tim Bray
> >          (ongoing fragmented essay: http://www.tbray.org/ongoing/)
> 
> regards
> 
> Stuart
> 
> 
> 



Received on Monday, 28 April 2003 06:18:19 UTC