ACTION-79 discussion on URI vs. IRI in the specs

I had an action item to "update spec to talk about IRIs when we really 
mean IRIs".  I have completed my review of RDFa Core and RDFa Syntax to 
ensure that we don't introduce any backward incompatibilities.... and 
now I am thoroughly confused.  Follow me here:

   1. RDFa Syntax clearly says that an expanded CURIE is a syntactically
      valid IRI.
   2. RDFa Syntax also includes by reference the XHTML Modularization
      datatype URI for use in various attributes.
   3. XHTML M12N defines the datatype URI as "A Uniform Resource
      Identifier Reference, as defined by the type |anyURI| in XMLSCHEMA
      <http://www.w3.org/TR/xhtml-modularization/references.html#ref_xmlschema>."
   4. The XML Schema anyURI type in the current Recommendation is a URI
      as defined in RFC 2396 as amended by RFC 2732.  This definition
      DOES NOT include IRIs.
   5. However, the latest XML Schema Working Draft
      (http://www.w3.org/TR/2009/WD-xmlschema11-2-20091203/#anyURI)
      defines anyURI to be an IRI.  This was the *intent* of the XHTML
      Working Group at the time of the publication of the final XHTML
      Modularization (Stephen, please correct me if I am wrong). 
      However, the XML Schema spec is taking a while to get out the door.
   6. Consequently, I posit that the *intent* of XHTML Modularization,
      and therefore of RDFa Syntax, was that whenever we said URI we
      really meant IRI.
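To make the URI/IRI distinction in points 4 through 6 concrete, here is a minimal Python sketch of the IRI-to-URI mapping described in RFC 3987, section 3.1, in which non-ASCII characters are percent-encoded as their UTF-8 octets. The helper names are hypothetical. The mapping is simplified because it does not validate the IRI or apply IDNA to host names. The ASCII check is only a rough necessary condition for RFC 2396 conformance and is not a validator.

```python
def iri_to_uri(iri):
    """Map an IRI to a URI by percent-encoding each non-ASCII character
    as its UTF-8 octets (simplified RFC 3987, section 3.1; hypothetical
    helper: no IRI validation, no IDNA handling of host names)."""
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)  # ASCII characters pass through unchanged
        else:
            parts.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(parts)


def is_ascii_only(s):
    """Necessary (not sufficient) condition for an RFC 2396 URI:
    every character must be US-ASCII."""
    return all(ord(ch) < 0x80 for ch in s)


iri = "http://example.org/résumé"
print(is_ascii_only(iri))   # False: an IRI, but not an RFC 2396 URI
uri = iri_to_uri(iri)
print(uri)                  # http://example.org/r%C3%A9sum%C3%A9
print(is_ascii_only(uri))   # True
```

The anyURI definition in the current XML Schema Recommendation covers only strings like the second one. The 1.1 draft also accepts the first.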

Independently, we recently had a discussion about whether the lexical 
space of a CURIE should be an IRI or not.  The group agreed that it 
should.  I was assigned this action item.  Unfortunately, the specs are 
riddled with uses of the term URI, and I believe that in EVERY SINGLE 
CASE we mean IRI (as in RFC3987).  Still, I think that using the term 
IRI everywhere would confuse our readers.  People just don't know what 
that is, and it would steepen our learning curve.  Therefore, I 
propose the following:

   1. In the one location where we reference RFC3987, we use the term IRI:
      "When expanded, the resulting URI MUST be a syntactically valid
      IRI [RFC3987]. For a more detailed explanation see CURIE and URI
      Processing
      <http://www.w3.org/2010/02/rdfa/sources/rdfa-core/Overview-src.html#s_curieprocessing>.
      The /lexical space/ of a CURIE is as defined in curie
      <http://www.w3.org/2010/02/rdfa/sources/rdfa-core/Overview-src.html#P_curie>
      below. The /value space/ is the set of IRIs.".
   2. In the one location where we reference RFC3986, we change the
      reference to RFC3987: "Since RDFa is ultimately a means for
      transporting RDF, a key concept is the /resource/ and its
      manifestation as a URI. RDF deals with complete URIs (not relative
      paths); when converting RDFa to triples, any relative URIs /must/
      be resolved relative to the base URI, using the algorithm defined
      in section 6.5 of RFC 3987 [RFC3987], /Relative IRI References/."
   3. We add another note in section 2 that says something like "The
      term 'URI' is used throughout this specification.  However, the
      term is used in its generic sense.  The actual value space of URIs
      is that of the set of IRIs as defined in [RFC3987]."  We could
      even include an informative reference to the XML Schema 1.1 draft
      where anyURI is mapped this way if people think that would help.
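Proposals 1 and 2 can be sketched in Python under some stated assumptions. The prefix mapping `PREFIXES` and the helper `expand_curie` are hypothetical. Real RDFa CURIE processing has further rules, such as safe CURIEs, the default prefix and terms, and the sketch ignores them. Relative resolution uses Python's `urllib.parse.urljoin`, which follows the RFC 3986 algorithm. As I read RFC 3987, section 6.5, that algorithm may be applied directly to IRI references.

```python
from urllib.parse import urljoin

# Hypothetical prefix mapping, standing in for what an RDFa processor
# would collect from prefix declarations in the document.
PREFIXES = {"ex": "http://example.org/vocab/"}


def expand_curie(curie, prefixes):
    """Expand prefix:reference by concatenating the mapped IRI and the
    reference. Simplified: no safe CURIEs, default prefix, or terms."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand CURIE {curie!r}")
    return prefixes[prefix] + reference


# The expanded value contains a non-ASCII character, so it is an IRI
# (RFC 3987) but not a URI in the RFC 2396 sense.
print(expand_curie("ex:café", PREFIXES))
# http://example.org/vocab/café

# Resolve a relative IRI reference against a base URI, RFC 3986-style.
print(urljoin("http://example.org/docs/index.html", "../résumé"))
# http://example.org/résumé
```

Neither step needs to know whether the result is "only" a URI. Treating every value as an IRI loses nothing in these cases.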

In this way we ensure that the only normative reference about IRIs is to 
the IRI spec, but retain the readability and approachability of the 
specification.

I am open to other suggestions, but I think this is the easiest thing to 
implement and the thing that will be most consistent and comprehensible 
for our readers.

-- 
Shane McCarron
Managing Director, Applied Testing and Technology, Inc.
+1 763 786 8160 x120

Received on Thursday, 26 May 2011 03:47:00 UTC