ISSUE 27: Proposal regarding use of relative URIs in the datatype TERMorCURIEorURI from Shane McCarron on 2010-06-29 (public-rdfa-wg@w3.org from June 2010)

From: Shane McCarron <shane@aptest.com>
Date: Tue, 29 Jun 2010 12:13:46 -0500
To: RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <4C2A29CA.3070601@aptest.com>
There has been much discussion about this issue.  The core of the 
argument in favor of relative URIs is that there is no technical 
reason to prohibit them.  I disagree with this line of reasoning.  
Consider the following:

   1. A TERM is visually indistinguishable from a relative URI.
   2. If relative URIs are permitted, there is NO way to flag an illegal
      token during parsing.  Every string that matches the production(s)
      will result in a triple.
   3. There are no compelling use cases for supporting relative URIs -
      at least not in the places where this datatype is used.
   4. Excluding relative URIs makes explaining how this datatype works
      easier, and therefore can reduce the barrier to adoption.


I know that it is possible to write a parser that will handle relative 
URIs.  I just don't think it is interesting, necessary, or easy to 
specify.  The following text is a /minor/ modification of what is in the 
spec now - just to make it completely clear that the URI production is 
for an absolute URI.  First, the section defining TERM, section 7.4.3, says:

> Some RDFa attributes have a datatype that permits a term to be 
> referenced. RDFa defines the syntax of a term as:
>
> term     ::=  NCName  <http://www.w3.org/TR/2006/REC-xml-names-20060816/#NT-NCName>
>
> When an RDFa attribute permits the use of a term, and the value being 
> evaluated matches the production for term above, it is transformed to 
> a URI using the following logic:
>
>     * If the |term| is in the local term mappings
>       <http://www.w3.org/2010/02/rdfa/sources/rdfa-core/Overview-src.html#dfn-local_term_mappings>,
>       use the associated URI.
>     * Otherwise, if there is a local default vocabulary
>       <http://www.w3.org/2010/02/rdfa/sources/rdfa-core/Overview-src.html#dfn-local_default_vocabulary>
>       the URI is obtained by concatenating that value and the |term|.
>     * Finally, if there is no local default vocabulary
>       <http://www.w3.org/2010/02/rdfa/sources/rdfa-core/Overview-src.html#dfn-local_default_vocabulary>,
>       the |term| has no associated URI and /must/ be ignored.
>

So - any string that matches the NCName production is either a defined 
TERM, an undefined term resolved against the local default vocabulary, or 
IGNORED.  NCName is an XML Name with no colon.  So this production is 
distinct from the production for a qualified CURIE and for an absolute 
URI.  If we agree on an error reporting mechanism, I would extend this 
to say that the value MUST be ignored and an indication of the error 
placed in the error graph (or whatever the mechanism is).
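
As an illustration only, the term logic quoted above could be sketched in 
Python roughly as follows.  The names is_ncname, resolve_term, 
term_mappings and default_vocab are hypothetical, and the regular 
expression is a simplified ASCII-only approximation of NCName rather than 
the full XML production.

```python
import re
from typing import Dict, Optional

# ASSUMPTION: ASCII-only approximation of the XML NCName production.
# The real production also admits many non-ASCII name characters.
NCNAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname(value: str) -> bool:
    """Return True if the whole value looks like an NCName (no colon)."""
    return NCNAME_RE.fullmatch(value) is not None


def resolve_term(
    term: str,
    term_mappings: Dict[str, str],
    default_vocab: Optional[str],
) -> Optional[str]:
    """Map a term to a URI, or return None if it must be ignored."""
    if not is_ncname(term):
        raise ValueError(f"not a term: {term!r}")
    # 1. A term found in the local term mappings uses the associated URI.
    if term in term_mappings:
        return term_mappings[term]
    # 2. Otherwise, concatenate the local default vocabulary and the term.
    if default_vocab is not None:
        return default_vocab + term
    # 3. No default vocabulary: the term has no URI and is ignored.
    return None
```

With an empty mapping and no default vocabulary, resolve_term("name", {}, 
None) returns None, which is the "ignored" case; an error-reporting 
mechanism, if one is agreed, would hook in at that final step.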

Next, we have the definition for CURIE syntax in the context of RDFa 
(section 6).  Section 6 says, in part:
>
>     * the *mapping to use with the default prefix* is the current
>       default prefix mapping;
>     * the *mapping to use when there is no prefix* is not defined,
>       which effectively prohibits the use of CURIEs that do not
>       contain a colon (however, see General Use of Terms in Attributes
>       <http://www.w3.org/2010/02/rdfa/sources/rdfa-core/Overview-src.html#s_terms>);
>

This means that, when evaluating TERMorCURIEorURI, the parser MUST NOT 
treat any string with no prefix as a CURIE.  So 'foo' is NEVER a CURIE.  
':foo' can be treated as a CURIE if there is a default prefix.
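
A minimal sketch of that rule in Python, assuming a hypothetical 
expand_curie helper and made-up example.org mappings.  It only splits on 
the first colon and does not validate the reference part against the CURIE 
grammar.

```python
from typing import Dict, Optional


def expand_curie(
    value: str,
    prefix_mappings: Dict[str, str],
    default_prefix_mapping: Optional[str],
) -> Optional[str]:
    """Expand a CURIE to a URI, or return None if value is not a CURIE."""
    prefix, colon, reference = value.partition(":")
    if not colon:
        # No colon at all: there is no "no prefix" mapping, so a bare
        # string such as 'foo' is never a CURIE.
        return None
    if prefix == "":
        # ':foo' uses the default prefix mapping, if one is defined.
        if default_prefix_mapping is None:
            return None
        return default_prefix_mapping + reference
    if prefix in prefix_mappings:
        return prefix_mappings[prefix] + reference
    # Unknown prefix: not a valid CURIE in this context.
    return None


# Hypothetical usage:
# expand_curie("foo", {}, "http://example.org/default#") gives None
# expand_curie(":foo", {}, "http://example.org/default#")
#     gives "http://example.org/default#foo"
```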

Finally, we have what happens if the string is neither a TERM nor a 
CURIE.  We say in several places in the spec that in this case the 
string is treated as a URI.  My proposal is that we change these 
instances to read 'processed as an absolute URI as defined in [URI] 
section 4.3'.  So, for example, section 7.4, which currently reads:
>
> TERMorCURIEorURI
>
>         * If the value is an NCName
>           <http://www.w3.org/TR/2006/REC-xml-names-20060816/#NT-NCName>,
>           then it is evaluated as a term according to General Use of
>           Terms in Attributes
>           <http://www.w3.org/2010/02/rdfa/sources/rdfa-core/Overview-src.html#s_terms>.
>           Note that this step may mean that the value is to be ignored.
>         * Otherwise, the value is evaluated as a CURIE. If it is a
>           valid CURIE, the resulting URI is used; otherwise, the value
>           will be processed as a URI.
>

would be changed to read:

TERMorCURIEorURI

        * If the value is an NCName
          <http://www.w3.org/TR/2006/REC-xml-names-20060816/#NT-NCName>,
          then it is evaluated as a term according to General Use of
          Terms in Attributes
          <http://www.w3.org/2010/02/rdfa/sources/rdfa-core/Overview-src.html#s_terms>.
          Note that this step may mean that the value is to be ignored.
        * Otherwise, if it is a valid CURIE, the resulting URI is used;
        * Otherwise, if it matches absolute-URI as defined in [URI]
          section 4.3, the value is treated as a URI.

If we define error handling, I would extend this to read 'If none of 
these conditions are met, the value is added to the error graph (or 
whatever).'

In other words, if a value doesn't have a prefix and isn't a TERM, it is 
an error.
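
To make the proposed ordering concrete, here is a rough Python sketch of 
the three steps.  The function name evaluate and all of its parameters are 
hypothetical.  The NCName pattern is an ASCII-only approximation, and the 
last step only checks for a leading scheme; it is not a full validation 
against the absolute-URI grammar in [URI] section 4.3.

```python
import re
from typing import Dict, Optional

# ASSUMPTION: ASCII-only approximation of NCName.
NCNAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")
# ASSUMPTION: rough "has a scheme" check, not the full absolute-URI grammar.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:.+")


def evaluate(
    value: str,
    terms: Dict[str, str],
    vocab: Optional[str],
    prefixes: Dict[str, str],
    default_prefix: Optional[str],
) -> Optional[str]:
    """Return the URI for a TERMorCURIEorURI value, or None if rejected."""
    # Step 1: an NCName is only ever evaluated as a term.
    if NCNAME_RE.fullmatch(value):
        if value in terms:
            return terms[value]
        if vocab is not None:
            return vocab + value
        return None  # ignored (or reported, once error handling exists)
    # Step 2: otherwise, try the value as a CURIE.
    prefix, colon, reference = value.partition(":")
    if colon:
        if prefix == "" and default_prefix is not None:
            return default_prefix + reference
        if prefix in prefixes:
            return prefixes[prefix] + reference
    # Step 3: otherwise, accept only something that looks absolute.
    if SCHEME_RE.fullmatch(value):
        return value
    # None of the conditions are met: a relative reference ends up here.
    return None
```

Under these assumptions a relative reference such as 'foo/bar' falls 
through all three steps and is rejected, while 'http://example.org/x' is 
accepted at step 3 because 'http' is not a declared prefix.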

-- 
Shane P. McCarron                          Phone: +1 763 786-8160 x120
Managing Director                            Fax: +1 763 786-8180
ApTest Minnesota                            Inet: shane@aptest.com
Received on Tuesday, 29 June 2010 17:14:21 UTC