RE: CURIEs: A proposal from Dan Connolly on 2006-06-26 (semantic-web@w3.org from June 2006)

From: Dan Connolly <connolly@w3.org>
Date: Mon, 26 Jun 2006 09:44:16 -0500
To: Misha Wolf <Misha.Wolf@reuters.com>
Cc: public-rdf-in-xhtml-tf@w3.org, semantic-web@w3.org, www-tag@w3.org
Message-Id: <1151333057.16839.422.camel@dirk.w3.org>
On Sat, 2006-06-24 at 00:07 +0100, Misha Wolf wrote:
> Survey updated in the light of Shane McCarron's very helpful mail ...
> 
> The data points presented reflect my incomplete understanding.
> Please help by correcting the ones that are wrong and providing the 
> missing (or additional) data points.
> 
> For context, see:
> -  RDFa Primer 1.0, 16 May 2006
>    http://www.w3.org/TR/2006/WD-xhtml-rdfa-primer-20060516/
> -  News Taxonomies presentation to the W3C AC, given on 22 May 2006
>    http://lists.w3.org/Archives/Public/www-archive/2006Jun/0013.html
> -  CURIEs: A proposal, 2 June 2006
>    http://lists.w3.org/Archives/Public/www-tag/2006Jun/0007.html
> 
> 7   Each language must specify:

It's already the case that we have a generic URI syntax specification,
which includes one abbreviation mechanism (URI references), and
every format/syntax can use URI references and/or other
URI abbreviation mechanisms.

It's not clear to me what CURIEs are, beyond that.

If there is to be a CURIE specification, I would expect one
could write software and test cases... but the CURIE mechanism
seems to be parameterized by arbitrary natural language prose.
How would one test a CURIE implementation?

Earlier in this thread, I tried to give input to a NewsML2
abbreviation mechanism that would share quite a bit with
SPARQL, in order to minimize surprises and to facilitate
copy/paste of data.

But now the thread has gone back to a cross-language mechanism.
I'm not sure where we're headed, but I can answer
some of these fact-finding questions, I suppose...

> 7a  the syntactic constraints (if any) on the prefix and suffix.
> 
>     XMLNS    : prefix/suffix = NCNAME; prefix can be omitted
>     XHTML    : prefix = NCNAME; suffix = IRI; prefix can be omitted
>     NewsML 2 : prefix/suffix = NCNameChar+
>     SPARQL   : ?

Item 1 of the list from which this 7a comes is
"1   We agree on a generic syntax and generic rules for Compact URIs 
    (CURIEs) in attribute values."

SPARQL isn't carried in XML, so I'm not sure how the survey applies.

In SPARQL, names of the form _:xyz are used for something other
than URI abbreviations. The grammar seems to prohibit prefixes
from starting with "_". And in both the prefix and the suffix,
starting with a "." seems to be prohibited.
For details, see http://www.w3.org/TR/rdf-sparql-query/#rNCNAME_PREFIX

>     RDF/XML  : ?

RDF/XML doesn't have qnames in content. For elements/attributes,
it follows XML.

>     N3       : ?
>     Turtle   : ?

turtle seems to disallow "." in names altogether. And I don't
see "_" there either.
http://www.dajobe.org/2004/01/turtle/#nameChar


The lexical details of N3 are a work in progress; the
grammar that TimBL maintains says...


qname
        
        (([a-zA-Z_][a-zA-Z0-9_]*)?:)?[a-zA-Z_][a-zA-Z0-9_]*

  -- http://www.w3.org/2000/10/swap/grammar/n3-report.html

but I think the code and tests differ. We seem to have
a test for a non-ascii character in a name, but I can't
find it in the log of the tests run for the latest release,
so I'm not sure what its status is.
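
One way to see what the quoted qname pattern accepts is to run it directly.
The sketch below copies the regular expression from the grammar report as
quoted above; the function name is_n3_qname is made up for illustration, and
the result says nothing about what the actual N3 code or tests do.

```python
import re

# Pattern copied verbatim from the quoted n3-report grammar.
# Assumption: the whole token must match, so fullmatch is used.
N3_QNAME = re.compile(r"(([a-zA-Z_][a-zA-Z0-9_]*)?:)?[a-zA-Z_][a-zA-Z0-9_]*")


def is_n3_qname(token: str) -> bool:
    """Return True if token matches the quoted qname pattern in full."""
    return N3_QNAME.fullmatch(token) is not None


if __name__ == "__main__":
    for candidate in ["abc:def", ":def", "def", "abc:", "a.b:c", "caf\u00e9:x"]:
        print(repr(candidate), is_n3_qname(candidate))
```

Under the pattern as written, a name containing a non-ASCII character or a
"." is rejected, and the local part after the colon is mandatory.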


> 7b  how CURIEs and URIs are distinguished, eg through dedicated 
>     attributes or through a special syntax.
> 
>     XMLNS    : ?
>     XHTML    : Mix of dedicated attributes and special syntax 
>                ("[a:b]") for non-dedicated attributes
>     NewsML 2 : Dedicated attributes
>     SPARQL   : ?
>     RDF/XML  : ?
>     N3       : ?
>     Turtle   : ?

Again, RDF/XML has only URI references in content.

In SPARQL, N3, and turtle, URI references are written <thusly>,
and qnames can't have <>s in them.
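
As a rough illustration of that distinction, a tokenizer only has to look for
the angle brackets. This is a simplified sketch, not any parser's real logic:
the function name classify and its return labels are invented here, and
literals, blank nodes, and variables are ignored.

```python
def classify(token: str) -> str:
    """Classify a token as a URI reference or a qname by its delimiters.

    Simplifying assumption: only these two forms matter; everything else
    is reported as "other" or "invalid".
    """
    if token.startswith("<") and token.endswith(">"):
        return "uriref"
    if "<" in token or ">" in token:
        # Angle brackets are reserved for URI references.
        return "invalid"
    if ":" in token:
        return "qname"
    return "other"


if __name__ == "__main__":
    print(classify("<http://example/abc#x>"))
    print(classify("abc:x"))
```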

> 7c  the mechanism for specifying the prefix-to-IRI mapping.  The 
>     mechanism may use information provided out-of-band.
> 
>     XMLNS    : xmlns attribute
>     XHTML    : xmlns attribute
>     NewsML 2 : scheme element
>     SPARQL   : ?
>     RDF/XML  : ?
>     N3       : ?
>     Turtle   : ?

RDF/XML follows XML namespaces.

In turtle and N3, prefixes are declared a la
  @prefix abc: <http://example/abc#>.

and in SPARQL it's almost the same except for the
leading @ and trailing '.', and that the case of
the keyword doesn't matter:

  PREFIX abc: <http://example/abc#>
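
The two declaration forms are close enough that one small routine can read
both. The sketch below is illustrative only: parse_prefix and the two pattern
names are hypothetical, and \w is a loose stand-in for the real prefix-name
productions, which are stricter.

```python
import re

# turtle/N3 form: leading "@", trailing ".", lowercase keyword.
TURTLE_PREFIX = re.compile(r"@prefix\s+(\w*):\s*<([^>]*)>\s*\.")
# SPARQL form: no "@", no ".", keyword case does not matter.
SPARQL_PREFIX = re.compile(r"PREFIX\s+(\w*):\s*<([^>]*)>", re.IGNORECASE)


def parse_prefix(line: str):
    """Return (prefix, namespace) for a declaration line, or None."""
    line = line.strip()
    for pattern in (TURTLE_PREFIX, SPARQL_PREFIX):
        match = pattern.fullmatch(line)
        if match:
            return match.group(1), match.group(2)
    return None


if __name__ == "__main__":
    print(parse_prefix("@prefix abc: <http://example/abc#>."))
    print(parse_prefix("prefix abc: <http://example/abc#>"))
```

Both calls yield the same pair, ("abc", "http://example/abc#").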


> 7d  whether and, if so, how the prefix and suffix are combined to 
>     form an IRI.
> 
>     XMLNS    : Left to each language to specify
>     XHTML    : Plain concatenation, including cases such as:
>                prefixIRI = http://www.example.com/partial_
>                suffix    = folder/item
>                fullIRI   = http://www.example.com/partial_folder/item
>     NewsML 2 : For vocabularies managed by the IPTC, we're 
>                considering:
>                  <vocabIRI> & "#_" & <code>
>                or plain concatenation, coupled with vocabIRIs ending 
>                with "?" or "/" or "#_" 
>                For vocabularies not managed by the IPTC:
>                  Left to each vocabulary authority to specify
>     SPARQL   : ?
>     RDF/XML  : ?
>     N3       : ?
>     Turtle   : ?

RDF/XML uses plain concatenation, as do SPARQL, turtle, and N3.
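
Plain concatenation is about as simple as it sounds. A minimal sketch,
assuming the prefix-to-IRI mapping is already available as a dictionary (the
function name expand and the prefix "p" are made up for illustration):

```python
def expand(qname: str, prefixes: dict) -> str:
    """Expand prefix:local to an IRI by concatenating namespace and local part."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        raise ValueError("not a prefixed name: %r" % qname)
    # No separator is inserted and no resolution is done: plain concatenation.
    return prefixes[prefix] + local


if __name__ == "__main__":
    prefixes = {
        "abc": "http://example/abc#",
        "p": "http://www.example.com/partial_",
    }
    print(expand("abc:item", prefixes))
    print(expand("p:folder/item", prefixes))
```

The second call reproduces the partial_folder/item case from the survey:
nothing stops the namespace from ending in the middle of a path segment.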

> 7e  whether the prefix and suffix form a tuple or whether they are 
>     a compact representation for an IRI.

'whether'? They always form a tuple; that's sort of self-evident, no?
I guess the question is whether the tuple is also specified to be
an IRI abbreviation.

>     XMLNS    : Tuple
>     XHTML    : Compact representation for an IRI
>     NewsML 2 : Tuple and compact representation for an IRI
>     SPARQL   : ?
>     RDF/XML  : ?
>     N3       : ?
>     Turtle   : ?

In RDF/XML, the tuple from XMLNS is used as an IRI abbreviation.
There are lots of cases where it's useful to round-trip back
to the tuple, though that's never strictly required.

Same for SPARQL, turtle, and N3.
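
Going back from an IRI to a tuple needs a prefix map and a policy for picking
among candidates. The sketch below assumes a longest-namespace-wins policy,
which is one choice among several and is not specified anywhere; the function
name compact is hypothetical.

```python
def compact(iri: str, prefixes: dict):
    """Return a (prefix, local) tuple for iri, or None if no namespace matches.

    Assumption: when several namespaces match, the longest one wins.
    """
    best = None
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            if best is None or len(namespace) > len(best[1]):
                best = (prefix, namespace)
    if best is None:
        return None
    return best[0], iri[len(best[1]):]


if __name__ == "__main__":
    prefixes = {"abc": "http://example/abc#", "ex": "http://example/"}
    print(compact("http://example/abc#item", prefixes))
```

The tuple recovered this way need not be the one originally written, since
more than one declared namespace can be a leading substring of the same IRI.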

> 7f  whether the dereferencing of the IRI mapped to the prefix is 
>     required to yield a useful and relevant information resource.
> 
>     XMLNS    : Not required, but the Architecture of the WWW states:
>                "The owner of an XML namespace name SHOULD make 
>                available material intended for people to read and 
>                material optimized for software agents in order to 
>                meet the needs of those who will use the namespace 
>                vocabulary."
>     XHTML    : Not required; note that the prefix may correspond to a
>                partial, nonsensical IRI, without the suffix (see 7d)
>     NewsML 2 : Required
>     SPARQL   : ?
>     RDF/XML  : ?
>     N3       : ?
>     Turtle   : ?

The URI mapped to the prefix isn't relevant at the RDF abstract
syntax level. It's handy to be able to look it up for RDF/XML
consumers that only know they're looking at XML.

> 7g  whether the dereferencing of the IRI built from the prefix and 
>     suffix (and, possibly, also other building blocks) is required 
>     to yield a useful and relevant information resource.
> 
>     XMLNS    : Left to each language to specify
>     XHTML    : Yes

Really? Which part of the XHTML spec requires that? Surely
it's no more required to be available than the target of an href="..."?

>     NewsML 2 : For vocabularies managed by the IPTC: MUST
>                For vocabularies not managed by the IPTC: SHOULD
>     SPARQL   : ?
>     RDF/XML  : ?
>     N3       : ?
>     Turtle   : ?

It's a best practice in RDF (i.e. RDF/XML and SPARQL and N3 and turtle).

> 7h  whether any fragment identifiers in these IRIs are required to 
>     be legal XML names.
> 
>     XMLNS    : Outside the scope of the spec
>     XHTML    : Outside the scope of the spec
>     NewsML 2 : Yes
>     SPARQL   : ?
>     RDF/XML  : ?
>     N3       : ?
>     Turtle   : ?

RDF doesn't put any constraints on fragments.

> Misha

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Monday, 26 June 2006 14:44:40 UTC