Re: CURIEs: A proposal from Dan Connolly on 2006-06-13 (www-tag@w3.org from June 2006)

From: Dan Connolly <connolly@w3.org>
Date: Tue, 13 Jun 2006 10:35:23 -0400
To: Misha Wolf <Misha.Wolf@reuters.com>
Cc: www-tag@w3.org, newsml-2@yahoogroups.com, public-rdf-in-xhtml-tf@w3.org
Message-Id: <0b906d88a0611be05152a8987c7d29a7@w3.org>
On Jun 2, 2006, at 2:14 PM, Misha Wolf wrote:
> Hi folks,
>
> A modest proposal, drawing on ideas from Mark, Henry, Tim, Dan, Norm
> and others:

I found the notes from our discussion in Edinburgh, Misha, but then I
left them at home and I'm travelling. I got a better picture of the
requirements, and we discussed several options.

As I understand it, IPTC has a whole bunch of codes... collections
of codes, in fact. Vocabularies, I gather.

The goal is a compact syntax to encode a code within a vocabulary,
such that you can get from this compact syntax a URI for the code
within the vocabulary and for the vocabulary itself.

Some of the codes start with digits. We suspect (though we're not
certain) that vocabularies are homogeneous in this respect: within
a vocabulary, either all the codes start with a digit or none do.

I gather these are for use in NewsML2, and there's a desire
to share technology between NewsML2 and XHTML2 and other
formats and to use the URIs with RDF tools.

We discussed a number of possibilities... for the sake of
example, a numeric code set I know about (though I'm not at all
sure it's actually used in IPTC...) is SIC codes
(http://en.wikipedia.org/wiki/SIC_codes ) and a non-numeric
code set that I know about is IATA codes
(http://en.wikipedia.org/wiki/IATA_airport_code ).

Option A: Have a syntax for binding, say, sic: to http://sic.org/vocab1#
and use sic:0070 for a code. To get a URI for that code, concatenate
them: http://sic.org/vocab1#0070 . To get a URI for the vocabulary,
concatenate them and then strip off the fragment: http://sic.org/vocab1 .
Similarly, bind, say, iata: to http://iata.org/airports# and
let iata:LGA expand to http://iata.org/airports#LGA and
then, to get the vocabulary, strip off the fragment:
http://iata.org/airports .
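
As a rough illustration of Option A, here is a minimal Python sketch.
The names BINDINGS_A and expand_a are made up for this example, and the
sic.org and iata.org URIs are just the illustrative ones used above.

```python
from urllib.parse import urldefrag

# Option A bindings: each prefix maps to a URI that already ends in '#'.
BINDINGS_A = {
    "sic": "http://sic.org/vocab1#",
    "iata": "http://iata.org/airports#",
}

def expand_a(short, bindings=BINDINGS_A):
    """Return (code URI, vocabulary URI) for a 'prefix:code' string."""
    prefix, code = short.split(":", 1)
    # Code URI: plain concatenation of the bound URI and the code.
    code_uri = bindings[prefix] + code
    # Vocabulary URI: the code URI with its fragment stripped off.
    vocab_uri = urldefrag(code_uri).url
    return code_uri, vocab_uri

print(expand_a("sic:0070"))
print(expand_a("iata:LGA"))
```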

The sic:0070 short-hand does not match XML/XPath QName syntax,
so you can't use it in RDF/XML. You can't even make up a QName
for the URI http://sic.org/vocab1#0070 so you simply can't use
it as a property name in RDF/XML. (The example of a SIC code
is not something that you're likely to want to use as an RDF
property name.)
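
To make the QName problem concrete: the local part of a QName has to be
an XML name without colons, and such names cannot start with a digit.
The check below is a simplified, ASCII-only approximation (the real XML
name productions also admit many non-ASCII characters), and the names
NCNAME_ASCII and is_qname_like are made up for this sketch.

```python
import re

# ASCII-only approximation of an XML NCName: starts with a letter or '_',
# followed by letters, digits, '.', '-' or '_'. Not the full XML production.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_qname_like(short):
    """True if 'prefix:local' looks like a QName under the ASCII approximation."""
    prefix, sep, local = short.partition(":")
    if not sep:
        return False
    return all(NCNAME_ASCII.fullmatch(part) for part in (prefix, local))

print(is_qname_like("sic:0070"))   # local part starts with a digit
print(is_qname_like("iata:LGA"))
```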


Option B: Bind sic: to http://sic.org/vocab1 and use sic:0070.
To get a URI for that code, concatenate them with a # between:
http://sic.org/vocab1#0070 . To get a URI for the vocabulary,
look in the binding, and get http://sic.org/vocab1 .
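
A comparable sketch for Option B, again with made-up names (BINDINGS_B,
expand_b). The only difference from Option A is that the binding holds
the vocabulary URI itself and the # is supplied at expansion time.

```python
# Option B bindings: each prefix maps to the vocabulary URI, with no '#'.
BINDINGS_B = {
    "sic": "http://sic.org/vocab1",
}

def expand_b(short, bindings=BINDINGS_B):
    """Return (code URI, vocabulary URI) for a 'prefix:code' string."""
    prefix, code = short.split(":", 1)
    # Vocabulary URI: read straight out of the binding.
    vocab_uri = bindings[prefix]
    # Code URI: binding, then '#', then the code.
    code_uri = vocab_uri + "#" + code
    return code_uri, vocab_uri

print(expand_b("sic:0070"))
```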

Option C: Like A, but for any code that doesn't start with an XML name
start character, put a _ in front of it before you use it in any of
these web technologies. So sic:_0070 is the short syntax,
http://sic.org/vocab1#_0070 is the URI for the code, and again, to get
the URI for the vocab, strip off the fragment: http://sic.org/vocab1 .
Now we can use the short syntax as a QName in RDF/XML.

In Option C, the IATA stuff is the same as in Option A:
bind iata: to http://iata.org/airports# and
let iata:LGA expand to http://iata.org/airports#LGA
and strip off the fragment to get the vocabulary and get
http://iata.org/airports .
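
Option C could be sketched along the same lines. The names below
(BINDINGS_C, to_short_c, expand_c) are hypothetical, and the name start
test is an ASCII-only simplification of the XML rule.

```python
import re
from urllib.parse import urldefrag

# Same style of bindings as Option A: the bound URI ends in '#'.
BINDINGS_C = {
    "sic": "http://sic.org/vocab1#",
    "iata": "http://iata.org/airports#",
}

# ASCII-only stand-in for "XML name start character" (colon excluded).
NAME_START_ASCII = re.compile(r"[A-Za-z_]")

def to_short_c(prefix, raw_code):
    """Build the short syntax, prefixing '_' when the code can't start a name."""
    if not NAME_START_ASCII.match(raw_code):
        raw_code = "_" + raw_code
    return prefix + ":" + raw_code

def expand_c(short, bindings=BINDINGS_C):
    """Return (code URI, vocabulary URI), exactly as in Option A."""
    prefix, code = short.split(":", 1)
    code_uri = bindings[prefix] + code
    return code_uri, urldefrag(code_uri).url

print(to_short_c("sic", "0070"))
print(expand_c(to_short_c("sic", "0070")))
print(expand_c(to_short_c("iata", "LGA")))
```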


There might have been some other options that I've forgotten.

And I'm not sure to what extent compatibility with existing NewsML
practice is a requirement.

The proposal you make here seems much more complicated
than any of those options, and it involves a lot more coordination
(new rules that are binding on "Groups within the W3C and elsewhere").

> 1   We agree on a generic syntax and generic rules for Compact URIs
>     (CURIEs) in attribute values.
>
> 2   We agree that restricted syntaxes and rules will be (or have
>     been) defined for specific purposes.   One such purpose is XML
>     Namespaces and QNames.
>
> 3   Groups within the W3C and elsewhere will define other restricted
>     syntaxes and rules for their own purposes.
>
> 4   The generic syntax for a CURIE in an attribute value will be:
>        <foo bar="prefix:suffix"/>
>
> 5   The generic syntax for multiple CURIEs in an attribute value
>     will (where permitted) be:
>        <foo bar="prefix1:suffix1 ... prefixN:suffixN"/>
>
> 6   Both the prefix and the suffix may (in the generic case) be
>     numeric.
>
> 7   Each language must specify:
>
> 7a  the syntactic constraints (if any) on the prefix and suffix.
>
> 7b  how CURIEs and URIs are distinguished, eg through dedicated
>     attributes or through a special syntax.
>
> 7c  the mechanism for specifying the prefix-to-IRI mapping.  The
>     mechanism may use information provided out-of-band.
>
> 7d  whether and, if so, how the prefix and suffix are combined to
>     form an IRI.
>
> 7e  whether the prefix and suffix form a tuple or whether they are
>     just a compact representation for an IRI.
>
> 7f  whether the IRI mapped to the prefix is required to be
>     dereferenceable.
>
> 7g  whether the IRI built from the prefix and suffix (and, possibly,
>     including also other building blocks) is required to be
>     dereferenceable.
>
> 7h  whether any fragment identifiers in these IRIs are required to
>     be legal XML names.
>
> 8   To avoid confusion with XML Namespaces and QNames:
>
> 8a  The xmlns attribute is reserved for use with XML Namespaces and
>     QNames.
>
> 8b  If a prefix matches an xmlns declaration then the CURIE MUST be
>     interpreted as a QName.
>
> Misha
> ------------------- NewsML 2 resources ------------------------------
> http://www.iptc.org         | http://www.iptc.org/std-dev/NAR/1.0
> http://www.iptc.org/std-dev | http://groups.yahoo.com/group/newsml-2



-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Tuesday, 13 June 2006 14:35:42 UTC