RE: CURIEs: A proposal from Misha Wolf on 2006-06-13 (www-tag@w3.org from June 2006)

From: Misha Wolf <Misha.Wolf@reuters.com>
Date: Tue, 13 Jun 2006 17:01:19 +0100
To: Dan Connolly <connolly@w3.org>, www-tag@w3.org
Cc: newsml-2@yahoogroups.com, public-rdf-in-xhtml-tf@w3.org
Message-id: <A29ADE959C70A1449470AA9A212F5D800203FE13@LONSMSXM06.emea.ime.reuters.com>

Hi Dan,

Various groups are interested in the CURIE initiative. These groups
don't all have the same requirements. I hope that we can agree on a
good solution, which meets all of the requirements. During our
post-presentation discussion in Edinburgh, we discussed specifically
the IPTC's requirements. The mail you responded to below was my
attempted synthesis, which tackles a broader canvas than just the
IPTC's needs. I've summarised some of the options for the IPTC's
problem space, and some of the problems with those options, in my
reply to Henry:
http://lists.w3.org/Archives/Public/www-tag/2006Jun/0046.html

I'll respond now to specific points in your mail:

> As I understand it, IPTC has a whole bunch of codes... collections
> of codes, in fact. Vocabularies, I gather.

Indeed.  Note that many of these vocabularies exist independently of 
the IPTC, eg:
-  BCP-47 (eg "zh-Hant", ie Traditional Chinese)
-  CUSIP (eg "037833100", ie Apple Computer)
-  ISBN (eg "0-321-18578-1", ie The Unicode Standard, Version 4.0)
-  ISIN (eg "US0378331005", ie Apple Computer)
-  ISO-3166-Alpha-2 (eg "CS", ie Serbia and Montenegro)
-  ISO-4217-Alpha (eg "JPY", ie Japanese Yen)
-  ISO-4217-Num (eg "392", ie Japanese Yen)
-  ISSN (eg "0261-3077", ie The Guardian)
-  NYSE (eg "A", ie Agilent Technologies)
-  SEDOL (eg "0263494", ie BAE Systems)
-  Valoren (eg "1203203", ie UBS)

> The goal is a compact syntax to encode a code within a vocabulary,
> such that you can get from this compact syntax a URI for the code
> within the vocabulary and for the vocabulary itself.

And to ensure that receiving systems and people receive codes which 
they understand.

> Some of the codes start with digits. We suspect (though we're not
> certain) that vocabularies are homogeneous in this respect: within
> a vocabulary, either all the codes start with a digit or none do.
> 
> I gather these are for use in NewsML2, and there's a desire
> to share technology between NewsML2 and XHTML2 and other
> formats and to use the URIs with RDF tools.
> 
> We discussed a number of possibilities... for the sake of
> example, a numeric code set I know about (though I'm not at all
> sure it's actually used in IPTC...) is SIC codes
> (http://en.wikipedia.org/wiki/SIC_codes ) and a non-numeric
> code set that I know about is IATA codes
> (http://en.wikipedia.org/wiki/IATA_airport_code ).
> 
> Option A. Have a syntax for binding, say, sic: to
> http://sic.org/vocab1# and use sic:0070 for a code. To get a URI for
> that code, concatenate them: http://sic.org/vocab1#0070 . To get a
> URI for the vocabulary, concatenate them and then strip off the
> fragment: http://sic.org/vocab1 .
> Similarly, bind, say, iata: to http://iata.org/airports# and
> let iata:LGA expand to http://iata.org/airports#LGA and
> then to get the vocabulary, strip off the fragment:
> http://iata.org/airports .
> 
> The sic:0070 short-hand does not match XML/XPath QName syntax,
> so you can't use it in RDF/XML. You can't even make up a QName
> for the URI http://sic.org/vocab1#0070 so you simply can't use
> it as a property name in RDF/XML. (The example of a SIC code
> is not something that you're likely to want to use as an RDF
> property name.)
> 
> Option B: Bind sic: to http://sic.org/vocab1 and use sic:0070.
> To get a URI for that code, concatenate them with a # between:
> http://sic.org/vocab1#0070 . To get a URI for the vocabulary,
> look in the binding, and get http://sic.org/vocab1 .
> 
> Option C: Like A, but for any code that doesn't start with an XML
> name start character, put a _ in front of it before you use it in
> any of these web technologies. So sic:_0070 is the short syntax,
> http://sic.org/vocab1#_0070 is the URI for the code, and again, to
> get the URI for the vocab, strip off the fragment:
> http://sic.org/vocab1 .
> Now we can use the short syntax as a QName in RDF/XML.

We can't do this, as receiving systems (and people) would not
recognise a code such as "_0070"; the code they know is "0070".
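
The QName constraint behind Options A and C can be sketched in a few
lines of Python. This is only an illustration: the regular expression
is an ASCII-only approximation of the XML NCName rule (the real rule
also admits many non-ASCII characters), and the function name
is_ascii_ncname is made up for this example.

```python
import re

# ASCII-only approximation of an XML NCName (an assumption made for
# brevity): a letter or "_" first, then letters, digits, ".", "-", "_".
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def is_ascii_ncname(local_part: str) -> bool:
    """Return True if local_part could be the local part of a QName."""
    return _NCNAME_ASCII.match(local_part) is not None


# "0070" starts with a digit, so sic:0070 is not a QName (Option A);
# "_0070" is acceptable (Option C), as is a code like "LGA".
for code in ("0070", "_0070", "LGA"):
    print(code, is_ascii_ncname(code))
```

Under this approximation the unmodified SIC code fails the check and
the underscore-prefixed form passes, which is the trade-off at issue:
Option C buys QName compatibility by changing the code people see.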

> In Option C, the IATA stuff is the same as in Option A:
> bind iata: to http://iata.org/airports# and
> let iata:LGA expand to http://iata.org/airports#LGA
> and strip off the fragment to get the vocabulary and get
> http://iata.org/airports .
> 
> 
> There might have been some other options that I've forgotten.

The option I favour is:
  vocabIRI          = http://sic.org/vocab1
  prefix            = sic
  suffix (aka code) = 0070
  CURIE             = sic:0070
  construction rule = <vocabIRI> & "#_" & <code>
  codeIRI           = http://sic.org/vocab1#_0070
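
As a rough sketch only, the construction rule above could be applied
like this in Python. The function name expand_curie and the bindings
dictionary are hypothetical, chosen for illustration; nothing here is
taken from an existing CURIE or NewsML implementation.

```python
from typing import Dict, Tuple


def expand_curie(curie: str, bindings: Dict[str, str]) -> Tuple[str, str]:
    """Return (vocabIRI, codeIRI) for a CURIE such as "sic:0070".

    bindings maps each prefix to its vocabIRI, with no trailing "#".
    """
    prefix, sep, code = curie.partition(":")
    if not sep or prefix not in bindings:
        raise ValueError("no binding for CURIE: " + curie)
    vocab_iri = bindings[prefix]
    # Construction rule from the mail: <vocabIRI> & "#_" & <code>
    code_iri = vocab_iri + "#_" + code
    return vocab_iri, code_iri


bindings = {"sic": "http://sic.org/vocab1"}
vocab_iri, code_iri = expand_curie("sic:0070", bindings)
print(vocab_iri)  # http://sic.org/vocab1
print(code_iri)   # http://sic.org/vocab1#_0070
```

The code carried in the CURIE stays "0070", as receivers expect; the
underscore appears only in the generated codeIRI.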

> And I'm not sure to what extent compatibility with existing NewsML
> practice is a requirement.

It isn't.  But compatibility with the real world *is* a requirement.

> The proposal you make here seems much more complicated
> than any of those options, and it involves a lot more coordination
> (new rules that are binding on "Groups within the W3C and elsewhere").

See my intro.

Regards,
Misha
------------------- NewsML 2 resources ------------------------------
http://www.iptc.org         | http://www.iptc.org/std-dev/NAR/1.0
http://www.iptc.org/std-dev | http://groups.yahoo.com/group/newsml-2


Received on Tuesday, 13 June 2006 16:01:51 UTC