
RE: CURIEs: A proposal

From: Misha Wolf <Misha.Wolf@reuters.com>
Date: Tue, 13 Jun 2006 23:01:44 +0100
To: Dan Connolly <connolly@w3.org>, www-tag@w3.org
Cc: newsml-2@yahoogroups.com, public-rdf-in-xhtml-tf@w3.org
Message-id: <A29ADE959C70A1449470AA9A212F5D800203FEC8@LONSMSXM06.emea.ime.reuters.com>

Hi Dan,

> I'm a little in the dark... if I knew which other groups are 
> involved and what their requirements are, I'd be in a better 
> position to evaluate the proposal... and in a better position to 
> know if a critical mass of the relevant constituents agree.

First of all, there is the RDF-in-XHTML task force.  Then there 
are the various W3C specifications tracked down by Mark, which 
variously claim to be using QNames but aren't, or are explicitly 
using assorted quasi-QNames.  Details of a number of these were 
given by Mark in his presentation at Edinburgh and in earlier 
mails.

As far as proposals are concerned, I distinguish between the 
specific proposal for solving the IPTC's problem, and a more 
general proposal for an architecture which encompasses all of 
these abbreviated forms, including QNames (given in my mail at 
the start of this thread).

> > And to ensure that receiving systems and people receive codes 
> > which they understand.

> Can we expect receiving systems/people to learn about whatever
> proposal we come up with? Or do they have to understand based
> on what they already know, without any code changes for systems
> and without people reading more specs (or other docs)?
> 
> If we can't expect systems to pick up new technology, that's
> sort of a non-starter, no?

Lots of systems insert/store/display naked codes directly.  As we 
are trying to make NewsML 2 easy to use, we aren't going to be 
telling people that they have to mangle their data because some 
W3C spec can't cope with it the way it is.  We are very happy to 
generate mangled URIs for (X)HTML/RDF pages documenting the 
vocabularies, but we won't mangle the base data.  In another mail 
I gave the example of IRIs vs URIs.  It has been accepted that 
users need to be able to include resource identifiers, as they 
understand them, in XML documents.  The mangling to URIs happens 
behind the scenes.

> Receiving systems can execute the algorithm and determine
> the relevant URIs and then look up the URIs in the Web, no?

Indeed.

> > The option I favour is:
> >   vocabIRI          = http://sic.org/vocab1
> >   prefix            = sic
> >   suffix (aka code) = 0070
> >   CURIE             = sic:0070
> >   construction rule = <vocabIRI> & "#_" & <code>
> >   codeIRI           = http://sic.org/vocab1#_0070

> That looks reasonable as far as I can tell.
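To make the rule concrete, here is a small Python sketch of the 
expansion.  The function name and the prefix-to-vocabulary mapping 
are illustrative only, not from any spec; the point is simply that 
the code ("0070") passes through unmangled and the "#_" glue is 
added behind the scenes:

```python
def curie_to_code_iri(curie: str, vocab_map: dict) -> str:
    """Expand a CURIE to a code IRI using the rule
    <vocabIRI> & "#_" & <code>.  vocab_map takes a prefix
    (e.g. "sic") to its vocabulary IRI."""
    prefix, code = curie.split(":", 1)
    vocab_iri = vocab_map[prefix]
    return vocab_iri + "#_" + code

# The example from the option above:
vocab_map = {"sic": "http://sic.org/vocab1"}
print(curie_to_code_iri("sic:0070", vocab_map))
# http://sic.org/vocab1#_0070
```

Note that the underscore lives only in the generated IRI; the code 
itself ("0070") is never altered, which is the compatibility point 
made below.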

> > [...] compatibility with the real world *is* a requirement.

> I don't know what to make of that remark.

Simply a restatement of the position that the codes themselves 
must be left as is.  To see why, try any of these in Google:

   CUSIP 037833100    -> Apple Computer

   SEDOL 0263494      -> BAE Systems

   Valoren 1203203    -> UBS

   ISBN 0-321-18578-1 -> The Unicode Standard

   ISSN 0261-3077     -> The Guardian

   ISO 4217 392       -> Japanese Yen

Then try them again, this time prefixing the numeric value with 
a "_".  The result is, in each case except for the last one:

   Your search - [...] - did not match any documents. 

   Suggestions:
      Make sure all words are spelled correctly.
      Try different keywords.
      Try more general keywords.
      Try fewer keywords.

In the last case, three hits are shown for the string:
   ISO 4217 _392
but they are all irrelevant.

Regards,
Misha
------------------- NewsML 2 resources ------------------------------
http://www.iptc.org         | http://www.iptc.org/std-dev/NAR/1.0
http://www.iptc.org/std-dev | http://groups.yahoo.com/group/newsml-2


Received on Tuesday, 13 June 2006 22:02:12 GMT
