Finding News Taxonomies [was: RE: Towards a TAG consideration of CURIEs]

John Cowan wrote:

> Misha Wolf scripsit:
> 
> > As we would very strongly prefer to end up with a Web page per 
> > Taxonomy, 
> 
> Is that really sensible when taxonomies are very large?  Consider
> SNOMED-CT, with upwards of 300,000 terms.  I should think that
> the choice of / vs. # should be allowed to depend on the taxonomy
> in use.

Well, there are two options for URI construction:

a) use simple concatenation of taxonomy URI and code,

b) require that a specified string be injected between the taxonomy 
   URI and the code.
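To make the difference concrete, here is a rough sketch of the two options. The function names and the "separator" parameter are mine, purely for illustration; nothing here is taken from any IPTC specification.

```python
def build_by_concatenation(taxonomy_uri: str, code: str) -> str:
    """Option (a): the taxonomy URI already ends with whatever delimiter it needs."""
    return taxonomy_uri + code


def build_with_injected_string(taxonomy_uri: str, code: str, separator: str) -> str:
    """Option (b): a specified string is injected between taxonomy URI and code.

    The consuming program must somehow learn `separator` for each taxonomy,
    which is the hardwired-knowledge problem discussed in this thread.
    """
    return taxonomy_uri + separator + code


# Both options can yield the same term URI; they differ in where the
# delimiter knowledge lives (in the taxonomy URI itself, or in the consumer).
a = build_by_concatenation("http://www.iptc.org/NewsCodes/", "123456")
b = build_with_injected_string("http://www.iptc.org/NewsCodes", "123456", "/")
print(a == b)
```

With option (a) the consumer needs no per-taxonomy rule at all, which is why it is attractive if only one rule can be supported.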

I agree with David Booth that consuming programs shouldn't have to
contain hardwired knowledge of the rules for each taxonomy.  I'm not 
sure, though, that there exists a viable mechanism for telling a 
program which of the above to do, for each of the hundreds of 
taxonomies used for News.  I haven't looked at GRDDL for some time, 
but I seem to recall that it is designed for interpreting document 
instances, so is probably not the right tool for specifying how to 
handle a taxonomy that will be used by millions of documents.  I 
also don't recall such a capability in RDDL, though I haven't looked 
at it for quite some time either.

So if we limited ourselves to one rule only, and if we wanted to 
support the use of both "#" and "/", we would probably have to go 
for simple concatenation and specify that in cases where any of the 
codes would not be legal fragment IDs, the taxonomy URI must end 
with a character which will sanitise the code.  This approach is 
illustrated by choices 1 and 2 in my previous mail:

1. Simple concatenation using "/" as the delimiter
   "http://www.iptc.org/NewsCodes/" & "123456" ->
   "http://www.iptc.org/NewsCodes/123456"

2. Simple concatenation using "#_" as the delimiter
   "http://www.iptc.org/NewsCodes#_" & "123456" ->
   "http://www.iptc.org/NewsCodes#_123456"

One of the disadvantages is that a number of RDF tools can't cope 
with choice 2.  At any rate, this seemed to be the case when I last 
looked into this matter.
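As a sketch of why the "_" in choice 2 matters, the snippet below builds both URIs by simple concatenation and checks whether the resulting fragment looks like a legal ID. I am assuming here that "legal fragment ID" means roughly an XML-style name (which may not start with a digit), and the regular expression is a deliberately simplified, ASCII-only approximation of that rule. The helper names are hypothetical.

```python
import re
from typing import Optional

# Assumption: a simplified, ASCII-only stand-in for an XML-style name:
# a letter or underscore, followed by letters, digits, ".", "-" or "_".
_NAME_LIKE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def term_uri(taxonomy_uri: str, code: str) -> str:
    """Simple concatenation, as in choices 1 and 2."""
    return taxonomy_uri + code


def fragment_of(uri: str) -> Optional[str]:
    """Return the text after the first '#', or None if the URI has no fragment."""
    _, sep, fragment = uri.partition("#")
    return fragment if sep else None


def is_name_like(text: str) -> bool:
    """True if `text` matches the simplified name pattern above."""
    return _NAME_LIKE.fullmatch(text) is not None


choice1 = term_uri("http://www.iptc.org/NewsCodes/", "123456")
choice2 = term_uri("http://www.iptc.org/NewsCodes#_", "123456")

print(choice1)                             # no fragment, so no ID constraint
print(choice2)
print(is_name_like("123456"))              # bare numeric code
print(is_name_like(fragment_of(choice2)))  # code sanitised by the "_"
```

Under that assumption the bare code "123456" fails the check while "_123456" passes, which is the sanitising effect the trailing character of the taxonomy URI is meant to provide.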

Misha Wolf
News Standards Manager, Reuters, http://www.reuters.com/
Vice Chair, News Architecture WP, IPTC, http://www.iptc.org/


Received on Saturday, 7 April 2007 15:17:40 UTC