RE: Showing the Semantic Web

Hi Frank,

> Your message clarified (I think!) some things for me about the 
> issues you raised.
> 
> First, some general comments.  It seems to me what you're looking 
> for is basically some "schema design" or data organization 
> discussion here, i.e., how to handle application-related 
> situations X and Y.

The NewsML 2 Architecture WP has by now defined most of the NewsML 2 
syntax.  Looking at areas relevant to this list, we've decided:

1  to use URIs for just about everything
2  to use CURIEs rather than XML Namespaces
3  the names, models and semantics of our elements and attributes

Point 2 above goes against W3C tradition, but we (and others) have 
found this tradition to be an inadequate foundation on which to 
build.

In point 3 we've gone against DC tradition, as we've decided to 
create our own elements even where similar elements exist in the 
various DC namespaces.  We've done this because we've found the DCMI 
insufficiently responsive when we've pointed out problems with their 
definitions (we have a major problem with dc:title).

Both of the above decisions are in harmony with the answer I have
been getting on this list to most questions.  This answer can be 
paraphrased as "use late binding" or "use indirection".  By this I 
mean, do whatever, and then use some semantic glue which maps 
whatever to something else.

The problem is that the help we've been asking for is in precisely 
this area: how to map our whatever to something else.  We want to 
use something like GRDDL to transform our syntax to something 
interoperable and we can't find the interoperable stuff that our 
XSLT transforms should be producing :-(

> > There are also a number of more detailed issues on which we've 
> > got no help at all.  I don't recall on which list we aired 
> > them, so it may not have been here.  These include:
> > 
> > -  The inability of various RDF-related formats etc to deal 
> >    with numeric codes.
>  
> I don't recall seeing this issue.  Is this a reference to the 
> issue described in Section 4.3 in the NewsML Architecture 
> document, e.g., the example from the CURIE document of wanting to 
> use something like iptc:10112244 (your description here doesn't 
> make that entirely clear)?

Yes.  In the real world, there are a number of vast taxonomies 
utilising numeric codes.  Examples given in the NewsML 2 Technical 
Specification [1] include:

-  CUSIP (eg "037833100", ie Apple Computer)
-  ISBN (eg "0-321-18578-1", ie The Unicode Standard, Version 4.0)
-  ISSN (eg "0261-3077", ie The Guardian)
-  SEDOL (eg "0263494", ie BAE Systems)
-  Valoren (eg "1203203", ie UBS)

The IPTC Subject NewsCodes [2] too are numeric (eg "15095000", ie 
Naginata).

Note: Naginata is a traditional Japanese martial art practised with 
a wooden pole sword.  "Naginata" means a spear with a curved blade.

> If it is, certainly CURIEs would address this issue, as you note 
> below.  However, wouldn't using XML entities also work to help 
> abbreviate the URIs?

We considered this option and decided not to go there :-)

> Here, it's necessary to very precisely read your reference to 
> "RDF-related formats", since the problem isn't with the RDF model 
> per se, but rather the use of various notations (RDF/XML in this 
> case) for encoding it.

Indeed.  The problem is also with the various W3C WGs who don't 
consider this issue to be one that is worth solving.
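
To make the numeric-code problem concrete, here is a minimal sketch 
(my own illustration, not taken from any spec text).  RDF/XML writes 
property and type names as XML QNames, and the local part of a QName 
must be an NCName, which cannot start with a digit.  The regular 
expression below is a simplified, ASCII-only approximation of the 
NCName production, and the function name is hypothetical.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production
# (an assumption for illustration; the real production also admits many
# non-ASCII name characters).  The key rule is kept: the first character
# must be a letter or an underscore, never a digit.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname_ascii(code: str) -> bool:
    """Return True if `code` could serve as the local part of a QName."""
    return _NCNAME_ASCII.fullmatch(code) is not None


# The numeric codes quoted earlier in this message.
codes = [
    "037833100",      # CUSIP
    "0-321-18578-1",  # ISBN
    "0261-3077",      # ISSN
    "0263494",        # SEDOL
    "1203203",        # Valoren
    "15095000",       # IPTC Subject NewsCode
]

for code in codes:
    print(code, is_ncname_ascii(code))
```

Every code in the list starts with a digit, so each check returns 
False, whereas an ordinary element-style name such as "title" passes.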

> Alternatively, have you looked at PRISM's approach to dealing 
> with controlled vocabularies?

Not personally, though other members of the NewsML 2 Architecture 
WP are familiar with PRISM.  Are you referring to a specific 
solution to this problem?

> > The problem of how to reconcile having 20-30 taxonomies in a 
> > document with keeping the document reasonably small.  We have 
> > asked about alternative mechanisms for declaring alias/URI 
> > correspondence, but all we have got back is: Use XML Namespaces.
> > This is despite the fact that we are not declaring namespaces 
> > for elements/attributes etc, and so do not need to be bound by 
> > the constraints specified for those.
> 
> Again, I don't recall seeing this issue, and I'm not exactly sure 
> what you have in mind here.  Can you give an example or point to 
> something in the NewsML specs that illustrates it?

Take a look at any of the examples in:
   http://www.iptc.org/NAR/1.0/examples/
or in:
   http://www.iptc.org/NAR/1.0/examples/tutorials/

These have on average 8-10 scheme declarations, but real-life cases 
are likely to have many more, as we're using URIs for everything.
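
As a rough sketch of what an alias/URI correspondence amounts to 
(this is not the NewsML 2 declaration syntax; the aliases, the 
example.org URIs and the function name are all hypothetical), a 
CURIE is resolved by looking up its alias in a table of scheme 
declarations and appending the code to the scheme URI:

```python
from typing import Mapping

# Hypothetical scheme declarations: alias -> scheme URI.  A real document
# might carry 20-30 of these; the URIs here are placeholders only.
SCHEMES = {
    "subj": "http://example.org/schemes/subject/",
    "isbn": "http://example.org/schemes/isbn/",
    "cusip": "http://example.org/schemes/cusip/",
}


def expand_curie(curie: str, schemes: Mapping[str, str]) -> str:
    """Expand 'alias:code' to a full URI by simple concatenation."""
    alias, sep, code = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (no colon): {curie!r}")
    if alias not in schemes:
        raise KeyError(f"undeclared scheme alias: {alias!r}")
    # Unlike a QName local part, the code may start with a digit.
    return schemes[alias] + code


print(expand_curie("subj:15095000", SCHEMES))
print(expand_curie("isbn:0-321-18578-1", SCHEMES))
```

The lookup table is the part that grows with the number of 
taxonomies, which is why the size of the declarations matters to us.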

> > The RDF-in-XHTML task force is well on the path to specifying 
> > CURIEs, which will address the first of these two concerns, but 
> > not the second.  Consequently, we are having to invent our own 
> > declaration mechanism which is regrettable.
> 
> I can certainly see how this seems regrettable from your point of 
> view, but after all *someone* has to invent the declaration 
> mechanism don't they?

True.

> And hopefully they would do so on the basis of detailed 
> application requirements, such as those you are bringing to the 
> table, rather than "out of thin air".  I hope and believe that 
> this sort of feedback will help improve the whole Semantic Web 
> infrastructure, so from that point of view this isn't at all 
> regrettable, however unfortunate it may seem to you now.

So maybe it's true that every cloud has a silver lining :-)

[1] http://www.iptc.org/NAR/1.0/specification/
[2] http://www.iptc.org/NewsCodes/nc_ts-table01.php?TsByName=iptc-subjectcode

Thanks,
Misha


Received on Wednesday, 22 February 2006 21:07:34 UTC