Re: Showing the Semantic Web from Frank Manola on 2006-02-22 (semantic-web@w3.org from February 2006)

From: Frank Manola <fmanola@acm.org>
Date: Wed, 22 Feb 2006 10:37:29 -0500
To: Misha Wolf <Misha.Wolf@reuters.com>
CC: Semantic Web <semantic-web@w3.org>, newsml-2@yahoogroups.com
Message-ID: <43FC8539.5010806@acm.org>

Misha Wolf wrote:
> Hi Frank,
> 
> Thanks for your long and thoughtful mail of 18 Jan 2006 and 
> apologies for the delay in my reply, caused by the usual syndrome of 
> too much to do and not enough hours in the day (or night).  
> Yesterday I heard on the radio about some new pills which cut down 
> the amount of sleep one requires.  Maybe I need to get a pack ...

Misha--

Thanks yourself.  And I'm sorry not to have responded sooner myself.  As 
usual, other stuff intervened, and I needed to do some additional 
research anyway.  I don't think the pills you mention would be a good 
idea in any case:  if you used them, you probably would then need to 
take other pills to go to sleep!

Your message clarified (I think!) some things for me about the issues 
you raised.

First, some general comments.  It seems to me what you're looking for is 
basically some "schema design" or data organization discussion here, 
i.e., how to handle application-related situations X and Y.  The SWBP 
group is doing some of that, but their stuff is pretty generic.  I agree 
that this would be one forum where such discussions could take place. 
But the details of the specific problems need to be laid out.

You note that no one but the news industry has apparently embraced the 
Semantic Web to this extent.  Unfortunately, the result of pioneering is 
sometimes that you run into uncharted territory, even for dealing with 
what appear to be simple and generic things.  I think the situation is 
probably that different people have worked out ways to do some of these 
things.  What may well be the case, though, is that there aren't 
fully-worked-out "best practices" or general agreements (let alone 
"standards") about how to do some of the things you need to do.  That 
means there will be many opinions and alternatives.  That may ultimately 
be the case anyway:  we're essentially talking "information engineering" 
here, and what is a good solution given one set of constraints may be 
less than ideal in a different situation.

I have to say this seems natural.  The situation with respect to the 
Semantic Web can be compared, it seems to me, with the situation that 
used to exist with respect to databases.  The RDF and OWL specs are 
more-or-less comparable to the definition of the relational model for 
databases.  But the definition of the relational model itself didn't 
provide much guidance as to how to design relational databases to handle 
specific data modeling requirements.  A lot of additional work, some of 
it generic (e.g., working out the various normal forms), and some of it 
specific to certain applications, was necessary to work those additional 
details out.  But even now (unfortunately in my opinion), it's hard to 
find repositories of good database design practice for various 
situations, except for the most obvious ones.  The same situation is 
true for the Semantic Web (with the Web hopefully making it easier to 
share design experience than was true decades ago).  While it may be 
unfortunate from the point of view of someone like yourself who has to 
get a design task accomplished, I wouldn't have expected all the variety 
of requirements you're faced with to have been worked out ahead of time 
(and certainly not by the W3C!).

Now to more specific comments:

> 
> I'm responding here to two of your points.
> 
> 
>>Why don't you explain a little more (in particular, say why you 
>>think NewsML is the most important application to enter the 
>>Semantic Web arena), if you think it's that worthwhile as a matter 
>>of general interest?
> 
> 
> The News Industry (ie the News Agencies, represented by the 
> International Press Telecommunications Council) has decided to 
> develop a standard for B2B News exchange which is designed to be 
> compatible with the Semantic Web.  I know of no other industry whose 
> entire output will form part of the Semantic Web.  Do you?

No, and you raise a good point here.  However, I don't recall this point 
having been made this clearly before, which may affect how much 
attention it got/gets.

> 
> 
>>First, I've reviewed some of your past email on NewsML.  It seems 
>>to me that, to the extent you explicitly asked for help, you got 
>>it (I recall a reification question;  were there others?).
> 
> 
> This is absolutely correct.  My concern is that something as 
> fundamental as how to express:
> 
> -  who said that the subject/genre/creator/etc of the story/picture/
>    video/etc is foo?
> 
> -  when did they say it?
> 
> -  with what confidence did they say it?
> 
> -  etc
> 
> seems to still be at the level of "Well, you could do it like this, 
> or you could do it like that".
> 
> Forgive me for being naive, but I do not comprehend how it will be 
> possible to formulate successful queries along the lines of:
> 
>    Find me all stories about which X says with more than 70% 
>    confidence that they have subject Y.
> 
> if there isn't an agreed way of making these assertions, supported 
> by an adequate range of off-the-shelf tools.

I think I need some additional information (or clarification) here.  It
seems to me that once you have a specific way of recording information
about such things as "regarding story FOO, X says with more than 70%
confidence that it has subject Y", you can certainly formulate
queries on such NewsML information (or RDF derived from it), and you
could use, e.g., SPARQL tools to do it.  It's as if you'd designed a
particular database schema to represent that information, and given that 
you have such a schema, you can query a database designed in accordance 
with that schema.
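
To make the schema analogy concrete, here is a minimal Python sketch.  The
record layout (story, subject, asserted_by, confidence, asserted_on), the
identifiers, and the sample data are all invented for illustration; none of
them come from NewsML or from any RDF vocabulary.  It only shows that once
one representation has been fixed, the "more than 70% confidence" query
becomes a simple filter over that representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectAssertion:
    """One 'who said what about which story' record (hypothetical layout)."""
    story: str         # identifier of the story being described
    subject: str       # subject code assigned to the story
    asserted_by: str   # party making the assertion
    confidence: float  # 0.0 to 1.0
    asserted_on: str   # ISO 8601 date of the assertion


# Invented sample data, purely for illustration.
ASSERTIONS = [
    SubjectAssertion("story:1", "subject:Y", "agency:X", 0.90, "2006-02-01"),
    SubjectAssertion("story:2", "subject:Y", "agency:X", 0.50, "2006-02-02"),
    SubjectAssertion("story:3", "subject:Y", "agency:Z", 0.95, "2006-02-03"),
    SubjectAssertion("story:4", "subject:W", "agency:X", 0.80, "2006-02-04"),
    SubjectAssertion("story:5", "subject:Y", "agency:X", 0.70, "2006-02-05"),
]


def stories_about(assertions, who, subject, min_confidence):
    """Return stories that `who` says have `subject` with confidence
    strictly greater than `min_confidence`."""
    return sorted({
        a.story
        for a in assertions
        if a.asserted_by == who
        and a.subject == subject
        and a.confidence > min_confidence
    })


if __name__ == "__main__":
    print(stories_about(ASSERTIONS, "agency:X", "subject:Y", 0.70))
```

The same filter could presumably be written as a SPARQL query over RDF,
using whatever properties a given design settles on.  Either way, the query
depends on the chosen representation, not on everyone else having agreed to
use it.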

But it seems to me that you're asking for more than this, namely that 
everyone in the Semantic Web should agree on how to represent such 
assertions, not just that NewsML needs to have a way of doing it.  While 
it would certainly be a Good Thing to have that global agreement, 
because then everyone could query everyone's assertions about stories 
and subjects the same way whether they use NewsML or not, I don't think 
you're going to have that level of agreement "out of the box" so to 
speak.  But surely you can formulate successful queries on NewsML
material once you've decided how to represent that kind of information,
can't you?

> 
> There are also a number of more detailed issues on which we've got no 
> help at all.  I don't recall on which list we aired them, so it may 
> not have been here.  These include:
> 
> -  The inability of various RDF-related formats etc to deal with 
>    numeric codes.

I don't recall seeing this issue.  Is this a reference to the issue 
described in Section 4.3 in the NewsML Architecture document, e.g., the 
example from the CURIE document of wanting to use something like 
iptc:10112244 (your description here doesn't make that entirely clear)?
If it is, CURIEs would certainly address this issue, as you note
below.  However, wouldn't using XML entities also work to help
abbreviate the URIs?  (This is illustrated in Example 8 in the RDF
Primer for use in abbreviating typed literals, and the OWL specs use XML
entities for other abbreviations as well.)  Here, your reference to
"RDF-related formats" needs to be read very precisely, since the
problem isn't with the RDF model per se, but rather with the various
notations (RDF/XML in this case) used for encoding it.  Alternatively, have
you looked at PRISM's approach to dealing with controlled vocabularies?
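
As a rough illustration of the numeric-code problem and of prefix-based
abbreviation, here is a small Python sketch.  The prefix table and the
example.org URI are placeholders, not real IPTC namespaces, and the name
check is a simplified ASCII-only approximation of the XML rule that a name
cannot begin with a digit.  It shows why 10112244 cannot serve as the local
part of a QName, while simple prefix expansion handles it without trouble.

```python
import re

# Hypothetical prefix table; the URI is a placeholder, not an IPTC namespace.
PREFIXES = {"iptc": "http://example.org/iptc/subject/"}

# Simplified, ASCII-only approximation of an XML name without colons:
# it must start with a letter or underscore, never a digit.
_NCNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*$")


def is_qname_local_part(text):
    """Return True if `text` could be the local part of a QName
    (under the simplified ASCII rule above)."""
    return bool(_NCNAME.match(text))


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:reference' into a full URI by concatenation.

    Unlike a QName, the reference part is not restricted to XML names,
    so purely numeric codes are accepted."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


if __name__ == "__main__":
    print(is_qname_local_part("10112244"))  # numeric code fails the name rule
    print(expand_curie("iptc:10112244"))
```

An XML entity declared for the same base URI would give the same
concatenation inside RDF/XML attribute values; the sketch only makes the
mapping from short form to full URI explicit.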

> 
> -  The problem of how to reconcile having 20-30 taxonomies in a 
>    document with keeping the document reasonably small.  We have 
>    asked about alternative mechanisms for declaring alias/URI 
>    correspondence, but all we have got back is: Use XML Namespaces.
>    This is despite the fact that we are not declaring namespaces 
>    for elements/attributes etc, and so do not need to be bound by 
>    the constraints specified for those.

Again, I don't recall seeing this issue, and I'm not exactly sure what 
you have in mind here.  Can you give an example or point to something in 
the NewsML specs that illustrates it?  It sounds like entities might be
a possible approach here as well, although you might have reasons for
not wanting to use them (and, as I said, I'm not entirely sure what you
have in mind).

> 
> The RDF-in-XHTML task force is well on the path to specifying 
> CURIEs, which will address the first of these two concerns, but not 
> the second.  Consequently, we are having to invent our own 
> declaration mechanism, which is regrettable.

I can certainly see how this seems regrettable from your point of view, 
but after all, *someone* has to invent the declaration mechanism, don't
they?  And hopefully they would do so on the basis of detailed
application requirements, such as those you are bringing to the table, 
rather than "out of thin air".  I hope and believe that this sort of 
feedback will help improve the whole Semantic Web infrastructure, so 
from that point of view this isn't at all regrettable, however 
unfortunate it may seem to you now.

--Frank
Received on Wednesday, 22 February 2006 15:49:39 UTC