RE: ebXML Registry UC (Was Re: Agenda: RDF Data Access 27 Jul 2004) from Jim Hendler on 2004-08-03 (public-rdf-dawg@w3.org from July to September 2004)

From: Jim Hendler <hendler@cs.umd.edu>
Date: Tue, 3 Aug 2004 17:11:16 -0400
To: "Jeff Pollock" <Jeff.Pollock@networkinference.com>, "Farrukh Najmi" <Farrukh.Najmi@Sun.COM>, "Rob Shearer" <Rob.Shearer@networkinference.com>
Cc: "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
Message-Id: <p0611048ebd35a82a7b56@[10.0.0.11]>
Jeff - as tempted as I am to send a long reply (especially to your
second sentence below, which is simply fallacious - there are many FOL
subsets that can produce guarantees - DL is arguably the maximal
such), let me be clear why I care about this WITH RESPECT TO THE
WORK OF THE DAWG.

Consider the following example - I'd like to know whether the
National Cancer Institute's Cancer Ontology (available in OWL, see
[1]) states that the FGFR3 Gene is one that promotes Fibroblast
growth. That is, I'm looking to see if the triple
  nci:FGFR3_Gene rdfs:subClassOf nci:Fibroblast_Growth_Factor_Receptor_Family_Gene
(where ..._Gene is a class) is in the ontology.
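The "is this triple in the ontology?" question can be illustrated with a minimal Python sketch that treats the ontology as an in-memory set of (subject, predicate, object) tuples and matches patterns with wildcards. This is only an illustration, not DAWG syntax or any particular store's API: prefixed names are plain strings, the subclass triple is written with rdfs:subClassOf (the RDF Schema property OWL uses for subclass axioms), and the second triple with the superclass "nci:Gene" is a hypothetical addition.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical fragment of the ontology; "nci:Gene" is an illustrative superclass.
GRAPH: Set[Triple] = {
    ("nci:FGFR3_Gene", "rdfs:subClassOf",
     "nci:Fibroblast_Growth_Factor_Receptor_Family_Gene"),
    ("nci:Fibroblast_Growth_Factor_Receptor_Family_Gene", "rdfs:subClassOf",
     "nci:Gene"),
}


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield asserted triples matching the pattern; None acts as a wildcard."""
    for triple in graph:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def is_asserted(graph: Set[Triple], s: str, p: str, o: str) -> bool:
    """True only if this exact triple is present; no inference is performed."""
    return (s, p, o) in graph


print(is_asserted(GRAPH, "nci:FGFR3_Gene", "rdfs:subClassOf",
                  "nci:Fibroblast_Growth_Factor_Receptor_Family_Gene"))  # True
print(is_asserted(GRAPH, "nci:FGFR3_Gene", "rdfs:subClassOf", "nci:Gene"))  # False
```

The second check returns False even though the relationship follows from the two triples: a pure lookup only sees what is asserted.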
   One way I can do this is to do an HTTP GET of nci:FGFR3_Gene and
then look at the definition there (and hope the tool used put these
triples in a standard class definition). Another thing I can do is
to get that document, load it into a tool (such as the ones your
company creates) and use some sort of deduction to test whether the
above is entailed - that seems preferable. However, there is a
problem -- the NCI ontology is a document of about 25 MB that
contains roughly 300,000 triples -- so downloading and parsing it
takes a long time.
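The difference between looking a triple up and testing whether it is entailed can be sketched for one narrow case: transitivity of rdfs:subClassOf. This Python sketch is a deliberately tiny fragment of RDFS-style inference and nowhere near what an OWL DL reasoner computes; the triples are hypothetical, with "nci:Gene" invented as an illustrative superclass.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"

# Hypothetical asserted triples; "nci:Gene" is an illustrative superclass.
GRAPH: Set[Triple] = {
    ("nci:FGFR3_Gene", SUBCLASS, "nci:Fibroblast_Growth_Factor_Receptor_Family_Gene"),
    ("nci:Fibroblast_Growth_Factor_Receptor_Family_Gene", SUBCLASS, "nci:Gene"),
}


def superclass_index(graph: Set[Triple]) -> Dict[str, Set[str]]:
    """Map each class to its asserted (told) superclasses."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in graph:
        if p == SUBCLASS:
            index[s].add(o)
    return index


def entails_subclass(graph: Set[Triple], sub: str, sup: str) -> bool:
    """Follow asserted subClassOf edges transitively (depth-first search).

    The `seen` set keeps the search finite even if the data contains cycles.
    """
    index = superclass_index(graph)
    stack, seen = [sub], {sub}
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        for parent in index.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


print(("nci:FGFR3_Gene", SUBCLASS, "nci:Gene") in GRAPH)         # False: not asserted
print(entails_subclass(GRAPH, "nci:FGFR3_Gene", "nci:Gene"))     # True: entailed
```

Even this toy check has to walk the whole superclass chain, which is part of why loading and reasoning over a very large ontology on every request is slow.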
  One thing we are exploring in our research is to load the
ontology into a triple store (Tucana will be happy to hear we're
using Kowari) and make it available on our web server. Queries,
written in the eventual DAWG language and sent using the eventual DAWG
protocol, could answer many questions about this ontology (for
example, the direct subclass relation needed for the above query)
very quickly. So we are exploring an alternative mechanism that
looks like it will be very useful in practice and is of great
interest to at least one major OWL supporter (the NCI).
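The direct subclass relation mentioned above can be sketched as follows: C is a direct subclass of X if C is below X and no other class sits strictly between them. This Python sketch uses a small hypothetical taxonomy modeled on the Worker / White-Collar Worker / Accountant example from Rob Shearer's message quoted below (with a "BlueCollarWorker" class added), works over asserted edges only, and ignores equivalent classes, which a real reasoner or store would have to handle.

```python
from typing import Dict, Set

# Hypothetical asserted subclass edges: child -> set of asserted parents.
# Accountant is (redundantly) asserted under both Worker and WhiteCollarWorker.
PARENTS: Dict[str, Set[str]] = {
    "WhiteCollarWorker": {"Worker"},
    "BlueCollarWorker": {"Worker"},
    "Accountant": {"WhiteCollarWorker", "Worker"},
}


def ancestors(cls: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All superclasses of cls reachable through asserted edges."""
    result: Set[str] = set()
    stack = list(parents.get(cls, ()))
    while stack:
        p = stack.pop()
        if p not in result:
            result.add(p)
            stack.extend(parents.get(p, ()))
    return result


def direct_subclasses(target: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Subclasses of target with no other subclass of target strictly between."""
    subs = {c for c in parents if target in ancestors(c, parents)}
    return {
        c for c in subs
        if not any(d in subs and d != c for d in ancestors(c, parents))
    }


print(sorted(direct_subclasses("Worker", PARENTS)))
# ['BlueCollarWorker', 'WhiteCollarWorker']
```

One plausible design, under these assumptions, is for a store to compute this relation once when the ontology is loaded, so that each incoming query becomes a simple lookup rather than a classification run.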
  Now, I'm not arguing that I would never use or prefer a reasoner, or
that it wouldn't be possible to build persistent stores that allowed
Cerebra or other such products to be used to answer these questions --
but it is my contention that many simple queries (and if the one
above is too complex, how about simply asking for the directly
asserted synonym list for FGFR3_Gene - an annotation property, so no
inference is needed or allowed) could be done using DAWG queries, and
this would be of value in certain applications (but not all; high-end
things would unquestionably need a more complex inferencer like
yours).
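The synonym case needs no inference at all: it is a plain pattern match on one subject and one annotation property. A minimal Python sketch, assuming the annotation property is called "nci:Synonym" (a hypothetical name; the real ontology may use a different property) and using placeholder literals rather than actual NCI data:

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
SYNONYM = "nci:Synonym"  # hypothetical annotation property name

# Hypothetical triples; the literal values are placeholders, not NCI data.
GRAPH: Set[Triple] = {
    ("nci:FGFR3_Gene", SYNONYM, "example synonym A"),
    ("nci:FGFR3_Gene", SYNONYM, "example synonym B"),
    ("nci:FGFR3_Gene", "rdfs:subClassOf",
     "nci:Fibroblast_Growth_Factor_Receptor_Family_Gene"),
    ("nci:Fibroblast_Growth_Factor_Receptor_Family_Gene", SYNONYM,
     "example family synonym"),
}


def asserted_synonyms(graph: Set[Triple], cls: str) -> List[str]:
    """Return only the synonyms asserted directly on cls, sorted.

    Annotation values carry no logical semantics and are not inherited
    along rdfs:subClassOf, so no inference step is applied here.
    """
    return sorted(o for s, p, o in graph if s == cls and p == SYNONYM)


print(asserted_synonyms(GRAPH, "nci:FGFR3_Gene"))
# ['example synonym A', 'example synonym B']
```

The superclass's annotation is deliberately not returned, which is the "no inference needed or allowed" behavior described above.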
  So my problem is that I don't want us to preclude a valuable use of
RDF query because it is not the way some companies would prefer us to
interact with OWL ontologies. I thought that my use case (2.11,
which eventually got in only in a very watered-down form due to Rob's
objections), Farrukh's suggestion, and the continuing argument over
4.6 vs. 4.6a all relate to this issue, so I was re-raising it in this
context to remind people that there are real users and use cases for
exploring the use of RDF queries to access RDF graphs representing
OWL ontologies.
  -JH

[1] http://www.mindswap.org/2003/CancerOntology/


At 12:39 -0700 8/3/04, Jeff Pollock wrote:
>Jim-
>
>I'm getting tired of reminding RDF people about why DLs are such an
>important part of the tech stack.  ;-)  Without them, there is no
>standardized or reliable inference capability that can guarantee the same
>answers across different reasoner implementations. UDDI 3.0, OWL-S and
>many Bio/Pharma ontologies, among others, have chosen the DL-based
>approach for good reasons. No one will deny that a DL view of things
>requires a conceptual shift, or that there are indeed technical
>limitations on what may be modeled. But in many cases the advantages
>outweigh the disadvantages.
>
>Regarding the RegRep, the SCM team has not yet debated the different
>levels of OWL support.  I, for one, think that DL is a reasonable
>alternative to seriously consider. Depending on the level of RegRep
>specification, it may be a needed requirement.  For example, if the
>RegRep simply exposes an OWL model as the interface to the repository -
>leaving it to vendors to implement their own query support - then
>restricting the interface to DL would enable an assured consistency in
>"inference at query" results across vendor implementations. Otherwise,
>different proprietary chaining algorithms could conceivably turn up
>different results from different vendors - causing chaos in a
>distributed DNS-like architecture.
>
>I know you saw my prezo at the '04 W3C AC Rep mtg, but here's a reminder
>of what I was saying regarding why DLs matter:
>
>* Consistency - query results, across vendor implementations and
>instances, should be consistent
>* Performance - Although performance metrics depend on model constructs,
>OWL-DL supports highly optimized inference algorithms
>* Predictable - semantics are mathematically decidable within the model,
>reasoning is finite
>* Foundational - provides a baseline inside applications for layered
>semantic models
>* Reliability - if the answer to a query is implied by any of the model
>data, it will be found - guaranteed.
>
>Lest people be fearful of DLs, which could happen if your points are
>taken out of context, I simply wanted to say that there are indeed good
>reasons why they exist.
>
>Also, for the benefit of stating what should be obvious - Network
>Inference embraces and supports ALL of the semantic web stack - RDF,
>OWL-Lite, OWL-Full, and OWL-DL. Like you, we think that there are
>appropriate times to leverage all aspects of the spec.
>
>Time for me to get off the soapbox!
>
>Best Regards,
>
>-Jeff-
>
>
>-----Original Message-----
>From: public-rdf-dawg-request@w3.org
>[mailto:public-rdf-dawg-request@w3.org] On Behalf Of Jim Hendler
>Sent: Tuesday, August 03, 2004 9:56 AM
>To: Farrukh Najmi; Rob Shearer
>Cc: RDF Data Access Working Group
>Subject: Re: ebXML Registry UC (Was Re: Agenda: RDF Data Access 27 Jul 2004)
>
>
>Farrukh, thanks for your response to Rob - I've gotten tired of
>reminding him and others that the DL methodology is only one of the
>ways OWL can be used (and in practice, it's not even the most common
>- most OWL out there falls in Full, not DL). It also has the problem
>that it is not yet scalable to some of the largest Lite/DL ontologies
>out there, and these are precisely the ones I want to access via query
>instead of "document" (since the documents can get huge and take
>hours to download, parse and classify).  Tools that will admit
>to the reality of the world out there and help people process it will
>be quite welcome.
>   -JH
>p.s. Note, this is nothing against using DL when appropriate, as in
>many of NI's business uses, but just to make it clear that DL is only
>one of many ways OWL is being used, and it CANNOT be the defining
>restriction for all use cases and applicability ... oops, I'm
>starting to get passionate and use uppercase - I'll stop now...
>
>
>At 10:26 -0400 8/3/04, Farrukh Najmi wrote:
>>Rob Shearer wrote:
>>
>>>Greetings, Farrukh!
>>>
>>>Apologies for not initiating contact myself.
>>>
>>>Your use case came up at the face-to-face, and I was curious whether
>>>there were alternative ways to achieve the results you were trying to
>>>get.
>>>
>>>You suggest a method of "query refinement" to select the elements of an
>>>ontology in which you're interested: first do a general query, then add
>>>a few more qualifying predicates, then add a few more, each time taking
>>>a look at the result set and figuring out what to add to prune out the
>>>results in which you're not interested. (Please correct the most
>>>offensive bits of this crude summary.)
>>>
>>>In traditional description logics systems, the process of "concept
>>>refinement" is most commonly implemented by traversing a concept
>>>taxonomy using not just "subclass"-style edges, but rather
>>>"direct-subclass" relationships. For example, a taxonomy of "Worker",
>>>"White-Collar Worker", and "Accountant" would include both
>"White-Collar
>>>Worker" and "Accountant" as subclasses of "Worker", however only
>>>"White-Collar Worker" would be a direct subclass.
>>>
>>>The common use pattern would be a user interested in "Worker", so the
>>>user asks for the direct subs of worker and finds that they are "White
>>>Collar", "Blue Collar", "Service", and "Military". He can then drill
>>>down on whichever of these he wishes, each time getting a fairly small
>>>and easily-consumed result set. This is usually much easier to manage
>>>than trying to figure out how to refine hundreds, thousands, or millions
>>>of results by hand somehow.
>>>
>>>Is any approach along these lines applicable to your use case?
>>I totally agree with sub-class refinement as the most common narrowing
>>technique.
>>
>>The use case envisions the query having zero or more parameters. Any one
>>of the parameters MAY be a Concept in a taxonomy (or a class in an Ontology).
>>
>>This is implied but not stated in the use case, as I was trying to have a
>>minimalistic description that was easy to follow and conveyed the core
>>use case.
>>
>>If you would like to propose a modified version of the use case text,
>>send me a draft and we can try to reach closure on the issue before the
>>next DAWG meeting if possible.
>>
>>
>>--
>>Regards,
>>Farrukh
>
>--
>Professor James Hendler
>http://www.cs.umd.edu/users/hendler
>Director, Semantic Web and Agent Technologies	  301-405-2696
>Maryland Information and Network Dynamics Lab.	  301-405-6707 (Fax)
>Univ of Maryland, College Park, MD 20742

Received on Tuesday, 3 August 2004 17:11:59 UTC