RE: ebXML Registry UC (Was Re: Agenda: RDF Data Access 27 Jul 2004) from Jeff Pollock on 2004-08-03 (public-rdf-dawg@w3.org from July to September 2004)

From: Jeff Pollock <Jeff.Pollock@networkinference.com>
Date: Tue, 3 Aug 2004 16:40:56 -0700
To: "Jim Hendler" <hendler@cs.umd.edu>, "Farrukh Najmi" <Farrukh.Najmi@Sun.COM>, "Rob Shearer" <Rob.Shearer@networkinference.com>
Cc: "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
Message-ID: <CFE388CECDDB1E43AB1F60136BEB49732CECB9@rome.ad.networkinference.com>
Jim-

Point taken, and fully understood. With respect to the DAWG, there are many simpler queries that require minimal entailment. As Rob mentioned in his note, we IN NO WAY seek to inject an OWL-centric view into this group's output; we value the role of RDF as a standalone capability. As most members of the DAWG would point out, there are more RDF-only implementations than OWL-only implementations in the marketplace today.

Our position, as Rob succinctly stated, is that ad-hoc implementations with vendor-specific notions of "semantics" (vis-à-vis the algorithms used to determine correctness) have practical and architectural limitations in a fully networked ("webbed") context. In other words, RDF is not the be-all and end-all of the semantic web (but with you, I know I am preaching to the choir...).

In fact, the whole reason Network Inference is pushing for an XQuery surface layer for this query specification is to ensure that - in the future - the W3C can continue this group's progress to effect a query approach for OWL.  Frankly, and this is no surprise to anyone, the scope of the DAWG's current work is focused so intently on RDF as to be only minimally relevant to querying OWL via inference. And Rules is a whole 'nother question....

The context in which this thread started was the RegRep, which is in fact leaning towards an OWL-based specification into which the RIM would be mapped. In this context, I think that Rob's clarification for Farrukh was on target. Perhaps the implications of a revised use case will show that certain query features are indeed out of scope for the DAWG at this time, in which case these ideas can be tabled for later, along with other OWL-focused query needs.

Best Regards,

-Jeff-

BTW: I stand behind my statement that "Without it [DL], there is no standardized or reliable inference capability that can guarantee same answers across different reasoner implementations." We should discuss this further offline to make sure that we're not having interpretation problems...

  _____  

From: Jim Hendler [mailto:hendler@cs.umd.edu] 
Sent: Tuesday, August 03, 2004 2:11 PM
To: Jeff Pollock; Farrukh Najmi; Rob Shearer
Cc: RDF Data Access Working Group
Subject: RE: ebXML Registry UC (Was Re: Agenda: RDF Data Access 27 Jul 2004)

Jeff - as tempted as I am to send a long reply (especially to your second sentence below, which is simply fallacious: there are many FOL subsets that can produce guarantees, and DL is arguably the maximal such), let me be clear why I care about this WITH RESPECT TO THE WORK OF THE DAWG.

Consider the following example: I'd like to know whether the National Cancer Institute's Cancer Ontology (available in OWL, see [1]) states that the FGFR3 Gene is one that promotes fibroblast growth. That is, I'm looking to see if the triple

 nci:FGFR3_Gene rdfs:subClassOf nci:Fibroblast_Growth_Factor_Receptor_Family_Gene

(where ..._Gene is a class) is in the ontology.   

One way I can do this is to do an HTTP GET of nci:FGFR3_Gene and then look at the definition there (and hope the tool used put these triples in a standard class definition). Another thing I can do is to get that document, load it into a tool (such as the ones your company creates) and use some sort of deduction to test whether the above is entailed, which seems preferable. However, there is a problem: the NCI ontology is a document of about 25 MB containing roughly 300,000 triples, so the download and parsing take a long time.
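
The difference between finding an asserted triple and testing whether it is entailed can be sketched in a few lines of Python. This is a minimal illustration, not any tool's actual behavior. It assumes the ontology has already been loaded as a set of (subject, predicate, object) string tuples. It applies only the transitivity of rdfs:subClassOf (the property OWL uses for subclass axioms) rather than full OWL reasoning. The ex:Gene superclass is a hypothetical stand-in and is not taken from the NCI ontology.

```python
# Asserted-triple lookup versus (very limited) subclass entailment.
# Assumption: the ontology is a set of (subject, predicate, object) tuples;
# only rdfs:subClassOf transitivity is applied, not full OWL reasoning.
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"


def is_asserted(graph, s, p, o):
    """True only if this exact triple is stored in the graph."""
    return (s, p, o) in graph


def superclasses(graph, cls):
    """Every class reachable from cls by following rdfs:subClassOf edges."""
    parents = defaultdict(set)
    for s, p, o in graph:
        if p == SUBCLASS:
            parents[s].add(o)
    seen, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:  # the seen set also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def is_entailed_subclass(graph, sub, sup):
    """True if sub is asserted or inferred (via transitivity) to be below sup."""
    return sup in superclasses(graph, sub)


# Toy graph: the first triple is the one from the example; ex:Gene is hypothetical.
graph = {
    ("nci:FGFR3_Gene", SUBCLASS, "nci:Fibroblast_Growth_Factor_Receptor_Family_Gene"),
    ("nci:Fibroblast_Growth_Factor_Receptor_Family_Gene", SUBCLASS, "ex:Gene"),
}

print(is_asserted(graph, "nci:FGFR3_Gene", SUBCLASS, "ex:Gene"))  # False: not stored
print(is_entailed_subclass(graph, "nci:FGFR3_Gene", "ex:Gene"))   # True: follows by transitivity
```

Only the second check finds the inferred relationship. A plain store lookup answers the first kind of question, which is why directly asserted facts are cheap to query while entailed ones need some reasoning.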

One thing we are exploring in our research is to load the ontology into a triple store (Tucana will be happy to hear we're using Kowari) and make it available on our web server. Queries in the eventual DAWG language, sent via the eventual DAWG protocol, could answer many questions about this ontology (for example, the direct subclass relation needed for the above query) very quickly. So we are exploring an alternate mechanism that looks like it will be very useful in practice and is of great interest to at least one major OWL supporter (the NCI).
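
The direct subclass relation mentioned above is the same notion Rob Shearer describes with the Worker example in the message quoted further down. It can be computed from each class's superclass closure: a subclass C of X is direct when no other subclass of X sits between C and X. The Python below is a minimal sketch under stated assumptions. Subclass edges are given as (sub, super) string pairs, some possibly redundant. Equivalent classes and cycles are not treated specially. It is not meant to describe how any particular triple store, reasoner, or DAWG query would compute the relation.

```python
# Direct subclasses from (possibly redundant) subclass edges.
# Assumption: edges are (sub, super) pairs; equivalent classes and cycles
# are not handled specially.
from collections import defaultdict


def strict_ancestors(edges):
    """Map each class that has a parent to all of its strict superclasses."""
    parents = defaultdict(set)
    for sub, sup in edges:
        parents[sub].add(sup)
    closure = {}
    for cls in list(parents):
        seen, stack = set(), list(parents[cls])
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(parents.get(c, ()))
        seen.discard(cls)  # a cycle back to cls is not a strict superclass
        closure[cls] = seen
    return closure


def direct_subclasses(edges, sup):
    """Subclasses of sup with no other subclass of sup between them and sup."""
    anc = strict_ancestors(edges)
    subs = {c for c, ups in anc.items() if sup in ups}
    # c is direct if none of its own superclasses is also a subclass of sup
    return {c for c in subs if not (anc[c] & subs)}


edges = [
    ("White-Collar Worker", "Worker"),
    ("Blue-Collar Worker", "Worker"),
    ("Accountant", "White-Collar Worker"),
    ("Accountant", "Worker"),  # redundant: already implied by the edges above
]
print(sorted(direct_subclasses(edges, "Worker")))
# ['Blue-Collar Worker', 'White-Collar Worker']
```

A plain subclass query on "Worker" would also return "Accountant". The direct version returns only the next level down, which is what keeps each drill-down step small.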

Now, I'm not arguing that I would never use or prefer a reasoner, or that it wouldn't be possible to build persistent stores that allowed Cerebra or another such product to answer these questions. But it is my contention that many simple queries could be done using DAWG queries (and if the one above is too complex, consider simply wanting the directly asserted synonym list for FGFR3_Gene: that is an annotation property, so no inference is needed or allowed), and this would be of value in certain applications (but not all; high-end uses would unquestionably need a more complex inferencer like yours).

So my problem is that I don't want us to preclude a valuable use of RDF query because it is not the way some companies would prefer us to interact with OWL ontologies. I thought that my use case (2.11, which eventually got in only in a very watered-down form due to Rob's objections), Farrukh's suggestion, and the continuing argument over 4.6 vs. 4.6a all relate to this issue, so I was re-raising it in this context to remind people that there are real users and use cases for exploring the use of RDF queries to access RDF graphs representing OWL ontologies.

 -JH

[1] http://www.mindswap.org/2003/CancerOntology/

At 12:39 -0700 8/3/04, Jeff Pollock wrote:

>Jim-
>
>I'm getting tired of reminding RDF people about why DLs are such an
>important part of the tech stack.  ;-)  Without it, there is no
>standardized or reliable inference capability that can guarantee same
>answers across different reasoner implementations. UDDI 3.0, OWL-S and
>many Bio/Pharma ontologies, among others, have chosen the DL-based
>approach for good reasons. No one will dispute that a DL view of things
>requires a conceptual shift, or that there are indeed technical
>limitations on what may be modeled. But in many cases the advantages
>outweigh the disadvantages.
>
>Regarding the RegRep, the SCM team has not yet debated the different
>levels of OWL support.  I, for one, think that DL is a reasonable
>alternative to seriously consider. Depending on the level of RegRep
>specification, it may be a needed requirement.  For example, if the
>RegRep simply exposes an OWL model as the interface to the repository -
>leaving it to vendors to implement their own query support - then
>restricting the interface to DL would enable an assured consistency in
>"inference at query" results across vendor implementations. Otherwise,
>different proprietary chaining algorithms could conceivably turn up
>different results from different vendors - causing chaos in a
>distributed DNS-like architecture.
>
>I know you saw my prezo at the '04 W3C AC Rep mtg, but here's a reminder
>of what I was saying regarding why DLs matter:
>
>* Consistency - query results, across vendor implementations and
>instances, should be consistent
>* Performance - Although performance metrics depend on model constructs,
>OWL-DL supports highly optimized inference algorithms
>* Predictable - semantics are mathematically decidable within the model,
>reasoning is finite
>* Foundational - provides a baseline inside applications for layered
>semantic models
>* Reliability - if the answer to a query is implied by any of the model
>data, it will be found - guaranteed.
>
>Lest people be fearful of DLs, which could happen if your points are
>taken out of context, I simply wanted to say that there are indeed good
>reasons why they exist.
>
>Also, for the benefit of stating what should be obvious - Network
>Inference embraces and supports ALL of the semantic web stack - RDF,
>OWL-Lite, OWL-Full, and OWL-DL. Like you, we think that there are
>appropriate times to leverage all aspects of the spec.
>
>Time for me to get off the soapbox!
>
>Best Regards,
>
>-Jeff-
>
>
>-----Original Message-----
>From: public-rdf-dawg-request@w3.org
>[mailto:public-rdf-dawg-request@w3.org] On Behalf Of Jim Hendler
>Sent: Tuesday, August 03, 2004 9:56 AM
>To: Farrukh Najmi; Rob Shearer
>Cc: RDF Data Access Working Group
>Subject: Re: ebXML Registry UC (Was Re: Agenda: RDF Data Access 27 Jul 2004)
>
>
>Farrukh, thanks for your response to Rob - I've gotten tired of
>reminding him and others that the DL methodology is only one of the
>ways OWL can be used (and in practice, it's not even the most common:
>most OWL out there falls in Full, not DL). It also has the problem
>that it is not yet scalable to some of the largest Lite/DL ontologies
>out there, and these are precisely the ones I want to access via query
>instead of "document" (since the documents can get huge and take
>hours to download, parse and classify).  Tools that will admit
>to the reality of the world out there and help people process it will
>be quite welcome.
>  -JH
>p.s. Note, this is nothing against using DL when appropriate, as in
>many of NI's business uses, but just to make it clear that DL is only
>one of many ways OWL is being used, and it CANNOT be the defining
>restriction for all use cases and applicability ... oops, I'm
>starting to get passionate and use uppercase - I'll stop now...
>
>
>At 10:26 -0400 8/3/04, Farrukh Najmi wrote:
>>Rob Shearer wrote:
>>
>>>Greetings, Farrukh!
>>>
>>>Apologies for not initiating contact myself.
>>>
>>>Your use case came up at the face-to-face, and I was curious whether
>>>there were alternative ways to achieve the results you were trying to
>>>get.
>>>
>>>You suggest a method of "query refinement" to select the elements of an
>>>ontology in which you're interested: first do a general query, then add
>>>a few more qualifying predicates, then add a few more, each time taking
>>>a look at the result set and figuring out what to add to prune out the
>>>results in which you're not interested. (Please correct the most
>>>offensive bits of this crude summary.)
>>>
>>>In traditional description logics systems, the process of "concept
>>>refinement" is most commonly implemented by traversing a concept
>>>taxonomy using not just "subclass"-style edges, but rather
>>>"direct-subclass" relationships. For example, a taxonomy of "Worker",
>>>"White-Collar Worker", and "Accountant" would include both
>>>"White-Collar Worker" and "Accountant" as subclasses of "Worker";
>>>however, only "White-Collar Worker" would be a direct subclass.
>>>
>>>The common use pattern would be a user interested in "Worker", so the
>>>user asks for the direct subclasses of "Worker" and finds that they are
>>>"White Collar", "Blue Collar", "Service", and "Military". He can then
>>>drill down on whichever of these he wishes, each time getting a fairly
>>>small and easily consumed result set. This is usually much easier to
>>>manage than trying to figure out how to refine hundreds, thousands, or
>>>millions of results by hand somehow.
>>>
>>>Is any approach along these lines applicable to your use case?
>>I totally agree with sub-class refinement as the most common narrowing
>>technique.
>>
>>The use case envisions the query having zero or more parameters. Any one
>>of the parameters MAY be a Concept in a taxonomy (or a class in an
>>Ontology).
>>
>>This is implied but not stated in the use case, as I was trying to keep
>>the description minimal, easy to follow, and focused on the core use case.
>>
>>If you would like to propose a modified version of the use case text,
>>send me a draft and we can try to reach closure on the issue before the
>>next DAWG meeting, if possible.
>>
>>
>>--
>>Regards,
>>Farrukh
>
>--
>Professor James Hendler
>http://www.cs.umd.edu/users/hendler
>Director, Semantic Web and Agent Technologies    301-405-2696
>Maryland Information and Network Dynamics Lab.     301-405-6707 (Fax)
>Univ of Maryland, College Park, MD 20742

-- 

Professor James Hendler                   http://www.cs.umd.edu/users/hendler 
Director, Semantic Web and Agent Technologies       301-405-2696
Maryland Information and Network Dynamics Lab.      301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742
Received on Tuesday, 3 August 2004 19:43:36 UTC