RE: ebXML Registry UC (Was Re: Agenda: RDF Data Access 27 Jul 2004) from Rob Shearer on 2004-08-03 (public-rdf-dawg@w3.org from July to September 2004)

From: Rob Shearer <Rob.Shearer@networkinference.com>
Date: Tue, 3 Aug 2004 14:58:32 -0700
To: "Jim Hendler" <hendler@cs.umd.edu>, "Jeff Pollock" <Jeff.Pollock@networkinference.com>, "Farrukh Najmi" <Farrukh.Najmi@Sun.COM>
Cc: "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
Message-ID: <CFE388CECDDB1E43AB1F60136BEB4973028119@rome.ad.networkinference.com>
Frankly, I'm not sure this is a very productive discussion, but I feel
obliged to weigh in considering the frequency with which my name has
been raised here.

I certainly never intended to imply that description logics, or even
OWL, should be the only "methodology" with which RDF applications should
be viewed. I absolutely see a great deal of value in the use of "pure"
RDF, with no additional inferencing layer.

The point I have tried to raise, in my discussions with Jim in
particular, is that it is very dangerous to take an ad-hoc approach to
semantics and reasoning. It is perfectly valid to consider looking for
"domain" or "range" triples in an RDF file (which happens to contain OWL
assertions). It even makes some sense to ask whether such a triple is
entailed by an OWL ontology (although all our experience at Network
Inference suggests that simple entailment is an incredibly inconvenient
query interface). It is definitely weird, however, to ask the general
question "what is the domain of this property" and have no clear
semantics for what constitutes a correct answer. As I pointed out to Jim
on a recent teleconference, it is quite straightforward for a property
to have a well-defined domain or range without any domain or range
triples appearing in the OWL/RDF file.
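
To make that last point concrete, here is a minimal sketch (an illustration only, not how any particular reasoner works) of a property whose domain is well-defined even though the file contains no rdfs:domain triple at all. The names ex:employs, ex:worksFor, ex:manages and ex:Company are made up for the example, and the sketch applies just two rules that hold under OWL semantics (inverse properties and subproperties); it is nowhere near a complete reasoner.

```python
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"
SUBPROP = "rdfs:subPropertyOf"
INVERSE = "owl:inverseOf"

# Hypothetical ontology fragment: note there is no rdfs:domain triple anywhere.
triples = {
    ("ex:employs", INVERSE, "ex:worksFor"),
    ("ex:worksFor", RDFS_RANGE, "ex:Company"),
    ("ex:manages", SUBPROP, "ex:employs"),
}


def asserted_domains(prop, graph):
    """Domains found by simply looking for rdfs:domain triples."""
    return {o for (s, p, o) in graph if s == prop and p == RDFS_DOMAIN}


def entailed_domains(prop, graph, _seen=None):
    """Domains that follow from two rules (a sketch, not full OWL):
    1. if P is the inverse of Q and Q has range C, then P has domain C;
    2. if P is a subproperty of Q and Q has domain C, then P has domain C.
    """
    seen = set() if _seen is None else _seen
    if prop in seen:  # guard against subproperty cycles
        return set()
    seen.add(prop)
    result = asserted_domains(prop, graph)
    for (s, p, o) in graph:
        if p == INVERSE and prop in (s, o):
            other = o if s == prop else s
            result |= {c for (s2, p2, c) in graph
                       if s2 == other and p2 == RDFS_RANGE}
        if p == SUBPROP and s == prop:
            result |= entailed_domains(o, graph, seen)
    return result


print(asserted_domains("ex:employs", triples))
print(entailed_domains("ex:employs", triples))
print(entailed_domains("ex:manages", triples))
```

A lookup for domain triples returns nothing for either property, while both have ex:Company as a domain under the semantics. Which of those two answers is "correct" is exactly what a query language needs to pin down.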

My general concern is that any query language we come up with should
have a formal model from which one can deduce the "correct" response. If
you consider this kind of mathematical rigour to be "the DL
methodology", then so be it, but it clearly is not specific to
description logics. If the results to queries are determined by ad-hoc
implementations and unspecified semantics, then I don't think we are any
closer to interoperable exchange of semantic data than we ever were.

> -----Original Message-----
> From: Jim Hendler [mailto:hendler@cs.umd.edu] 
> Sent: Tuesday, August 03, 2004 2:11 PM
> To: Jeff Pollock; Farrukh Najmi; Rob Shearer
> Cc: RDF Data Access Working Group
> Subject: RE: ebXML Registry UC (Was Re: Agenda: RDF Data Access 27 Jul 2004)
> 
> Jeff - as tempted as I am to send a long reply (especially to 
> your second sentence below which is simply fallacious - there 
> are many FOL subsets that can produce guarantees - DL is 
> arguably the maximal such) - let me be clear why I care about 
> this WITH RESPECT TO THE WORK OF THE DAWG.
> 
> Consider the following example - I'd like to know whether the 
> National Cancer Institute's Cancer Ontology (available in OWL, 
> see [1]) states that the FGFR3 Gene is one that promotes 
> Fibroblast growth.   That is, I'm looking to see if the triple
>  nci:FGFR3_Gene rdfs:subClassOf nci:Fibroblast_Growth_Factor_Receptor_Family_Gene
> (where ..._Gene is a class) is in the ontology.   
>   One way I can do this is to do an HTTP_Get of 
> nci:FGFR3_Gene and then look at the definition there (and 
> hope the tool used put these triples in a standard class 
> definition).  Another thing I can do is to get that document, 
> serialize it into a tool (such as the ones your company 
> creates) and use some sort of deduction to test to see if the 
> above is entailed - that seems preferable.   However, there 
> is a problem -- the NCI ontology is a document that is about 
> 25M and contains about 300,000 triples or so -- so the 
> download and serialization takes a long time.
>  One thing we are exploring in our research is to serialize 
> the ontology into a triple store (Tucana will be happy to 
> hear we're using Kowari) and make it available on our web 
> server.  Queries, coming in the eventual DAWG language using 
> the eventual DAWG protocol, could provide the capability to 
> answer many questions about this ontology (for example the 
> direct subclass relation needed for the above query) in 
> extremely fast times.  So we are exploring an alternate 
> mechanism that looks like it will be very useful in practice 
> and is of great interest to at least one major OWL supporter 
> (the NCI). 
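
A minimal sketch of the kind of lookup described above, assuming an in-memory index keyed on (subject, predicate). This is not Kowari's API and not the eventual DAWG language; the TripleIndex class, the ex:synonym property and the synonym values are placeholders invented for the illustration, and the standard rdfs:subClassOf predicate is assumed for the subclass triple.

```python
from collections import defaultdict


class TripleIndex:
    """Toy triple store: asserted triples only, no inference."""

    def __init__(self):
        self._by_sp = defaultdict(set)

    def add(self, s, p, o):
        self._by_sp[(s, p)].add(o)

    def objects(self, s, p):
        """All directly asserted objects for a subject/predicate pair."""
        return set(self._by_sp.get((s, p), ()))

    def contains(self, s, p, o):
        """True only if the exact triple was asserted."""
        return o in self._by_sp.get((s, p), ())


store = TripleIndex()
store.add("nci:FGFR3_Gene", "rdfs:subClassOf",
          "nci:Fibroblast_Growth_Factor_Receptor_Family_Gene")
# Placeholder annotation values, not taken from the NCI ontology.
store.add("nci:FGFR3_Gene", "ex:synonym", "placeholder-synonym-1")
store.add("nci:FGFR3_Gene", "ex:synonym", "placeholder-synonym-2")

# Is the direct subclass triple asserted?
print(store.contains("nci:FGFR3_Gene", "rdfs:subClassOf",
                     "nci:Fibroblast_Growth_Factor_Receptor_Family_Gene"))
# Directly asserted synonym list (annotation values, no inference involved).
print(sorted(store.objects("nci:FGFR3_Gene", "ex:synonym")))
```

Both lookups are single dictionary accesses on an already loaded store, so neither needs the whole document to be downloaded and parsed per question. They answer only what was asserted, never what is entailed.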
>  Now, I'm not arguing I would never use or prefer a reasoner, 
> or that it wouldn't be possible to build persistent stores 
> that allowed Cerebra or other such product to be used to 
> answer these questions -- but it is my contention that many 
> simple queries (and if the one above is too complex, how 
> about if I want to simply know the directly asserted synonym 
> list - an annotation property so no inference needed or 
> allowed - for FGFR3_Gene) could be done using DAWG queries, 
> and this would be of value in certain applications (but not 
> all, and high end things would unquestionably need a more 
> complex inferencer like yours)
>  So my problem is that I don't want us to preclude a valuable 
> use of RDF query because it is not the way some companies 
> would prefer us to interact with OWL ontologies.   I thought 
> that my use case (2.11, which eventually only got in in a 
> very watered down form due to Rob's objections), Farrukh's 
> suggestion, and the continuing argument over 4.6 vs. 4.6a all 
> relate to this issue, so I was re-raising it in this context  
> to remind people that there are real users and use cases for 
> exploring the use of RDF queries to access RDF graphs 
> representing OWL ontologies.
>  -JH
> 
> [1] http://www.mindswap.org/2003/CancerOntology/
> 
> At 12:39 -0700 8/3/04, Jeff Pollock wrote:
> >Jim-
> >
> >I'm getting tired of reminding RDF people about why DL's are such an
> >important part of the tech stack.  ;-)  Without it, there is no
> >standardized or reliable inference capability that can guarantee same
> >answers across different reasoner implementations. UDDI 3.0, OWL-S and
> >many Bio/Pharma ontologies among others have chosen the DL based
> >approach for good reasons. No one will argue that a DL view of things
> >requires a conceptual shift, or that there are indeed technical
> >limitations with what may be modeled. But in many cases the advantages
> >outweigh the disadvantages.
> >
> >Regarding the RegRep, the SCM team has not yet debated the different
> >levels of OWL support.  I, for one, think that DL is a reasonable
> >alternative to seriously consider. Depending on the level of RegRep
> >specification, it may be a needed requirement.  For example, if the
> >RegRep simply exposes an OWL model as the interface to the repository -
> >leaving it to vendors to implement their own query support - then
> >restricting the interface to DL would enable an assured consistency in
> >"inference at query" results across vendor implementations. Otherwise,
> >different proprietary chaining algorithms could conceivably turn up
> >different results from different vendors - causing chaos in a
> >distributed DNS-like architecture.
> >
> >I know you saw my prezo at the '04 W3C AC Rep mtg, but here's a reminder
> >of what I was saying regarding why DL's matter:
> >
> >* Consistency - query results, across vendor implementations and
> >instances, should be consistent
> >* Performance - Although performance metrics depend on model constructs,
> >OWL-DL supports highly optimized inference algorithms
> >* Predictable - semantics are mathematically decidable within the model,
> >reasoning is finite
> >* Foundational - provides a baseline inside applications for layered
> >semantic models
> >* Reliability - if the answer to a query is implied by any of the model
> >data, it will be found - guaranteed.
> >
> >Lest people be fearful of DL's, which could happen if your points are
> >taken out of context, I simply wanted to say that there are indeed good
> >reasons why they exist.
> >
> >Also, for the benefit of stating what should be obvious - Network
> >Inference embraces and supports ALL of the semantic web stack - RDF,
> >OWL-Lite, OWL-Full, and OWL-DL. Like you, we think that there are
> >appropriate times to leverage all aspects of the spec.
> >
> >Time for me to get off the soapbox!
> >
> >Best Regards,
> >
> >-Jeff-
> >
> >
> >-----Original Message-----
> >From: public-rdf-dawg-request@w3.org
> >[mailto:public-rdf-dawg-request@w3.org] On Behalf Of Jim Hendler
> >Sent: Tuesday, August 03, 2004 9:56 AM
> >To: Farrukh Najmi; Rob Shearer
> >Cc: RDF Data Access Working Group
> >Subject: Re: ebXML Registry UC (Was Re: Agenda: RDF Data Access 27 Jul 2004)
> >
> >
> >Farrukh, thanks for your response to Rob - I've gotten tired of
> >reminding him and others that the DL methodology is only one of the
> >ways OWL can be used (and in practice, it's not even the most common
> >- most OWL out there falls in Full, not DL) - it also has the problem
> >it is not yet scalable to some of the largest Lite/DL ontologies out
> >there, and these are precisely the ones I want to access via query
> >instead of "document" (since the documents can get huge and take
> >long hours to download, parse and classify).  Tools that will admit
> >to the reality of the world out there and help people process it will
> >be quite welcome
> >  -JH
> >p.s. Note, this is nothing against using DL when appropriate, as in
> >many of NI's business uses, but just to make it clear that DL is only
> >one of many ways OWL is being used, and it CANNOT be the defining
> >restriction for all use cases and applicability ... oops, I'm
> >starting to get passionate and use uppercase - I'll stop now...
> >
> >
> >At 10:26 -0400 8/3/04, Farrukh Najmi wrote:
> >>Rob Shearer wrote:
> >>
> >>>Greetings, Farrukh!
> >>>
> >>>Apologies for not initiating contact myself.
> >>>
> >>>Your use case came up at the face-to-face, and I was curious whether
> >>>there were alternative ways to achieve the results you were trying to
> >>>get.
> >>>
> >>>You suggest a method of "query refinement" to select the elements of an
> >>>ontology in which you're interested: first do a general query, then add
> >>>a few more qualifying predicates, then add a few more, each time taking
> >>>a look at the result set and figuring out what to add to prune out the
> >>>results in which you're not interested. (Please correct the most
> >>>offensive bits of this crude summary.)
> >>>
> >>>In traditional description logics systems, the process of "concept
> >>>refinement" is most commonly implemented by traversing a concept
> >>>taxonomy using not just "subclass"-style edges, but rather
> >>>"direct-subclass" relationships. For example, a taxonomy of "Worker",
> >>>"White-Collar Worker", and "Accountant" would include both "White-Collar
> >>>Worker" and "Accountant" as subclasses of "Worker", however only
> >>>"White-Collar Worker" would be a direct subclass.
> >>>
> >>>The common use pattern would be a user interested in "Worker", so the
> >>>user asks for the direct subs of worker and finds that they are "White
> >>>Collar", "Blue Collar", "Service", and "Military". He can then drill
> >>>down on whichever of these he wishes, each time getting a fairly small
> >>>and easily-consumed result set. This is usually much easier to manage
> >>>than trying to figure out how to refine hundreds, thousands, or millions
> >>>of results by hand somehow.
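
The drill-down pattern in the quoted text can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the subclass pairs are a made-up fragment of the "Worker" taxonomy (with "Blue-Collar Worker" added for the example), and the taxonomy is assumed to be acyclic, so equivalent classes are not handled.

```python
def strict_subclasses(cls, pairs):
    """All classes below cls, following (sub, super) pairs transitively."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in pairs:
            if sup == current and sub != cls and sub not in found:
                found.add(sub)
                stack.append(sub)
    return found


def direct_subclasses(cls, pairs):
    """Subclasses of cls with no other subclass of cls sitting in between."""
    subs = strict_subclasses(cls, pairs)
    return {
        s for s in subs
        if not any(s in strict_subclasses(m, pairs) for m in subs if m != s)
    }


# (subclass, superclass) pairs; the third one is redundant but may well be
# present in a store that holds inferred as well as asserted triples.
pairs = {
    ("White-Collar Worker", "Worker"),
    ("Accountant", "White-Collar Worker"),
    ("Accountant", "Worker"),
    ("Blue-Collar Worker", "Worker"),
}

print(sorted(strict_subclasses("Worker", pairs)))
print(sorted(direct_subclasses("Worker", pairs)))
print(sorted(direct_subclasses("White-Collar Worker", pairs)))
```

Asking for all subclasses of "Worker" returns "Accountant" as well, while the direct-subclass query returns only the next level down, which is what keeps each drill-down step small.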
> >>>
> >>>Is any approach along these lines applicable to your use case?
> >>I totally agree with sub-class refinement as the most common narrowing
> >>technique.
> >>
> >>The use case envisions the query to have zero or more parameters. Any one
> >>of the parameters MAY be a Concept in a taxonomy (or a class in an
> >>Ontology).
> >>
> >>This is implied but not stated in the use case as I was trying to have a
> >>minimalistic description that was easy to follow and conveyed the core
> >>use case.
> >>
> >>If you would like to propose a modified version to the use case text
> >>send me a draft and we can try and reach closure on the issue before the
> >>next DAWG meeting if possible.
> >>
> >>
> >>--
> >>Regards,
> >>Farrukh
> >
> >--
> >Professor James Hendler
> >http://www.cs.umd.edu/users/hendler
> >Director, Semantic Web and Agent Technologies    301-405-2696
> >Maryland Information and Network Dynamics Lab.     301-405-6707 (Fax)
> >Univ of Maryland, College Park, MD 20742
> 
> -- 
> Professor James Hendler                   
> http://www.cs.umd.edu/users/hendler 
> Director, Semantic Web and Agent Technologies       301-405-2696
> Maryland Information and Network Dynamics Lab.      301-405-6707 (Fax)
> Univ of Maryland, College Park, MD 20742    
>    
>
Received on Tuesday, 3 August 2004 18:07:16 UTC