RE: ebXML Registry UC (Was Re: Agenda: RDF Data Access 27 Jul 2004) from Jos De_Roo on 2004-08-03 (public-rdf-dawg@w3.org from July to September 2004)

From: Jos De_Roo <jos.deroo@agfa.com>
Date: Wed, 4 Aug 2004 01:06:35 +0200
To: "Jim Hendler" <hendler@cs.umd.edu>
Cc: "Farrukh Najmi" <Farrukh.Najmi@Sun.COM>, "Jeff Pollock" <Jeff.Pollock@networkinference.com>, "RDF Data Access Working Group" <public-rdf-dawg@w3.org>, public-rdf-dawg-request@w3.org, "Rob Shearer" <Rob.Shearer@networkinference.com>
Message-ID: <OF0822F8D1.A421DC02-ONC1256EE5.007DE5B3-C1256EE5.007EEC5C@agfa.com>
Jim - I fully agree with that NCI test case and was also
able to get the answer

[[
nci:FGFR3_Gene rdfs:subClassOf nci:Fibroblast_Growth_Factor_Receptor_Family_Gene.

# Proof found for file://temp/testC.n3 in 15503 steps (41838 steps/sec) using 1 engine
]]

The reasoning was done pretty quickly but there was
an overhead of a few minutes to load and prepare and
some 800 MB of RAM were needed...
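
The distinction that runs through the thread below, between finding a
subclass triple that is literally asserted and showing that it is
entailed, can be illustrated with a minimal sketch. It is not the
engine used for the proof above; it only follows rdfs:subClassOf
transitively over an in-memory set of triples. The ex: names are
hypothetical and are borrowed from the Worker example quoted further
down, not taken from the NCI ontology.

```python
# Illustrative triples only: the ex: class names are assumptions based on
# the "Worker" taxonomy example in this thread, not NCI ontology data.
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    ("ex:WhiteCollarWorker", SUBCLASS_OF, "ex:Worker"),
    ("ex:Accountant", SUBCLASS_OF, "ex:WhiteCollarWorker"),
}


def is_asserted(triples, sub, sup):
    """True if the subclass triple is literally present in the store."""
    return (sub, SUBCLASS_OF, sup) in triples


def superclasses(triples, cls):
    """All classes reachable from cls via one or more rdfs:subClassOf steps."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            # The 'found' check also keeps the loop finite if the data has cycles.
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def is_entailed(triples, sub, sup):
    """True if sub is a subclass of sup by transitivity alone (no other rules)."""
    return sup in superclasses(triples, sub)


if __name__ == "__main__":
    print(is_asserted(triples, "ex:Accountant", "ex:Worker"))
    print(is_entailed(triples, "ex:Accountant", "ex:Worker"))
```

A plain triple lookup misses the Accountant/Worker relation, while the
transitive walk finds it. A full OWL reasoner does considerably more
than this, which is where the load time and memory cost come from.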


-- 
Jos De Roo, AGFA http://www.agfa.com/w3c/jdroo/




Jim Hendler <hendler@cs.umd.edu>
Sent by: public-rdf-dawg-request@w3.org
04/08/2004 00:28

 
        To:     "Rob Shearer" <Rob.Shearer@networkinference.com>, "Jeff Pollock" <Jeff.Pollock@networkinference.com>, "Farrukh Najmi" <Farrukh.Najmi@Sun.COM>
        cc:     "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
        Subject:        RE: ebXML Registry UC (Was Re: Agenda: RDF Data Access 27 Jul 2004)



I think I answered this one before you sent it -- see my earlier
posting of a use case in which all the necessary information would be
in the triple store, where the queries are semantically well-defined
(but little or no entailment reasoning is required) and where no
existing mechanism I can find suffices for an immediate need by a
major OWL user.  I'm not being academic and pedantic here - I'm
trying to satisfy a real need by my organization's backers - which is
why I participate in this WG, same as you.   I'm not badmouthing your
approach, just saying there is other necessary work that I would like
the DAWG to consider as it is needed for what I do.
-JH


At 14:58 -0700 8/3/04, Rob Shearer wrote:
>Frankly, I'm not sure this is a very productive discussion, but I feel
>obliged to weigh in considering the frequency with which my name has
>been raised here.
>
>I certainly never intended to imply that description logics, or even
>OWL, should be the only "methodology" with which RDF applications should
>be viewed. I absolutely see a great deal of value in the use of "pure"
>RDF, with no additional inferencing layer.
>
>The point I have tried to raise, in my discussions with Jim in
>particular, is that it is very dangerous to take an ad-hoc approach to
>semantics and reasoning. It is perfectly valid to consider looking for
>"domain" or "range" triples in an RDF file (which happens to contain OWL
>assertions). It even makes some sense to ask whether such a triple is
>entailed by an OWL ontology (although all our experience at Network
>Inference suggests that simple entailment is an incredibly inconvenient
>query interface). It is definitely weird, however, to ask the general
>question "what is the domain of this property" and have no clear
>semantics for what constitutes a correct answer. As I pointed out to Jim
>on a recent teleconference, it is quite straightforward for a property
>to have a well-defined domain or range without any domain or range
>triples appearing in the OWL/RDF file.
>
>My general concern is that any query language we come up with should
>have a formal model from which one can deduce the "correct" response. If
>you consider this kind of mathematical rigour to be "the DL
>methodology", then so be it, but it clearly is not specific to
>description logics. If the results to queries are determined by ad-hoc
>implementations and unspecified semantics, then I don't think we are any
>closer to interoperable exchange of semantic data than we ever were.
>
>>  -----Original Message-----
>>  From: Jim Hendler [mailto:hendler@cs.umd.edu]
>>  Sent: Tuesday, August 03, 2004 2:11 PM
>>  To: Jeff Pollock; Farrukh Najmi; Rob Shearer
>>  Cc: RDF Data Access Working Group
>>  Subject: RE: ebXML Registry UC (Was Re: Agenda: RDF Data
>>  Access 27 Jul 2004)
>>
>>  Jeff - as tempted as I am to send a long reply (especially to
>>  your second sentence below which is simply fallacious - there
>>  are many FOL subsets that can produce guarantees - DL is
>>  arguably the maximal such) - let me be clear why I care about
>>  this WITH RESPECT TO THE WORK OF THE DAWG.
>>
>>  Consider the following example - I'd like to know whether the
>>  National Cancer Institute's Cancer Ontology (available in OWL,
>>  see [1]) states that the FGFR3 Gene is one that promotes
>>  Fibroblast growth.   That is, I'm looking to see if the triple
>>   nci:FGFR3_Gene rdfs:subClassOf
>>  nci:Fibroblast_Growth_Factor_Receptor_Family_Gene
>>  (where ..._Gene is a class) is in the ontology.
>>    One way I can do this is to do an HTTP GET of
>>  nci:FGFR3_Gene and then look at the definition there (and
>>  hope the tool used put these triples in a standard class
>>  definition).  Another thing I can do is to get that document,
>>  serialize it into a tool (such as the ones your company
>>  creates) and use some sort of deduction to test to see if the
>>  above is entailed - that seems preferable.   However, there
>>  is a problem -- the NCI ontology is a document that is about
>>  25M and contains about 300,000 triples or so -- so the
>>  download and serialization takes a long time.
>>   One thing we are exploring in our research is to serialize
>>  the ontology into a triple store (Tucana will be happy to
>>  hear we're using Kowari) and make it available on our web
>>  server.  Queries, coming in the eventual DAWG language using
>>  the eventual DAWG protocol, could provide the capability to
>>  answer many questions about this ontology (for example the
>>  direct subclass relation needed for the above query) in
>>  extremely fast times.  So we are exploring an alternate
>>  mechanism that looks like it will be very useful in practice
>>  and is of great interest to at least one major OWL supporter
>>  (the NCI).
>>   Now, I'm not arguing I would never use or prefer a reasoner,
>>  or that it wouldn't be possible to build persistent stores
>>  that allowed Cerebra or other such product to be used to
>>  answer these questions -- but it is my contention that many
>>  simple queries (and if the one above is too complex, how
>>  about if I want to simply know the directly asserted synonym
>>  list - an annotation property so no inference needed or
>>  allowed - for FGFR3_Gene) could be done using DAWG queries,
>>  and this would be of value in certain applications (but not
>>  all, and high end things would unquestionably need a more
>>  complex inferencer like yours)
>>   So my problem is that I don't want us to preclude a valuable
>>  use of RDF query because it is not the way some companies
>>  would prefer us to interact with OWL ontologies.   I thought
>>  that my use case (2.11, which eventually only got in in a
>>  very watered down form due to Rob's objections), Farrukh's
>>  suggestion, and the continuing argument over 4.6 vs. 4.6a all
>>  relate to this issue, so I was re-raising it in this context
>>  to remind people that there are real users and use cases for
>>  exploring the use of RDF queries to access RDF graphs
>>  representing OWL ontologies.
>>   -JH
>>
>>  [1] http://www.mindswap.org/2003/CancerOntology/
>>
>>
>>  At 12:39 -0700 8/3/04, Jeff Pollock wrote:
>>  >Jim-
>>  >
>>  >I'm getting tired of reminding RDF people about why DL's are such an
>>  >important part of the tech stack.  ;-)  Without them, there is no
>>  >standardized or reliable inference capability that can guarantee the
>>  >same answers across different reasoner implementations. UDDI 3.0, OWL-S
>>  >and many Bio/Pharma ontologies among others have chosen the DL-based
>>  >approach for good reasons. No one will argue that a DL view of things
>>  >requires a conceptual shift, or that there are indeed technical
>>  >limitations with what may be modeled. But in many cases the advantages
>>  >outweigh the disadvantages.
>>  >
>>  >Regarding the RegRep, the SCM team has not yet debated the different
>>  >levels of OWL support.  I, for one, think that DL is a reasonable
>>  >alternative to seriously consider. Depending on the level of RegRep
>>  >specification, it may be a needed requirement.  For example, if the
>>  >RegRep simply exposes an OWL model as the interface to the repository -
>>  >leaving it to vendors to implement their own query support - then
>>  >restricting the interface to DL would enable an assured consistency in
>>  >"inference at query" results across vendor implementations. Otherwise,
>>  >different proprietary chaining algorithms could conceivably turn up
>>  >different results from different vendors - causing chaos in a
>>  >distributed DNS-like architecture.
>>  >
>>  >I know you saw my prezo at the '04 W3C AC Rep mtg, but here's a reminder
>>  >of what I was saying regarding why DL's matter:
>>  >
>>  >* Consistency - query results, across vendor implementations and
>>  >instances, should be consistent
>>  >* Performance - Although performance metrics depend on model constructs,
>>  >OWL-DL supports highly optimized inference algorithms
>>  >* Predictable - semantics are mathematically decidable within the model,
>>  >reasoning is finite
>>  >* Foundational - provides a baseline inside applications for layered
>>  >semantic models
>>  >* Reliability - if the answer to a query is implied by any of the model
>>  >data, it will be found - guaranteed.
>>  >
>>  >Lest people be fearful of DL's, which could happen if your points are
>>  >taken out of context, I simply wanted to say that there are indeed good
>>  >reasons why they exist.
>>  >
>>  >Also, for the benefit of stating what should be obvious - Network
>>  >Inference embraces and supports ALL of the semantic web stack - RDF,
>>  >OWL-Lite, OWL-Full, and OWL-DL. Like you, we think that there are
>>  >appropriate times to leverage all aspects of the spec.
>>  >
>>  >Time for me to get off the soapbox!
>>  >
>>  >Best Regards,
>>  >
>>  >-Jeff-
>>  >
>>  >
>>  >-----Original Message-----
>>  >From: public-rdf-dawg-request@w3.org
>>  >[mailto:public-rdf-dawg-request@w3.org] On Behalf Of Jim Hendler
>>  >Sent: Tuesday, August 03, 2004 9:56 AM
>>  >To: Farrukh Najmi; Rob Shearer
>>  >Cc: RDF Data Access Working Group
>>  >Subject: Re: ebXML Registry UC (Was Re: Agenda: RDF Data Access 27 Jul
>>  >2004)
>>  >
>>  >
>>  >Farrukh, thanks for your response to Rob - I've gotten tired of
>>  >reminding him and others that the DL methodology is only one of the
>>  >ways OWL can be used (and in practice, it's not even the most common
>>  >- most OWL out there falls in Full, not DL) - it also has the problem
>>  >that it is not yet scalable to some of the largest Lite/DL ontologies out
>>  >there, and these are precisely the ones I want to access via query
>>  >instead of "document" (since the documents can get huge and take
>>  >long hours to download, parse and classify).  Tools that will admit
>>  >to the reality of the world out there and help people process it will
>>  >be quite welcome
>>  >  -JH
>>  >p.s. Note, this is nothing against using DL when appropriate, as in
>>  >many of NI's business uses, but just to make it clear that DL is only
>>  >one of many ways OWL is being used, and it CANNOT be the defining
>>  >restriction for all use cases and applicability ... oops, I'm
>>  >starting to get passionate and use uppercase - I'll stop now...
>>  >
>>  >
>>  >At 10:26 -0400 8/3/04, Farrukh Najmi wrote:
>>  >>Rob Shearer wrote:
>>  >>
>>  >>>Greetings, Farrukh!
>>  >>>
>>  >>>Apologies for not initiating contact myself.
>>  >>>
>>  >>>Your use case came up at the face-to-face, and I was curious whether
>>  >>>there were alternative ways to achieve the results you were trying to
>>  >>>get.
>>  >>>
>>  >>>You suggest a method of "query refinement" to select the elements of an
>>  >>>ontology in which you're interested: first do a general query, then add
>>  >>>a few more qualifying predicates, then add a few more, each time taking
>>  >>>a look at the result set and figuring out what to add to prune out the
>>  >>>results in which you're not interested. (Please correct the most
>>  >>>offensive bits of this crude summary.)
>>  >>>
>>  >>>In traditional description logics systems, the process of "concept
>>  >>>refinement" is most commonly implemented by traversing a concept
>>  >>>taxonomy using not just "subclass"-style edges, but rather
>>  >>>"direct-subclass" relationships. For example, a taxonomy of "Worker",
>>  >>>"White-Collar Worker", and "Accountant" would include both "White-Collar
>>  >>>Worker" and "Accountant" as subclasses of "Worker", however only
>>  >>>"White-Collar Worker" would be a direct subclass.
>>  >>>
>>  >>>The common use pattern would be a user interested in "Worker", so the
>>  >>>user asks for the direct subs of worker and finds that they are "White
>>  >>>Collar", "Blue Collar", "Service", and "Military". He can then drill
>>  >>>down on whichever of these he wishes, each time getting a fairly small
>>  >>>and easily-consumed result set. This is usually much easier to manage
>>  >>>than trying to figure out how to refine hundreds, thousands, or millions
>>  >>>of results by hand somehow.
>>  >>>
>>  >>>Is any approach along these lines applicable to your use case?
>>  >>I totally agree with sub-class refinement as the most common narrowing
>>  >>technique.
>>  >>
>>  >>The use case envisions the query having zero or more parameters. Any one
>>  >>of the parameters MAY be a Concept in a taxonomy (or a class in an
>>  >>Ontology).
>>  >>
>>  >>This is implied but not stated in the use case as I was trying to have a
>>  >>minimalistic description that was easy to follow and conveyed the core
>>  >>use case.
>>  >>
>>  >>If you would like to propose a modified version to the use case text
>>  >>send me a draft and we can try and reach closure on the issue before the
>>  >>next DAWG meeting if possible.
>>  >>
>>  >>
>>  >>--
>>  >>Regards,
>>  >>Farrukh
>>  >
>>  >--
>>  >Professor James Hendler
>>  >http://www.cs.umd.edu/users/hendler
>>  >Director, Semantic Web and Agent Technologies    301-405-2696
>>  >Maryland Information and Network Dynamics Lab.     301-405-6707 (Fax)
>>  >Univ of Maryland, College Park, MD 20742
>>

--
Professor James Hendler                   http://www.cs.umd.edu/users/hendler
Director, Semantic Web and Agent Technologies     301-405-2696
Maryland Information and Network Dynamics Lab.    301-405-6707 (Fax)
Univ of Maryland, College Park, MD 20742
Received on Tuesday, 3 August 2004 19:10:11 UTC