Re: [BIONT-DSE] Inclusion versus exclusion criteria from Chimezie Ogbuji on 2007-09-12 (public-semweb-lifesci@w3.org from September 2007)

From: Chimezie Ogbuji <ogbujic@ccf.org>
Date: Wed, 12 Sep 2007 08:52:00 -0400
To: wangxiao@musc.edu
cc: public-semweb-lifesci@w3.org, "Alan Ruttenberg" <alanruttenberg@gmail.com>, "Vipul Kashyap" <VKASHYAP1@partners.org>, "Andersson, Bo H" <Bo.H.Andersson@astrazeneca.com>, "Landen Bain" <lbain@topsailtech.com>, "Rachel Richesson" <Rachel.Richesson@epi.usf.edu>, public-hcls-dse@w3.org, "Stanley Huff" <Stan.Huff@intermountainmail.org>, "Yan Heras" <Yan.Heras@intermountainmail.org>, "Oniki, Tom (GE Healthcare, consultant)" <Tom.Oniki@ge.com>, "Joey Coyle" <joey@xcoyle.com>, "Bron W. Kisler" <bkisler@earthlink.net>, "Ida Sim" <sim@medicine.ucsf.edu>
Message-ID: <1189601520.884.32.camel@otherland>

On Wed, 2007-09-12 at 09:31 +0100, Xiaoshu Wang wrote:
> You SHOULD not choose and you have to use open world reasoning because 
> how someone can tell which part of the world is closed and which part is 
> not.

Sorry, Xiaoshu, but I don't agree that you *have* to use open world
reasoning.  That position rests on an underlying assumption that
medical record content (or bioinformatic content in general) is a
natural fit for monotonic logics such as OWL.  This is not entirely the
case; as we know, medical record content is often inconsistent.

The choice isn't binary.  You *can* store your data as RDF, perform
monotonic inference (and query matching) where it makes sense to,
*and* perform non-monotonic inference (i.e., default negation/negation
as failure/closed world assumptions) where that makes sense as well.

Neither of the RDF query languages I am familiar with (Versa and
SPARQL), both of which I use daily to match against medical record
datasets, requires entailment.  Default negation (closed world
assumptions) only comes into play when a query asks for the absence of
an assertion and where logical entailment is understood to apply.

In SPARQL, combining OPTIONAL with FILTER, the ! operator and BOUND
effectively gives you a way to match records non-monotonically without
an entailment regime.  This is how we are able to *explicitly* ask for
the absence of an assertion based only on what is persisted in the RDF
dataset.

> > For example, in pharmacy data, if the patient record does not mention 
> > a drug, we can be reasonably sure that the patient is not on that drug 
> > -- a case for closed world reasoning, whereas for other datasets such 
> > as lab or radiology, often things are explicitly asserted to be 
> > negative if not present, for example, negative MRSA results, hence 
> > requiring an open world reasoning approach.
> Let's use your example.  According to your logic, if someone says that
> 
> _:someone a pha:Patient;
>                pha:medicine pha:aspirin.
> 
> It triggers a closed world reasoning so that no more properties exist.  

This is not true.  In a CWA scenario, when you don't have an assertion
P(X,Y), you infer ~P(X,Y), where '~' is understood to be a different
operator from the 'classic' *not* operator used in OWL and other
monotonic logics.
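A minimal Python sketch of the distinction, assuming a toy knowledge base held as a set of (subject, predicate, object) tuples.  The separate `negated` set of explicit negative assertions and names such as `patient1` and `hasInfection` are hypothetical.  Under the closed world reading, absence alone yields ~P(X,Y).  Under the open world reading, absence yields only "unknown", and a negative answer needs an explicit negative assertion (like the negative MRSA results in the message quoted above).

```python
from typing import Literal

Triple = tuple[str, str, str]
Answer = Literal["true", "false", "unknown"]

def closed_world(kb: set[Triple], t: Triple) -> bool:
    """Default negation: ~P(X,Y) holds whenever P(X,Y) is not asserted."""
    return t in kb

def open_world(kb: set[Triple], negated: set[Triple], t: Triple) -> Answer:
    """Classical reading: 'false' only if the negation is explicitly asserted."""
    if t in kb:
        return "true"
    if t in negated:
        return "false"
    return "unknown"

kb = {("patient1", "medicine", "aspirin")}
# Hypothetical explicit negative result, e.g. a negative MRSA test
negated = {("patient1", "hasInfection", "MRSA")}

q = ("patient1", "medicine", "betablocker")
print(closed_world(kb, q))         # False, so ~medicine(patient1, betablocker)
print(open_world(kb, negated, q))  # unknown
print(open_world(kb, negated, ("patient1", "hasInfection", "MRSA")))  # false
```

The same triples support both readings; only the question being asked decides which operator applies.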

I think a proper definition of scoped negation as failure would help
show how SPARQL can be used to match the absence of an assertion against
an RDF dataset that can also be subject to open world assumptions at
the same time:

[[
Related to the notion of scoped inference is an extension of the concept
of default negation, called scoped default negation.  The idea is that
the default negation inference rule must also be performed within the
scope of an explicitly specified knowledge base. That is, not q is said
to be true with respect to a knowledge base K if q is not derivable from
K.
]] -- "A Realistic Architecture for the Semantic Web" [1]
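The quoted definition can be sketched in a few lines of Python.  This assumes a toy knowledge base K of ground triples plus simple rules closed under naive forward chaining; the rule, the predicate names and the `naf` helper are illustrative and not taken from the cited paper.  `naf(q, K)` is true exactly when q is not derivable from K, and adding a fact to K can flip it to false, which is what makes the inference non-monotonic.

```python
from typing import Callable

Triple = tuple[str, str, str]
Rule = Callable[[set[Triple]], set[Triple]]

def closure(facts: set[Triple], rules: list[Rule]) -> set[Triple]:
    """Apply every rule until no new triples are produced (a fixpoint)."""
    derived = set(facts)
    while True:
        new = set().union(*(rule(derived) for rule in rules)) - derived
        if not new:
            return derived
        derived |= new

def naf(q: Triple, facts: set[Triple], rules: list[Rule]) -> bool:
    """Scoped default negation: 'not q' holds w.r.t. K iff q is not derivable from K."""
    return q not in closure(facts, rules)

# Hypothetical rule: anyone on metoprolol is on a beta blocker.
def metoprolol_is_betablocker(kb: set[Triple]) -> set[Triple]:
    return {(s, "medicine", "betablocker")
            for (s, p, o) in kb if p == "medicine" and o == "metoprolol"}

rules = [metoprolol_is_betablocker]
K = {("patient1", "medicine", "aspirin")}
q = ("patient1", "medicine", "betablocker")

print(naf(q, K, rules))   # True: q is not derivable from K
K2 = K | {("patient1", "medicine", "metoprolol")}
print(naf(q, K2, rules))  # False: extending K retracts the conclusion
```

The negation is always evaluated against an explicitly named K, never against "the whole Web".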

In the case of SPARQL, the RDF dataset is the knowledge base.  So, you
can have an open world 'view' on the triples above while at the same
time asking questions such as:

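PREFIX pha: <http://example.org/pharmacy#>  # hypothetical namespace URI
# Patients with no asserted beta blocker (scoped negation as failure)
SELECT ?patient
WHERE {
  ?patient a pha:Patient .
  # ?no is bound only if this patient has a betablocker assertion
  # together with a medical record number in the dataset
  OPTIONAL {
    ?patient pha:medicine pha:betablocker ;
             pha:medicalRecordNumber ?no
  }
  # keep only the patients for which the OPTIONAL pattern did not match
  FILTER (!BOUND(?no))
}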

This query explicitly matches the absence of an assertion without
resorting to convoluted measures such as introducing an epistemic
operator into OWL.
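To make the evaluation of that query concrete, here is a minimal Python sketch of what a SPARQL engine does with the OPTIONAL plus FILTER (!BOUND(...)) pattern.  It assumes a toy in-memory dataset with short strings in place of real IRIs; the patients and record numbers are hypothetical.  Each patient is left-joined against the optional pattern, leaving ?no unbound (None) where the pattern has no match, and only the unbound rows are kept.  Nothing is inferred: the answer depends solely on what the dataset asserts.

```python
Triple = tuple[str, str, str]

dataset: set[Triple] = {
    ("p1", "rdf:type", "pha:Patient"),
    ("p1", "pha:medicine", "pha:betablocker"),
    ("p1", "pha:medicalRecordNumber", "MRN-001"),
    ("p2", "rdf:type", "pha:Patient"),
    ("p2", "pha:medicine", "pha:aspirin"),
    ("p2", "pha:medicalRecordNumber", "MRN-002"),
}

def objects(s: str, p: str) -> list[str]:
    """All ?o such that (s, p, ?o) is asserted in the dataset."""
    return sorted(o for (s2, p2, o) in dataset if s2 == s and p2 == p)

def patients_without_betablocker() -> list[str]:
    results = []
    patients = sorted(s for (s, p, o) in dataset
                      if p == "rdf:type" and o == "pha:Patient")
    for patient in patients:
        # OPTIONAL: bind ?no only if both triple patterns match for this patient
        on_betablocker = (patient, "pha:medicine", "pha:betablocker") in dataset
        bindings = objects(patient, "pha:medicalRecordNumber") if on_betablocker else []
        rows = bindings or [None]  # no match: one row with ?no unbound
        # FILTER (!BOUND(?no)): keep rows where ?no stayed unbound
        results.extend(patient for no in rows if no is None)
    return results

print(patients_without_betablocker())  # ['p2']
```

Because ?no comes from pha:medicalRecordNumber, a patient on a beta blocker who lacks a record number would also be returned.  Binding the medication itself, e.g. OPTIONAL { ?patient pha:medicine ?m FILTER (?m = pha:betablocker) } followed by FILTER (!BOUND(?m)), avoids that dependency.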

I think the most important first question is whether entailment is
necessary at query time.  If it isn't, then you don't necessarily have
an OWA/CWA conflict.  We've been able to get pretty decent mileage out
of entailment-free SPARQL evaluation.  At the same time, this does not
prevent us from 1) performing monotonic DL inference and 2) using rules
where OWL is not expressive enough, both over the *same* dataset.
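Whether entailment is needed at query time can be illustrated with a small Python sketch.  It assumes a hypothetical subclass axiom (pha:CardiacPatient rdfs:subClassOf pha:Patient) and applies only the RDFS rule rdfs9 (instances of a subclass are instances of the superclass).  Without entailment, matching ?x a pha:Patient sees only asserted types; with the rule materialized, the subclass instance matches too.

```python
Triple = tuple[str, str, str]

asserted: set[Triple] = {
    ("pha:CardiacPatient", "rdfs:subClassOf", "pha:Patient"),  # hypothetical axiom
    ("p1", "rdf:type", "pha:Patient"),
    ("p2", "rdf:type", "pha:CardiacPatient"),
}

def rdfs9(triples: set[Triple]) -> set[Triple]:
    """Materialize rdf:type triples implied by rdfs:subClassOf (RDFS rule rdfs9)."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for (s, p, o) in result if p == "rdfs:subClassOf"]
        for sub, sup in subclass:
            for (x, p, c) in list(result):
                if p == "rdf:type" and c == sub and (x, "rdf:type", sup) not in result:
                    result.add((x, "rdf:type", sup))
                    changed = True
    return result

def instances_of(triples: set[Triple], cls: str) -> list[str]:
    """Subjects typed as cls in the given triples (plain pattern matching)."""
    return sorted(s for (s, p, o) in triples if p == "rdf:type" and o == cls)

print(instances_of(asserted, "pha:Patient"))         # ['p1'] (no entailment)
print(instances_of(rdfs9(asserted), "pha:Patient"))  # ['p1', 'p2'] (with rdfs9)
```

Both answers come from the same triples; the inference step is an explicit, separate choice made before querying.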

> But do you mean that _:someone does not have a birthday or doesn't have 
> a name either?  I sincerely doubt that is what you want.

This is an unfair characterization.  In most non-monotonic KBs, you
explicitly assert what you know (and what is relevant to the questions
you are most likely to ask) and leave the derivation of negation to the
default negation rule.  So, at the point of query, you will have an idea
of what is explicitly asserted.

The open world assumption works in a Web environment where you may not
know what is explicitly asserted, but I don't think medical records
(especially curated medical records) should be thought of in the same
way.  They are typically populated via tightly controlled mechanisms and
are subject to various policies governing the nature of the content
(especially where the data feeds research).

> If you want to imply specifically that there is no more pha:medicine,
> you should design your ontology accordingly.  For instance, making the
> pha:medicine to range over an rdf:List.  Or design another property
> say,

... or use a closed value partition [2]

> pha:numOfMedicine and uses rules to suggest that the numOfMedicine must
> be consistent with the pha:medicine applied to a given person.
> But do not embed closed world reasoning into your ontology.  Otherwise,
> you break the foundation of RDF.
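Xiaoshu's alternative, a pha:numOfMedicine count checked by a rule, could look roughly like the Python sketch below.  The property names come from the quoted message; the record layout, the sample data and the `inconsistent_counts` check are hypothetical.  The check counts only the pha:medicine values asserted in the dataset.

```python
from collections import Counter

Triple = tuple[str, str, str]

dataset: set[Triple] = {
    ("p1", "pha:medicine", "pha:aspirin"),
    ("p1", "pha:numOfMedicine", "1"),
    ("p2", "pha:medicine", "pha:aspirin"),
    ("p2", "pha:numOfMedicine", "2"),  # declares a second, unrecorded medicine
}

def inconsistent_counts(triples: set[Triple]) -> list[str]:
    """Subjects whose declared pha:numOfMedicine differs from the asserted pha:medicine count."""
    asserted = Counter(s for (s, p, _) in triples if p == "pha:medicine")
    declared = {s: int(o) for (s, p, o) in triples if p == "pha:numOfMedicine"}
    return sorted(s for s, n in declared.items() if asserted[s] != n)

print(inconsistent_counts(dataset))  # ['p2']
```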

The suggestion (at least mine) isn't to 'embed' CWAs in an ontology (in
fact this is not possible given the nature of OWL), but rather to allow
a scenario where you can use either assumption when appropriate.  The
tools we have at our disposal allow us to have our cake and eat it too.

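[1] "A Realistic Architecture for the Semantic Web", ftp://ftp.cs.sunysb.edu/pub/TechReports/kifer/msa-ruleml05.pdf
[2] "Representing Specified Values in OWL: 'value partitions' and 'value sets'", http://www.w3.org/TR/swbp-specified-values/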

-- 
Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org

Received on Wednesday, 12 September 2007 12:52:43 UTC