Re: [BIONT-DSE] Inclusion versus exclusion criteria from Kavitha Srinivas on 2007-09-13 (public-semweb-lifesci@w3.org from September 2007)

From: Kavitha Srinivas <ksrinivs@gmail.com>
Date: Thu, 13 Sep 2007 16:34:33 -0400
To: "Kashyap, Vipul" <VKASHYAP1@PARTNERS.ORG>
Cc: "Bijan Parsia" <bparsia@cs.man.ac.uk>, <wangxiao@musc.edu>, "Alan Ruttenberg" <alanruttenberg@gmail.com>, "Andersson, Bo H" <Bo.H.Andersson@astrazeneca.com>, "Landen Bain" <lbain@topsailtech.com>, "Rachel Richesson" <Rachel.Richesson@epi.usf.edu>, "public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>, <public-hcls-dse@w3.org>, "Stanley Huff" <Stan.Huff@intermountainmail.org>, "Yan Heras" <Yan.Heras@intermountainmail.org>, "Oniki, Tom (GE Healthcare, consultant)" <Tom.Oniki@ge.com>, "Joey Coyle" <joey@xcoyle.com>, "Bron W. Kisler" <bkisler@earthlink.net>, "Ida Sim" <sim@medicine.ucsf.edu>
Message-Id: <C15FE862-537B-4B68-A457-0FF913CC60BE@gmail.com>

> Also, what would be great is to get a concrete real world example
> which illustrates the above. The example given by Kavitha, I believe,
> has a SQL translation. Getting such examples is crucial to showing
> the value of the web.

Vipul, just to clarify, which example are you referring to when you
say you can do it in SQL?  Also, about precomputing the closure of
SNOMED: if you mean that storing the results of the classification
hierarchy of SNOMED can eliminate the reasoning step in query
answering, we know for sure that this won't work for most of the
queries we looked at.  The reason is that most queries require (at the
very least) that sub-parts of the query actually be classified on the
fly.  Let me give you a concrete example from our work.  Let's take
the query that is looking for patients on medications with an active
ingredient of steroids.  In the actual instance data you have:

Patient X onMedication Vendor-specific DrugX

The first subpart of the query is patients onMedication, and that can
be answered by looking at the actual instance data.  The second
subpart of the query is drugs that have an active ingredient of
steroids.  This part does not map directly to any existing concept in
SNOMED, so a precomputed classification will not help.

As an example, here are all the TBox assertions that are needed for
us to find drugs with active ingredients of steroids (I am using just
one drug case to illustrate):
1.  We know from our mappings that Vendor-specific DrugX is a
subclass of the SNOMED concept Hydrocortisone preparation.
2.  We also know from SNOMED that a Hydrocortisone preparation is a
drug which has an active ingredient of Hydrocortisone (the substance)
AND has a dose form of oral dosage.  In OWL this would mean
Hydrocortisone preparation is defined as equivalent to an intersection
of 2 existentials: exists.ActiveIngredient(Hydrocortisone-substance)
and exists.dosageForm(OralDosageForm).
3.  Hydrocortisone (substance) has superclasses Oxycorticosteroid
(substance) and Hydroxycorticosteroid (substance), each of which
ultimately ends up with a superclass of steroids.
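
The three assertions above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not a
description of any actual reasoner: the class names are simplified
stand-ins for the SNOMED concepts, PatientY and VendorDrugZ are
hypothetical additions that provide a negative case, and the
subsumption check only handles this restricted shape (named classes
defined as intersections of existentials), not full EL++ reasoning.

```python
# Told subclass axioms: child -> direct parents.
# Names are illustrative stand-ins, not real SNOMED identifiers.
SUBCLASS_OF = {
    "VendorDrugX": {"HydrocortisonePreparation"},           # assertion 1
    "HydrocortisoneSubstance": {"Oxycorticosteroid",        # assertion 3
                                "Hydroxycorticosteroid"},
    "Oxycorticosteroid": {"Steroid"},
    "Hydroxycorticosteroid": {"Steroid"},
}

# Definitions: named class == intersection of (role, filler) existentials.
DEFINITIONS = {
    "HydrocortisonePreparation": {                          # assertion 2
        ("activeIngredient", "HydrocortisoneSubstance"),
        ("dosageForm", "OralDosageForm"),
    },
}

# Instance data: (patient, drug) pairs for onMedication.
# PatientY / VendorDrugZ are hypothetical, added as a negative case.
ON_MEDICATION = [("PatientX", "VendorDrugX"), ("PatientY", "VendorDrugZ")]


def ancestors(cls):
    """Return cls plus every class reachable through told subclass axioms."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def restrictions(cls):
    """Collect the existentials cls inherits from its defined ancestors."""
    inherited = set()
    for ancestor in ancestors(cls):
        inherited |= DEFINITIONS.get(ancestor, set())
    return inherited


def satisfies(cls, role, filler):
    """True if cls is subsumed by the query expression exists role.filler.

    The query expression is not a named class, so there is no
    precomputed row to look up; it is checked here at query time.
    """
    return any(r == role and filler in ancestors(f)
               for r, f in restrictions(cls))


def patients_on(role, filler):
    """Join the instance data with the on-the-fly subsumption check."""
    return sorted({p for p, drug in ON_MEDICATION
                   if satisfies(drug, role, filler)})
```

Here patients_on("activeIngredient", "Steroid") returns ["PatientX"].
The closure over named classes is still used (inside ancestors), but
the class "drugs with an active ingredient of steroids" only exists in
the query, so its members have to be worked out when the query arrives.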

On Sep 13, 2007, at 8:40 AM, Kashyap, Vipul wrote:

>
>
>> The data complexity of EL++ suggest strongly that a sensible
>> reduction to SQL is unlikely (i.e., you'll need datalogesque rules as
>> well).
>
> [VK] The interesting question in my mind then is what is the
> additional functionality achieved by these datalogesque rules that
> is not present in SQL? The reason I ask is because today the major
> RDBMS vendors support transitive closure operations and I was
> wondering if there is any other functionality that is missing in SQL.
>
> Also, what would be great is to get a concrete real world example
> which illustrates the above. The example given by Kavitha, I believe,
> has a SQL translation. Getting such examples is crucial to showing
> the value of the web.
>
>> Even logspace data complex logics can be tricky. The DL-Lite family
>> is the paramount example and they can have an exponential blowup in
>> the size of the query (since they need to intern parts of the tbox in
>> the query, so each conjunct might expand, and then the permutations
>> of the expansions must be added to the union of queries...er...as I
>> recall :))
>
> [VK] From a pragmatic point of view, in the context of a given
> application, this just needs to be done once. There are well defined
> RDBMS approaches to create views, materialize them, and develop
> indexing structures to achieve scalability.
>
> For instance, I know that a common approach to using Snomed is to
> precompute the "closure" and store it in a RDBMS.
>
> So, the real world has figured out ways of dealing with these
> situations and I have yet to see examples of how using semantic web
> technologies will give them the scalability and make their life
> easier.
>
>> So, basically, large queries with large, connected TBoxes will be
>> challenging, requiring clever optimization of the rewriting. This
>> isn't something you'll do by hand ;)
>
> [VK] Can I have some real world examples which illustrate this?
>
> ---Vipul

Received on Thursday, 13 September 2007 20:34:50 UTC