- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Thu, 13 Sep 2007 09:10:34 +0100
- To: "Kashyap, Vipul" <VKASHYAP1@PARTNERS.ORG>
- Cc: "Kavitha Srinivas" <ksrinivs@gmail.com>, <wangxiao@musc.edu>, "Alan Ruttenberg" <alanruttenberg@gmail.com>, "Andersson, Bo H" <Bo.H.Andersson@astrazeneca.com>, "Landen Bain" <lbain@topsailtech.com>, "Rachel Richesson" <Rachel.Richesson@epi.usf.edu>, "public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>, <public-hcls-dse@w3.org>, "Stanley Huff" <Stan.Huff@intermountainmail.org>, "Yan Heras" <Yan.Heras@intermountainmail.org>, "Oniki, Tom (GE Healthcare, consultant)" <Tom.Oniki@ge.com>, "Joey Coyle" <joey@xcoyle.com>, "Bron W. Kisler" <bkisler@earthlink.net>, "Ida Sim" <sim@medicine.ucsf.edu>
On Sep 12, 2007, at 4:30 PM, Kashyap, Vipul wrote:

[snip]

>> In terms of whether you can do this using SQL querying
>> alone, based on our experience, its unlikely. The problem is that
>> the types of clinical exclusion and inclusion criteria we saw on
>> clinicalTrials.gov cannot be easily reduced to SQL querying (at least
>> with the structured medical records we got from Columbia). From
>> discussions with other institutions, we know this isn't unique to
>> Columbia (i.e., there is a substantial "semantic gap" between what's
>> in the structured record and what is being queried by investigators
>> for clinical trials).
>> this information.
>
> [VK] It will be great if you could share specific examples of some
> criteria that were not expressible in SQL. We can then incorporate
> those into the use case and help make a case for SW technologies.
> On the other hand, taking a quick look at the SHER project at IBM,
> looks like you are using a polynomial time reasoner (CEL) for the
> matching. I may be mistaken, but my initial sense is that any CEL
> expression is likely expressed in SQL/Relational Algebra or vice versa.

Kavitha already pointed out that they aren't using CEL; however, at least the SNOMED part is in EL++, which can be reasoned with using CEL or the like. That doesn't mean you can use (in any sensible way) Relational Algebra.

If you look at the OWL 1.1 tractable fragments document:

<http://www.webont.org/owl/1.1/tractable.html>

in particular the section on computational properties:

<http://www.webont.org/owl/1.1/tractable.html#7>

It contains the following paragraph:

"The fact that data complexity stays LOGSPACE, means that one can exploit relational database technology for instance checking and conjunctive query answering. The fact that data complexity goes beyond LOGSPACE means that query answering and instance checking require more powerful engines than the ones provided by relational database technologies. PTIME-hardness essentially requires Datalog technologies. For the CoNP cases, Disjunctive Datalog technologies could be adopted."

The data complexity of EL++ (PTIME-complete) strongly suggests that a sensible reduction to SQL is unlikely (i.e., you'll need datalogesque rules as well).

Even logics with LOGSPACE data complexity can be tricky. The DL-Lite family is the paramount example, and they can have an exponential blowup in the size of the query (since they need to intern parts of the TBox in the query, so each conjunct might expand, and then the permutations of the expansions must be added to the union of queries...er...as I recall :))

So, basically, large queries with large, connected TBoxes will be challenging, requiring clever optimization of the rewriting. This isn't something you'll do by hand ;)

Cheers,
Bijan.
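
To make the query blowup concrete, here is a minimal sketch, assuming a drastically simplified rewriting that only expands class atoms through subclass axioms (real DL-Lite rewriting also handles roles and existentials). The TBox contents and the names `TBOX`, `expansions`, and `rewrite` are hypothetical, made up for illustration.

```python
from itertools import product

# Hypothetical toy TBox: each class maps to its direct subclasses.
TBOX = {
    "Disorder": ["Infection", "Neoplasm"],
    "Infection": ["ViralInfection"],
    "Drug": ["Antibiotic", "Antiviral"],
}

def expansions(cls, tbox):
    """Return cls plus every class reachable from it via subclass axioms."""
    seen, stack = [], [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(tbox.get(current, []))
    return seen

def rewrite(query, tbox):
    """Rewrite a conjunctive query (a list of class atoms) into a union of
    conjunctive queries: one disjunct per combination of atom expansions."""
    per_atom = [expansions(atom, tbox) for atom in query]
    return [list(combo) for combo in product(*per_atom)]

ucq = rewrite(["Disorder", "Drug"], TBOX)
print(len(ucq))  # number of disjuncts in the rewritten union
```

With k conjuncts that each have m expansions, the union has m**k disjuncts, so even this toy version grows exponentially in the length of the query.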
Received on Thursday, 13 September 2007 08:11:13 UTC