Re: [BIONT-DSE] Inclusion versus exclusion criteria

On Sep 12, 2007, at 4:30 PM, Kashyap, Vipul wrote:
[snip]

>> In terms of whether you can do this using SQL querying
>> alone, based on our experience, its unlikely.  The problem is that
>> the types of clinical exclusion and inclusion criteria we saw on
>> clinicalTrials.gov cannot be easily reduced to SQL querying (at least
>> with the structured medical records we got from Columbia).  From
>> discussions with other institutions, we know this isn't unique to
>> Columbia (i.e., there is a substantial "semantic gap" between what's
>> in the structured record and what is being queried by investigators
>> for clinical trials).
>> this information.
>
> [VK] It will be great if you could share specific examples of some  
> criteria that
> were not expressible in SQL. We can then incorporate those into the  
> use
> case and help make a case for SW technologies. On the other hand,  
> taking a quick
> look at the SHER project at IBM, looks like you are using a  
> polynomial time
> reasoner (CEL) for the matching. I may be mistaken, but my initial  
> sense is that
> any CEL expression is likely expressed in SQL/Relational Algebra or  
> vice versa.

Kavitha already pointed out that they aren't using CEL. However, at  
least the SNOMED part is in EL++ and so can be reasoned with using CEL  
or the like. Even so, that doesn't mean you can use Relational Algebra  
(in any sensible way).

If you look at the OWL 1.1 tractable fragments document:
	<http://www.webont.org/owl/1.1/tractable.html>

in particular the section on computational properties:
	<http://www.webont.org/owl/1.1/tractable.html#7>

It contains the following paragraph:
	"The fact that data complexity stays LOGSPACE, means that one can  
exploit relational database technology for instance checking and  
conjunctive query answering. The fact that data complexity goes beyond  
LOGSPACE means that query answering and instance checking require  
more powerful engines than the ones provided by relational database  
technologies. PTIME-hardness essentially requires Datalog  
technologies. For the CoNP cases, Disjunctive Datalog technologies  
could be adopted."

The data complexity of EL++ (PTIME-complete) strongly suggests that a  
sensible reduction to SQL is unlikely (i.e., you'll need Datalog-esque  
rules as well).
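
To make the Datalog point concrete, here is a minimal sketch in Python. It assumes a single recursive EL-style axiom, "(some hasPart Affected) is a subclass of Affected", which corresponds to the Datalog rule Affected(x) :- hasPart(x, y), Affected(y). The names hasPart and Affected and the toy data are hypothetical, chosen only for illustration. A fixed non-recursive query can follow the hasPart chain only a bounded number of steps, whereas the fixpoint loop follows it to any depth.

```python
def saturate(has_part, affected):
    """Naive fixpoint for the (hypothetical) rule
    Affected(x) :- hasPart(x, y), Affected(y).
    Repeats the join until no new facts are derived."""
    derived = set(affected)
    changed = True
    while changed:
        changed = False
        for x, y in has_part:
            if y in derived and x not in derived:
                derived.add(x)
                changed = True
    return derived


def one_join(has_part, affected):
    """What a single non-recursive join computes: one step only."""
    return set(affected) | {x for x, y in has_part if y in affected}


# Toy ABox (assumed data): a hasPart chain of length three.
has_part = {("body", "arm"), ("arm", "hand"), ("hand", "finger")}
affected = {"finger"}

full = saturate(has_part, affected)
partial = one_join(has_part, affected)
print(sorted(full))
print(sorted(partial))
```

The single join misses "arm" and "body"; adding a second or third join only moves the problem to longer chains, which is why some form of recursion is needed.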

Even logics with LOGSPACE data complexity can be tricky. The DL-Lite  
family is the prime example: rewriting can cause an exponential blowup  
in the size of the query (since parts of the TBox have to be compiled  
into the query, so each conjunct might expand, and then the  
combinations of the expansions must be added to the union of  
queries...er...as I recall :))
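
A rough sketch of where the blowup comes from, in Python. It assumes a deliberately simplified setting: the TBox contains only atomic subclass axioms, and the query is a conjunction of class atoms. The class names and the helper functions are hypothetical, and real DL-Lite rewriting also has to deal with roles and existential axioms, which this ignores. Each conjunct is replaced by every class that entails it, and the rewritten query is the union over all combinations.

```python
from itertools import product


def specializations(cls, subclasses):
    """All classes entailing cls under the subclass axioms
    (cls itself plus its direct and indirect subclasses)."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub in subclasses.get(current, ()):
            if sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return sorted(seen)


def rewrite(query, subclasses):
    """Expand a conjunction of class atoms into a union of
    conjunctive queries, one per combination of expansions."""
    choices = [specializations(c, subclasses) for c in query]
    return list(product(*choices))


# Toy TBox (assumed): each queried class has two direct subclasses.
subclasses = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]}
query = ["A", "B", "C"]

ucq = rewrite(query, subclasses)
print(len(ucq))  # 3 alternatives per conjunct, 3 conjuncts: 27 disjuncts
```

With k alternatives per conjunct and n conjuncts the union has k^n members, so the size grows exponentially in the length of the query.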

So, basically, large queries with large, connected TBoxes will be  
challenging, requiring clever optimization of the rewriting. This  
isn't something you'll do by hand ;)

Cheers,
Bijan.

Received on Thursday, 13 September 2007 08:11:13 UTC