Re: Representing conflicting evidence and refutation from Pavel Klinov on 2010-10-14 (public-semweb-lifesci@w3.org from October 2010)

From: Pavel Klinov <pklinov@cs.man.ac.uk>
Date: Thu, 14 Oct 2010 14:38:04 +0100
To: Matthias Samwald <samwald@gmx.at>
Cc: Joanne Luciano <jluciano@cs.rpi.edu>, "M. Scott Marshall" <mscottmarshall@gmail.com>, HCLS <public-semweb-lifesci@w3.org>, Andrey Rzhetsky <arzhetsk@medicine.bsd.uchicago.edu>, Deborah McGuinness <dlm@cs.rpi.edu>, Jim McCusker <james.mccusker@yale.edu>, Dominic DiFranzo <difrad@rpi.edu>, divoli@uchicago.edu, Bijan Parsia <bparsia@cs.man.ac.uk>
Message-ID: <AANLkTikGgKYdAz-GRx-zP8LFC-4YOboTT1WUsDSB_98u@mail.gmail.com>

On Thu, Oct 14, 2010 at 10:54 AM, Matthias Samwald <samwald@gmx.at> wrote:
> Dear all,
>
> Interesting connections -- one of the main developers of the CADIAG-2 system
> (Prof. Klaus-Peter Adlassnig) was my PhD supervisor. While I was mainly
> interested in crisp reasoning back then, I also played around with
> fuzzy/probabilistic OWL reasoning. However, the scalability of these
> reasoners (such as Pronto) seemed to be very limited back then (two years
> ago). It is exciting to hear that this situation has seemingly improved in
> the meantime.

Yeah, back in 2008 we were only able to handle something like 15-20
statements in a knowledge base. Now we can automatically check
consistency for 1000-1500 probabilistic statements (also depending on
other factors).

>
> The trend towards personalized medicine could generate some very interesting
> use-cases, since the information about effects of certain SNPs, molecular
> pathway alterations, lifestyle and demographic factors is still quite
> incomplete and sometimes contradictory. One could extract the relevant data
> for each disease/drug and each patient to yield a knowledge base fragment of
> a manageable size, and then try to judge disease risk, drug efficacy or risk
> of adverse events based on imprecise reasoning.

That'd be very interesting. Certainly probabilistic extensions to OWL
aren't a silver bullet. For instance, you probably shouldn't expect
to be able to assess, say, breast cancer risk by doing pure
probabilistic entailment from your knowledge base. Instead, you may
try to use them to represent a background theory (or a part thereof)
of breast cancer, which i) provides the necessary medical vocabulary
(via OWL) and ii) captures some important statistics (e.g. what
fraction of patients exposed to *A* develops *B*). Such a theory could
be a basis for higher-level (or more specialized) tools, e.g.
proportional hazard models, etc.
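
As a rough illustration of point ii), here is a minimal Python sketch
of how statistics of the form "between l and u of patients exposed to
A develop B" could be stored as interval-valued conditional statements
and scanned for the most obvious kind of conflict. Everything in it is
an assumption made for illustration: the names ConditionalConstraint
and direct_conflicts, the class names ExposedToA, DevelopsB and
DevelopsC, and all the numbers are invented. It is not how Pronto
checks consistency; a real probabilistic reasoner has to consider how
many statements interact, not just pairs about the same two classes.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ConditionalConstraint:
    """Hypothetical record: between `low` and `high` of the individuals
    in class `evidence` also belong to class `conclusion`."""
    evidence: str
    conclusion: str
    low: float
    high: float

    def __post_init__(self):
        # Reject intervals that are not valid probability ranges.
        if not 0.0 <= self.low <= self.high <= 1.0:
            raise ValueError("need 0 <= low <= high <= 1")


def direct_conflicts(constraints):
    """Return pairs of statements about the same (evidence, conclusion)
    pair whose probability intervals do not overlap at all."""
    conflicts = []
    for a, b in combinations(constraints, 2):
        same_pair = (a.evidence, a.conclusion) == (b.evidence, b.conclusion)
        disjoint = a.high < b.low or b.high < a.low
        if same_pair and disjoint:
            conflicts.append((a, b))
    return conflicts


# Invented example statements, not real medical statistics.
kb = [
    ConditionalConstraint("ExposedToA", "DevelopsB", 0.10, 0.15),
    ConditionalConstraint("ExposedToA", "DevelopsB", 0.30, 0.40),
    ConditionalConstraint("ExposedToA", "DevelopsC", 0.05, 0.20),
]
conflicts = direct_conflicts(kb)
for first, second in conflicts:
    print("conflict:", first, "vs", second)
```

The pairwise check only catches statements that contradict each other
head-on. Conflicts that arise from several statements combined with
the OWL class hierarchy need a proper reasoner.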

I don't know how much I'll be available in the next couple of months
(since I'm trying to finish my thesis write-up). I'm cc'ing my
advisor, Bijan Parsia, since we'd both love to keep this line of
research going.

Just in case anyone wants to have a look at our CADIAG-2 analysis
paper, it's here: http://www.logic.at/WWTF016/pdf/LPAR.pdf

Thanks,
Pavel Klinov

>
> Cheers,
> Matthias Samwald
>
> // DERI Galway, Ireland
> // Konrad Lorenz Institute for Evolution and Cognition Research, Austria
> // http://samwald.info
>
>
>
> --------------------------------------------------
> From: "Joanne Luciano" <jluciano@cs.rpi.edu>
> Sent: Wednesday, October 13, 2010 4:28 PM
> To: "M. Scott Marshall" <mscottmarshall@gmail.com>
> Cc: "HCLS" <public-semweb-lifesci@w3.org>; "Andrey Rzhetsky"
> <arzhetsk@medicine.bsd.uchicago.edu>; <pklinov@cs.man.ac.uk>; "Deborah
> McGuinness" <dlm@cs.rpi.edu>; "Jim McCusker" <james.mccusker@yale.edu>;
> "Dominic DiFranzo" <difrad@rpi.edu>; <divoli@uchicago.edu>
> Subject: Re: Representing conflicting evidence and refutation
>
>> Hi Scott,
>>
>> Interesting you should bring this up. Last week when I was at Manchester
>> I attended the DL (Description Logics) lunch talk by PhD student Pavel
>> Klinov. The talk was on an analysis of the CADIAG-2 KB. The aim of the
>> project is to analyze the (in)consistency of CADIAG-2 -- the large medical
>> diagnosis system developed in Vienna in the 80s. The approach is to
>> translate CADIAG-2 into a P-SH KB and compute all (or most of) the minimal
>> sets of conflicting rules. This is joint work with David Picado from the
>> Technical University of Vienna, who provided the system and developed its
>> translation into P-SH.
>>
>> Conflicting information in text is exactly what Andrey was working with.
>> His focus was on biological pathways.
>>
>> After the talk I had a chat with Pavel and thought that there were other
>> applications of his work, but we'd need to identify some data sets. I
>> immediately thought of Andrey Rzhetsky's work on Geneways where he addressed
>> representing complementary data. I wrote to Andrey in hopes of getting some
>> data with inconsistencies to see what Pavel's methods would uncover.
>>
>> I've copied both Andrey and Pavel on this email as well as a few from the
>> TWC.
>>
>> Cheers,
>> Joanne
>>
>>
>>
>> On Oct 13, 2010, at 9:43 AM, M. Scott Marshall wrote:
>>
>>> Lilly recently halted development of the Alzheimer's drug
>>> "semagacestat" because it was making patients worse in two late stage
>>> clinical trials. This type of knowledge seems like very valuable
>>> information to researchers in Alzheimer's. However, in recent searches
>>> of http://clinicaltrials.gov such as
>>> http://clinicaltrials.gov/ct2/results?term=semagacestat, it seems that
>>> the news hasn't been incorporated into the data on the website.
>>> However, assuming that it had been added, I am curious how 'cancelled
>>> clinical trials' can be found in the linked data. Has anyone looked at
>>> this?
>>>
>>>
>>> http://prescriptions.blogs.nytimes.com/2010/08/17/lilly-halts-alzheimers-drug-trial/?scp=2&sq=alzheimer's%20disease&st=cse
>>>
>>> Another example of contradiction/refutation, this time found in
>>> PubMed, is that Metformin apparently doesn't work (only) along the
>>> pathways that previous research indicated:
>>>
>>> "Metformin inhibits hepatic gluconeogenesis in mice independently of
>>> the LKB1/AMPK pathway via a decrease in hepatic energy state"
>>> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2898585/
>>>
>>> Has anyone seen a way to deal with conflicting information like this
>>> in text mining? If we were to represent this information in RDF, could
>>> we do it in such a way that we could observe the change of the
>>> Metformin association with LKB1/AMPK pathways over time in the
>>> literature?
>>>
>>> Cheers,
>>> Scott
>>>
>>> P.S. Oktie - It is pure coincidence that the first example is from
>>> clinical trials. :)
>>>
>>> --
>>> M. Scott Marshall, W3C HCLS IG co-chair
>>> Leiden University Medical Center / University of Amsterdam
>>> http://staff.science.uva.nl/~marshall
>>>
>>
>>
>>
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Joanne S. Luciano, PhD                   Email:   jluciano@cs.rpi.edu
>> Research Associate Professor             110 8th Street, Winslow 2143
>> Tetherless World Constellation           Troy, NY 12180, USA
>> Rensselaer Polytechnic Institute         Office Tel.  +1.518.276.4939
>> Global Tel. +1.617.440.4364 (skypeIn)    Office Fax   +1.518.276.4464
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>>
>>
>>
>



-- 
cheers,
--pavel
http://www.cs.man.ac.uk/~klinovp
Received on Thursday, 14 October 2010 13:38:46 UTC