Re: Evidence from Alan Ruttenberg on 2007-06-18 (public-semweb-lifesci@w3.org from June 2007)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Mon, 18 Jun 2007 03:16:00 -0400
To: satya30@uga.edu
Cc: public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>
Message-Id: <CDD7F563-C4FC-46D3-AF32-299A2C95028F@gmail.com>
On Jun 13, 2007, at 12:33 PM, SATYA SANKET SAHOO wrote:

>> On Jun 13, 2007, at 3:42 AM, dirk.colaert@agfa.com wrote:
>> I am following part of this thread and feel like popping in.
>> Maybe it helps.
>>
>> In clinical trials and 'evidence' based medicine the word evidence
>> is strictly defined and may not be compatible with the word
>> 'evidence' used in logic: if <evidence> then <conclusion>. I
>> support the idea of connecting the interpretation of the raw data
>> (the source data) with the data itself. Pixels cannot be evidence
>> on their own, without knowing what the pixels mean. So, an
>> important factor is the trust in the interpreter.
>
> Satya: Connecting source data with result data along with the  
> processing information used to derive the results and 'trust' sound  
> very similar to 'provenance' information. Can we differentiate
> between 'evidence' and 'provenance', or not?
>
> This is especially pertinent in the case of experimental data and
> results derived from it. For example, when a list of peptides is
> derived from a 'biochemical sample' using mass spectrometry (ms),
> the evidence that accompanies these results is:
> 1. The details of the original sample (organism, type of cells,  
> cleavage enzyme used etc.)
> 2. The ms instrument used, the settings of the instrument and, as
> pointed out earlier in this discussion, the algorithms used in
> processing the ms data - these entail a lot of contextual
> information regarding how the results are processed or interpreted
> (measure of confidence etc.)

As you recall, the demo used the Evidence Code Ontology (ECO) from OBO
as its basis. This is a funny artifact - on the one hand, it's been
useful, in some form, to the researchers that have used the Gene
Ontology. On the other hand, it is clearly a mixture of various kinds
of things that don't really go well together - one example being the
mixing of provenance with experimental information.

I've thought that it would be a useful exercise to start with the  
current ECO and try to refactor it, perhaps making explicit where  
appropriate the various components that Dirk mentions as being
elements of evidence. (I think the proxy idea is quite related to his  
view of things, btw).

As an example of what's there now, we see things like "Traceable
Author Statement" (no definition), which I take to mean that someone
read a paper where the author said it was so, and here is the PMID.  
TAS is generally applicable, and was what we used when all we had was  
some otherwise unexplained citation of a paper. Really it is more  
like provenance than evidence.

OTOH, there are things like: "inferred from curated BLAST match to  
protein" (no definition), which is a justification for moving GO  
annotations on proteins in one species to proteins in another. So
this is much more specific, and has an underlying theory that comes
with a standard set of caveats. It can also be put into some sort of
proxy relationship - similarity of protein sequence is a proxy
for similarity of protein function. (on Dirk's scale of 1-4 this
would probably be considered a 0)

There is some overlap between the discussion of evidence and the OBI
protocol application branch's work. I'd say the effort there is on
determining the ingredients and their relationships, rather than on
evaluating how much to believe the evidence. There's also some
overlap of OBI with Satya's ontology, so maybe there's a chance for  
more concentrated effort being put into a merge of these various  
independent efforts.

-Alan
Received on Monday, 18 June 2007 07:16:09 UTC