Re: Playing with sets in OWL... from Andrea Splendiani on 2006-09-15 (public-semweb-lifesci@w3.org from September 2006)

From: Andrea Splendiani <andrea@pasteur.fr>
Date: Fri, 15 Sep 2006 17:11:20 +0200
To: William Bug <William.Bug@DrexelMed.edu>
Cc: Alan Ruttenberg <alanruttenberg@gmail.com>, "Miller, Michael D (Rosetta)" <Michael_Miller@Rosettabio.com>, Marco Brandizi <brandizi@ebi.ac.uk>, semantic-web <semantic-web@w3.org>, public-semweb-lifesci@w3.org
Message-Id: <A81000E7-DA88-41A5-9FD7-305BE124D436@pasteur.fr>

Late post...

There may be a limit in RDF/OWL here, in that microarray data (like
other information) is not "digital". That is, it doesn't really fit the
assumption that everything you are talking about has a true/false
value.
In this thread, talking about gene sets, there is always the property
(expressedIn). But "expressed" as a yes/no is a deduction. In theory,
this deduction would not be a starting point for inference, but rather
a result of all the information available.
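
To make the point concrete, here is a minimal sketch in Python. The gene
names, intensity values and thresholds are all made up for illustration
(they are assumptions, not real array data), and a simple cutoff stands in
for whatever statistical call one would really use. It only shows that the
yes/no "expressedIn" statement depends on a choice made during analysis:

```python
# Hypothetical normalized log2 intensities for one sample (invented values).
intensities = {"geneA": 9.8, "geneB": 6.1, "geneC": 7.4}


def expressed_in(intensities, threshold):
    """Reduce continuous measurements to yes/no 'expressedIn' calls.

    The threshold is an analysis choice, not something the array measures.
    """
    return {gene: value >= threshold for gene, value in intensities.items()}


# Two equally defensible (but arbitrary) cutoffs.
calls_strict = expressed_in(intensities, threshold=8.0)
calls_loose = expressed_in(intensities, threshold=7.0)

# Genes whose yes/no call flips depending on the cutoff.
unstable = sorted(g for g in intensities if calls_strict[g] != calls_loose[g])
print(unstable)
```

A statement like "geneC expressedIn sample" would be asserted under one
cutoff and not under the other, which is why I see it as an output of the
analysis rather than an input to inference.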

I mean, I think using OWL/RDF to "interpret" data is very useful, but
there are some limitations to be aware of.

best,
Andrea

> 	2) I think the use of OWL Alan describes here is going to be  
> critical to performing broad field, large scale re-analysis of  
> complex data sets such as microarray experiments and various types  
> of neuro-images containing segmented geometric objects (in many  
> ways equivalent to the segmentation performed on microarray images  
> to determine the location and intensity of spots).  The tendency  
> when presenting these results in research articles - and often when  
> sharing the data - is to provide the analyzed/reduced view of the  
> data.  In the context of these complex experiments, many forms of  
> re-analysis will not be possible without access to the originally  
> collected data.  Think of how critical BLAST-based meta-analysis  
> was for GenBank through the 1990s (and still is).  There are  
> several underlying assertions making it possible to perform such  
> analysis.  Primary among them is the acceptance that each form of  
> sequencing technology provides a reliable way of determining the  
> probability of finding a particular nucleotide at a particular  
> location.  Many sequences are submitted with the simple assertion  
> that at position N in sequence X there is a 100% probability (or  
> 95% confidence, to be more specific) of finding nucleotide  
> A|T|G|C.  To some extent, the statistical analysis performed by BLAST  
> (and other position-sensitive, cross-correlative statistical  
> algorithms) relied on these "ground facts".  For the most part, it  
> was safe to assume this level of reduced data could be safely  
> pooled with other such sequence determinations regardless of the  
> specific sequencing device, underlying biochemical protocols, and  
> specific lots of reagents used.  These same assumptions cannot  
> generally be safely made for microarray experiments, segmented  
> MRI images - and many other types of images such as IHC or in situ  
> based images.  As an example, just look to the debates in the last  
> year or two regarding the sometimes problematic nature of  
> replicating "gene expression" level results with different arrays  
> covering the "same" genes.  If we are to support the same sort of  
> meta analysis as was common with BLAST across GenBank sequences,  
> then we will have to often supply access to the low level data  
> elements.  This in fact was a major impetus behind providing the  
> MAGE-OM (and FuGE-OM).  As I state at the top of this email with  
> points 'a', 'b', & 'c', MAGE-OM/MAGE-ML is extremely useful for  
> several critical tasks related to the handling of this detailed  
> data.  When it comes to supporting the semantically-grounded  
> analytical requirements of such complex, broad field, meta- 
> analysis, however, I think OWL (and sometimes RDF alone) is going  
> to prove a critical enabling technology.

Received on Friday, 15 September 2006 15:11:52 UTC