- From: Andrea Splendiani <andrea@pasteur.fr>
- Date: Fri, 15 Sep 2006 17:11:20 +0200
- To: William Bug <William.Bug@DrexelMed.edu>
- Cc: Alan Ruttenberg <alanruttenberg@gmail.com>, "Miller, Michael D (Rosetta)" <Michael_Miller@Rosettabio.com>, Marco Brandizi <brandizi@ebi.ac.uk>, semantic-web <semantic-web@w3.org>, public-semweb-lifesci@w3.org
Late post... there may be a limit in RDF/OWL here, in that microarray data (like other information) is not "digital". That is, it doesn't really fit the assumption that everything you are talking about has a true/false property.

In this thread, talking about gene sets, there is always the property (expressedIn). But "expressed" as a yes/no is a deduction. In theory, this deduction would not be a starting point for inference, but rather a result of all the information available.

I mean, I think OWL/RDF to "interpret" data is very useful, but there are some limitations to be aware of.

best,
Andrea

> 2) I think the use of OWL Alan describes here is going to be
> critical to performing broad field, large scale re-analysis of
> complex data sets such as microarray experiments and various types
> of neuro-images containing segmented geometric objects (in many
> ways equivalent to the segmentation performed on microarray images
> to determine the location and intensity of spots). The tendency
> when presenting these results in research articles - and often when
> sharing the data - is to provide the analyzed/reduced view of the
> data. In the context of these complex experiments, many forms of
> re-analysis will not be possible without access to the originally
> collected data. Think of how critical BLAST-based meta-analysis
> was for GenBank through the 1990s (and still is). There are
> several underlying assertions making it possible to perform such
> analysis. Primary among them is the acceptance that each form of
> sequencing technology provides a reliable way of determining the
> probability of finding a particular nucleotide at a particular
> location. Many sequences are submitted with the simple assertion
> that at position N in sequence X there is a 100% probability (or
> 95% confidence, to be more specific) of finding nucleotide A|T|G|C.
> To some extent, the statistical analysis performed by BLAST
> (and other position-sensitive, cross-correlative statistical
> algorithms) relied on these "ground facts". For the most part, it
> was safe to assume this level of reduced data could be safely
> pooled with other such sequence determinations regardless of the
> specific sequencing device, underlying biochemical protocols, and
> specific lots of reagents used. These same assumptions cannot
> generally be safely made for microarray experiments, segmented
> MRI images - and many other types of images such as IHC or in situ
> based images. As an example, just look to the debates in the last
> year or two regarding the sometimes problematic nature of
> replicating "gene expression" level results with different arrays
> covering the "same" genes. If we are to support the same sort of
> meta-analysis as was common with BLAST across GenBank sequences,
> then we will often have to supply access to the low level data
> elements. This in fact was a major impetus behind providing the
> MAGE-OM (and FuGE-OM). As I state at the top of this email with
> points 'a', 'b', & 'c', MAGE-OM/MAGE-ML is extremely useful for
> several critical tasks related to the handling of this detailed
> data. When it comes to supporting the semantically-grounded
> analytical requirements of such complex, broad field
> meta-analysis, however, I think OWL (and sometimes RDF alone) is
> going to prove a critical enabling technology.
Received on Friday, 15 September 2006 15:11:49 UTC