Re: Playing with sets in OWL... from Marco Brandizi on 2006-09-10 (public-semweb-lifesci@w3.org from September 2006)

From: Marco Brandizi <brandizi@ebi.ac.uk>
Date: Sun, 10 Sep 2006 13:21:29 +0100
CC: public-semweb-lifesci@w3.org
Message-ID: <45040349.10707@ebi.ac.uk>
Hi all,

First, thank you all very much for all the interesting replies. I am not
sure I understand all of them, but anyway...

 > What you are describing is described in MAGE-OM/MAGE-ML, as a UML
 > model to capture the real world aspects of running a microarray
 > experiment.
 >
 > Typically at the end of this process a set of genes is identified as
 > being interesting for some reason and one wants to know more about
 > this set of genes beyond the microarray experiment that has been
 > performed.
 >
 > I might be wrong but I think that is where Marco is starting, at the
 > end of the experiment for follow-up.
 >

Yes, I am trying to represent the experimental activity linked to
microarray experiments, taking into account why the experiments are
performed, what hypotheses or conclusions may be derived from data
analysis, and the like.

So I would like to represent the gene expression domain in a rather 
abstract fashion. I know there are similar projects I am interested in 
(http://lists.w3.org/Archives/Public/public-semweb-lifesci/2006Aug/0133)


 > So. with reference to this ontology (generated by Marco, or imported
 >  from some standard) he could simply state:
 >
 > Individual(c1 type(Computation) value(geneComputedAsExpressed g1)
 > value(geneComputedAsExpressed g2) value(geneComputedAsExpressed g3) )
 >

These were interesting examples. It seems that I should somehow have an
individual (c1), while avoiding the replication of relations between
sets.
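
As a rough illustration of what the quoted Individual(...) statement
amounts to, here is a minimal Python sketch that spells it out as plain
subject/predicate/object triples. The namespace http://example.org/microarray#
and the helper names are assumptions made up for this example; only c1,
g1, g2, g3, Computation and geneComputedAsExpressed come from the quoted
text.

```python
# Hypothetical namespace, chosen only for this illustration.
EX = "http://example.org/microarray#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def computation_triples(computation, genes):
    """Return triples saying that `computation` is a Computation which
    computed every gene in `genes` as expressed."""
    subject = EX + computation
    triples = [(subject, RDF_TYPE, EX + "Computation")]
    for gene in genes:
        triples.append((subject, EX + "geneComputedAsExpressed", EX + gene))
    return triples


def to_ntriples(triples):
    """Serialise (subject, predicate, object) URI triples, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = computation_triples("c1", ["g1", "g2", "g3"])
print(to_ntriples(triples))
```

The single individual c1 carries one statement per gene, so the set of
expressed genes is implied by c1 without a separate set object.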

 > location and intensity of spots).  The tendency when presenting these
 >  results in research articles - and often when sharing the data - is
 > to provide the analyzed/reduced view of the data.  In the context of
 > these complex experiments, many forms of re-analysis will not be
 > possible without access to the originally collected data.  Think of

Nonetheless, I think it is useful to be able to perform an operation
like: "import this short list of genes and assign it a name, a
description, the reason why I am importing it, more formal knowledge
about it, etc.".

The meaning of such lists could be something like: "after weeks of
microarray analysis, BLASTing, etc., these genes are interesting for
me/our project/our research group". So the reduced lists of microarray
items could be useful, and representing them and their meaning with the
Semantic Web could be useful as well, notwithstanding the fact that you
would still need the whole data set to redo more mathematical analysis.
I believe the Semantic Web is interesting in the first case, maybe less
interesting in the second one.
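
To make the "import a list of genes" operation a bit more concrete,
here is a small Python sketch of the record such an import might
create. The class name GeneList, its fields and the sample values are
all hypothetical, chosen only to mirror the items listed above (name,
description, reason for importing, further knowledge).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GeneList:
    """A reduced list of genes plus the reason it was kept (hypothetical)."""
    name: str
    description: str
    reason: str  # why the list is being imported
    genes: List[str] = field(default_factory=list)
    # Free-form slots for further, more formal statements about the list.
    notes: Dict[str, str] = field(default_factory=dict)

    def summary(self) -> str:
        """One-line, human-readable description of the list."""
        return f"{self.name} ({len(self.genes)} genes): {self.reason}"


# Made-up sample values, for illustration only.
interesting = GeneList(
    name="candidates-01",
    description="Genes kept after microarray analysis",
    reason="interesting for our research group",
    genes=["g1", "g2", "g3"],
)
print(interesting.summary())
```

The raw data set stays wherever it already lives; the record only
carries the reduced list and what is being said about it.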

Again on this topic:

 > The following tools, for example, are available for microarray gene
 > annotation.
 >
 > SOURCE -- http://nar.oxfordjournals.org/cgi/content/full/31/1/219
 > KARMA --
 > http://nar.oxfordjournals.org/cgi/content/full/32/suppl_2/W441
 > RESOURCERER -- http://pga.tigr.org/tigr-scripts/magic/r1.pl DRAGON --
 > http://pevsnerlab.kennedykrieger.org/dragon.htm
 >
 > These tools take a gene list of interest and return annotation
 > collected from multiple sources (e.g., gene ontology, UniProt, and
 > KEGG). It might be useful if these tools can be made
 > semantic-web-aware.

Thanks to Key for reporting these tools, which I didn't know and which
I'll have a look at.

However, what these tools seem to address is official, stable
annotations, available to a large number of people, with no chance for
the user to change or enrich them. Of course such tools are important,
but I am thinking of something that could be used within a research
group, where one could make claims, rather than definitive assertions,
like: "these genes are involved in this disease", or: "these data
confirm this pathway, but I am not completely sure, more validation is
necessary"...


 > MAGE-OM/MAGE-ML is also the result of a huge amount of deliberation
 > from dozens of experts in the informatics fields involved in
 > generating, storing, and manipulating microarray data.
 >
 > When it comes to manipulating the information associated with a
 > microarray experiment - or collection of experiments - in a
 > semantically explicitly manner, however, RDF is really the preferred
 > formalism providing the required explicit semantics, while still
 > providing the expressiveness needed to characterize the inherent
 > variety, complexity, and granularity in this information.  When it

This has already been addressed here 
(http://lists.w3.org/Archives/Public/public-semweb-lifesci/2006Jun/0098)

I wonder whether the level of detail that MAGE-OM is able to handle may
be efficiently translated into RDF or, worse, into OWL. Besides, I
wonder whether this would be an interesting thing to do.

Basically:

- MAGE-OM is good enough for representing how a microarray
experiment has been done (the experimental design) and which raw or
normalized data it has produced. It has some limits due to the fact that
it is an object model, but it may be coupled with the MGED ontology or
FUGO to cope with that.

- MAGE-OM doesn't cover much of the follow-up to experimentation, nor
higher levels of abstraction, i.e. hypotheses and conclusions, who is
studying some disease, or gene behaviour in a given cell type, etc.

moreover...

 > bringing in information and data from a vast number of resources and
 > tying it together into big pictures, all without the semantic web.
 > I'm sure they would love to have the kind of power envisioned by the
 > W3C for the semantic web but they won't touch it until it is
 > easy--they are busy doing their core jobs.


- ...how scalable are RDF and OWL? Let's take a small data set of
100 microarray experiments, with 10k probe sets x 10 hybridizations. We
would have (at least) 10 million numbers to handle, plus several
annotations, plus inference, etc. An RDF backend that directly maps SQL
to RDF should still cope, but what about an in-memory OWL reasoner?
And what about integrating larger amounts of microarray data, crawled
from different sources on the web (which should be a goal of the
Semantic Web)?
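
The back-of-the-envelope figure above can be checked with a few lines
of Python. The three-triples-per-value factor is purely an assumption
for illustration (one statement each for the hybridization, the probe
set and the number itself); a real model could need more or fewer.

```python
experiments = 100
probe_sets = 10_000
hybridizations = 10  # per experiment

# One number per (experiment, probe set, hybridization) combination.
values = experiments * probe_sets * hybridizations

# Assumption: each value becomes a node described by three triples.
TRIPLES_PER_VALUE = 3
triples = values * TRIPLES_PER_VALUE

print(f"{values:,} values, at least {triples:,} triples")
```

Annotations and inferred statements would come on top of that count.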

That's another point about the current state of the art of the Semantic
Web: I am not an expert, but aren't we still missing some needed
features? Like: efficient RDF handling, SQL mapping, federated data
stores, distributed reasoning... Relational theory is less expressive,
but for the moment relational databases, having been around for ages,
seem more reliable and efficient.

Sorry for the length of the reply...

Cheers.


-- 

===============================================================================
Marco Brandizi <brandizi@ebi.ac.uk>
http://gca.btbs.unimib.it/brandizi
Received on Sunday, 10 September 2006 12:21:51 UTC