Re: HCLS Demo at ISMB/ECCB, How to contribute to the demo? from Alan Ruttenberg on 2007-08-14 (public-semweb-lifesci@w3.org from August 2007)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Mon, 13 Aug 2007 22:15:56 -0400
To: Marco Brandizi <brandizi@ebi.ac.uk>
Cc: Eric Neumann <eneumann@teranode.com>, W3C HCLSIG hcls <public-semweb-lifesci@w3.org>, "M. Scott Marshall" <marshall@science.uva.nl>, Helen Parkinson <parkinson@ebi.ac.uk>, James Malone <malone@ebi.ac.uk>
Message-Id: <F97ED399-EA56-46A9-B4F6-B746F7AC921D@gmail.com>
[Helen, James: Marco asks about representing ArrayExpress expression
data for inclusion in the framework of our Semantic Web demo.]

Hi Marco,

This is definitely something we are interested in doing. There are a
number of aspects to this problem. Which are you interested in? Some
examples are:

1) Representing the information about the samples, experiment,  
protocols leading to the hybridization, technical aspects of the  
hybridization, etc.
2) Representing the computed intensity of the spots on an array,
as well as how those were computed (e.g. MAS5, RMA, dChip, etc.)
3) Representing which genes are thought to be relatively highly  
expressed by interpreting the intensity of the spots as amount of  
expression of certain genes.

The first listed is one of the motivations for OBI
(http://obi.sourceforge.net/). Helen Parkinson and James Malone work on
microarray informatics and curation at the EBI. I'm ccing them on
this email.

I've split apart the second and third based on experience that says
that the relationship between spot and gene is not always
straightforward and is best represented as separate parts in the
representation. For 2, aspects of the data treatment fall in the
scope of OBI, and I recommend chatting with Helen/James about them.
For the other part there is some work on either representing or
incorporating information about the probes (do I remember correctly
that Affymetrix does have some RDF?), and then there is a choice of
how to represent the numbers associated with each spot. At least
Jonathan Rees and I could talk about that, although there are others
here as well.
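
As a minimal sketch of point 2, assuming a made-up namespace and made-up
property names (ofSpot, computedBy, intensity are illustrative only and
come from neither OBI nor any existing vocabulary), one way to keep the
number for a spot together with the method that produced it is to treat
each measurement as its own resource:

```python
from typing import List, Tuple

# Hypothetical namespace and property names, for illustration only.
EX = "http://example.org/expression/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

Triple = Tuple[str, str, str]


def spot_triples(array_id: str, spot_id: str,
                 intensity: float, method: str) -> List[Triple]:
    """Describe one computed spot intensity and the method that produced it."""
    # One measurement resource per (array, spot, method), so that values
    # computed by different methods for the same spot stay distinct.
    measurement = f"<{EX}measurement/{array_id}/{spot_id}/{method}>"
    return [
        (measurement, f"<{EX}ofSpot>", f"<{EX}spot/{array_id}/{spot_id}>"),
        (measurement, f"<{EX}computedBy>", f"<{EX}method/{method}>"),
        (measurement, f"<{EX}intensity>",
         f'"{intensity!r}"^^<{XSD_DOUBLE}>'),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Array id, spot id and intensity are invented values.
    print(to_ntriples(spot_triples("A1", "s42", 812.5, "MAS5")))
```

Note that nothing here says anything about genes; the measurement only
refers to the spot, which keeps parts 2 and 3 separate.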

Finally, for 3 there is the matter of recording which probes are
believed to be associated with which genes, and then representing the
mapping. Our mapping of orthology might be a guide there. You could
choose to represent all genes that have probes on the chip, or choose
an approach more along the lines of what the Allen Institute for Brain
Science uses: choosing some number of the most expressed genes, as a
summary. AIBS has XML for such summaries, and one project that is on
the queue is to represent those. See, e.g.,
http://www.brain-map.org/mouse/gene/browserXml.html for the top level.
http://www.brain-map.org/mouse/Hypothalamus/GeneExpression/1.xml
lists (along with a bunch of auxiliary information) the top genes
expressed in the Hypothalamus.
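
The "top N genes as a summary" option can be sketched as follows. This is
only an illustration under stated assumptions: the probe and gene names
are invented, a probe may map to several genes or to none, and taking the
maximum probe intensity per gene is an arbitrary choice made for the
example, not how AIBS or ArrayExpress compute their summaries.

```python
from typing import Dict, List, Tuple


def top_expressed_genes(probe_intensity: Dict[str, float],
                        probe_to_genes: Dict[str, List[str]],
                        n: int) -> List[Tuple[str, float]]:
    """Return the n genes with the highest per-gene intensity.

    The per-gene value is the maximum over all probes mapped to that gene
    (an assumption made for this sketch). Probes with no mapping are
    ignored.
    """
    per_gene: Dict[str, float] = {}
    for probe, value in probe_intensity.items():
        for gene in probe_to_genes.get(probe, []):
            per_gene[gene] = max(per_gene.get(gene, value), value)
    # Sort by descending intensity, then by gene name so ties are stable.
    ranked = sorted(per_gene.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


if __name__ == "__main__":
    # Invented example data: p2 maps to two genes, p4 maps to none.
    intensities = {"p1": 10.0, "p2": 50.0, "p3": 30.0, "p4": 99.0}
    mapping = {"p1": ["geneA"], "p2": ["geneA", "geneB"], "p3": ["geneC"]}
    print(top_expressed_genes(intensities, mapping, 2))
```

Keeping the probe-to-gene mapping as a separate input makes it possible
to revise which probes are believed to belong to which genes without
touching the recorded intensities.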

A key part of the exercise is getting clear on exactly what you want  
to be saying with the RDF, and what sort of questions you want to be  
answering. The first part is up to you. I'd recommend thinking about  
the data and constructing an English sentence that expresses what you
think the content of the data is.  We can work from there on the  
mechanics of translating it to RDF/OWL. To get some experience with  
questions and how they are formulated in the demo, have a look at  
http://esw.w3.org/topic/HCLS/HCLSIG_Demo_QueryScratch.
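
To illustrate the step from an English sentence to triples, here is a
small sketch. The sentence, the ex: prefix, and the property names
(hasSample, usesArray, followsProtocol) are all invented for the example
and are not taken from OBI or from the demo; the point is only that each
clause of the sentence becomes one statement.

```python
from typing import List, Tuple

# Hypothetical namespace; not a real ArrayExpress or OBI vocabulary.
EX = "http://example.org/arrayexpress/"

Triple = Tuple[str, str, str]


def sentence_to_triples(sample: str, array: str, protocol: str) -> List[Triple]:
    """Encode: "Sample <sample> was hybridized to array <array>
    following protocol <protocol>."

    The hybridization is given its own identifier so that the sample,
    the array and the protocol can each be attached to it.
    """
    hyb = f"ex:hybridization_{sample}_{array}"
    return [
        (hyb, "ex:hasSample", f"ex:{sample}"),
        (hyb, "ex:usesArray", f"ex:{array}"),
        (hyb, "ex:followsProtocol", f"ex:{protocol}"),
    ]


def to_turtle(triples: List[Triple]) -> str:
    """Render the triples as Turtle with a single prefix declaration."""
    lines = [f"@prefix ex: <{EX}> ."]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle(sentence_to_triples("S1", "A1", "P1")))
```

Writing the sentence first also suggests the questions the data can
answer, for example "which samples were hybridized following protocol
P1?".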

One thing to think about is that whereas most standalone databases
and web sites integrate a lot of external information, in order to
supply their users with adequate information to interpret what they
are getting, in our scenario we are integrating the primary
resources, and so don't want to integrate other people's integrated
data. So thought should be given to what information a particular
source, for example ArrayExpress, uniquely provides, and attention
focused on representing that information, with the expectation that
it will be able to be reunited with the other data somewhere on the
Semantic Web.

Well, hope that helps get you started. This forum is a perfect place  
to ask follow-up questions or bounce ideas off of people, so don't
hesitate to use it.

Regards,
Alan


On Aug 13, 2007, at 1:31 PM, Marco Brandizi wrote:

>
> Eric Neumann wrote:
>> I'll also add that there were many (young) researchers wanting to  
>> get involved in Semantic Web activities. I strongly encouraged  
>> them to participate with HCLSIG and pointed them to our pages and  
>> mailing list.
>
> Hi all,
>
> I'd like as well make my congratulations to Eric for his  
> presentation. I am one of those who expressed interest in  
> collaboration.
>
> Eric, during his presentation briefly mentioned that it should be  
> relatively easy to "cook" some data one may have available in non- 
> RDF format, so that they may be integrated in the demo. My idea is  
> to experiment the export of gene expression data available in  
> public repositories (mainly ArrayExpress). At the moment I am  
> trying to review ISMB materials and I wonder if there are pointers,  
> on the wiki or somewhere else, about this point. Something like a  
> brief tutorial, that  could guide me from choosing proper  
> ontologies which are already used by the demo, to using the  
> technology the demo is using too, to getting some simple result.
>
> Thanks in advance for any help.
>
>
> -- 
>
> ===============================================================================
> Marco Brandizi <brandizi@ebi.ac.uk>
>
> NET Project - Software Engineer
> http://www.ebi.ac.uk/net-project
>
> European Bioinformatics Institute
> Hinxton, CB10 1SD, United Kingdom
> Tel.: +44 (0)1223 49 2613
> Fax: +44 (0)1223 49 4468
>
> http://www.ebi.ac.uk/~brandizi
>
Received on Tuesday, 14 August 2007 02:16:08 UTC