- From: Alan Ruttenberg <alanruttenberg@gmail.com>
- Date: Mon, 13 Aug 2007 22:15:56 -0400
- To: Marco Brandizi <brandizi@ebi.ac.uk>
- Cc: Eric Neumann <eneumann@teranode.com>, W3C HCLSIG hcls <public-semweb-lifesci@w3.org>, "M. Scott Marshall" <marshall@science.uva.nl>, Helen Parkinson <parkinson@ebi.ac.uk>, James Malone <malone@ebi.ac.uk>
[Helen, James: Marco asks about representing ArrayExpress expression data for inclusion in the framework of our Semantic Web demo.]

Hi Marco,

This is definitely something we are interested in doing. There are a number of aspects to this problem. Which are you interested in? Some examples are:

1) Representing the information about the samples, experiment, protocols leading to the hybridization, technical aspects of the hybridization, etc.
2) Representing the computed intensity of the spots on an array, as well as how those were computed (e.g. MAS5, RMA, dChip, etc.)
3) Representing which genes are thought to be relatively highly expressed, by interpreting the intensity of the spots as the amount of expression of certain genes.

The first listed is one of the motivations for OBI (http://obi.sourceforge.net/). Helen Parkinson and James Malone work on microarray informatics and curation at the EBI. I'm cc'ing them on this email.

I've split apart the second and third based on experience that says that the relationship between spot and gene is not always straightforward, so the two are best kept as separate parts of the representation.

For 2, aspects of the data treatment fall in the scope of OBI, and I recommend chatting with Helen/James about them. For the other part, there is some work on either representing or incorporating information about the probes (do I remember correctly that Affymetrix does have some RDF?), and then there is a choice of how to represent the numbers associated with each spot. At least Jonathan Rees and I could talk about that, although there are others here as well.

Finally, for 3 there is the matter of recording which probes are believed to be associated with which genes, and then representing the mapping. Our mapping of orthology might be a guide there.
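As a rough illustration of why 2 and 3 are kept apart, the Python sketch below writes spot intensities and probe-to-gene assertions as separate sets of N-Triples. It is not code from the demo: the http://example.org/expr/ namespace, the property names (onArray, ofProbe, intensity, computedBy, believedToTarget) and the identifiers are all hypothetical placeholders, not terms from OBI or any existing vocabulary.

```python
# Hypothetical namespace: none of these terms come from OBI or the HCLS demo.
EX = "http://example.org/expr/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def uri(local):
    """Wrap a local name in the placeholder namespace as an N-Triples URI."""
    return f"<{EX}{local}>"


def intensity_triples(array_id, probe_id, value, method):
    """Point 2: one measurement node per (array, probe), carrying the
    computed value and the name of the method that produced it."""
    m = uri(f"measurement/{array_id}/{probe_id}")
    return [
        f"{m} {uri('onArray')} {uri('array/' + array_id)} .",
        f"{m} {uri('ofProbe')} {uri('probe/' + probe_id)} .",
        f'{m} {uri("intensity")} "{value}"^^<{XSD_DOUBLE}> .',
        f'{m} {uri("computedBy")} "{method}" .',
    ]


def mapping_triples(probe_id, gene_ids):
    """Point 3: stated separately, because a probe may be believed to
    target zero, one, or several genes."""
    p = uri("probe/" + probe_id)
    return [
        f"{p} {uri('believedToTarget')} {uri('gene/' + g)} ."
        for g in gene_ids
    ]


if __name__ == "__main__":
    # Made-up identifiers and value, for illustration only.
    for line in intensity_triples("A1", "p7", 812.5, "RMA"):
        print(line)
    for line in mapping_triples("p7", ["g1", "g2"]):
        print(line)
```

Because the measurement triples never mention a gene, the probe-to-gene mapping can be revised or replaced without touching the intensity data.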
You could choose to represent all genes that have probes on the chip, or choose an approach more along the lines of what the Allen Institute for Brain Science uses: choosing some number of the most expressed genes, as a summary. AIBS has XML for such summaries, and one project that is on the queue is to represent those. See, e.g., http://www.brain-map.org/mouse/gene/browserXml.html for the top level. http://www.brain-map.org/mouse/Hypothalamus/GeneExpression/1.xml lists (along with a bunch of auxiliary information) the top genes expressed in the Hypothalamus.

A key part of the exercise is getting clear on exactly what you want to be saying with the RDF, and what sort of questions you want to be answering. The first part is up to you. I'd recommend thinking about the data and constructing an English sentence that expresses what you think the content of the data is. We can work from there on the mechanics of translating it to RDF/OWL. To get some experience with questions and how they are formulated in the demo, have a look at http://esw.w3.org/topic/HCLS/HCLSIG_Demo_QueryScratch.

One thing to think about is that, whereas most standalone databases and web sites integrate a lot of external information in order to supply their users with adequate information to interpret what they are getting, in our scenario we are integrating the primary resources, and so don't want to integrate other people's integrated data. So thought should be given to what information a particular source, for example ArrayExpress, uniquely provides, and attention focused on representing that information, with the expectation that it can be reunited with the other data somewhere on the Semantic Web.

Well, hope that helps get you started. This forum is a perfect place to ask followup questions or bounce ideas off of people, so don't hesitate to use it.
Regards,
Alan

On Aug 13, 2007, at 1:31 PM, Marco Brandizi wrote:

> Eric Neumann wrote:
>> I'll also add that there were many (young) researchers wanting to
>> get involved in Semantic Web activities. I strongly encouraged
>> them to participate with HCLSIG and pointed them to our pages and
>> mailing list.
>
> Hi all,
>
> I'd like as well make my congratulations to Eric for his
> presentation. I am one of those who expressed interest in
> collaboration.
>
> Eric, during his presentation briefly mentioned that it should be
> relatively easy to "cook" some data one may have available in
> non-RDF format, so that they may be integrated in the demo. My idea is
> to experiment the export of gene expression data available in
> public repositories (mainly ArrayExpress). At the moment I am
> trying to review ISMB materials and I wonder if there are pointers,
> on the wiki or somewhere else, about this point. Something like a
> brief tutorial, that could guide me from choosing proper
> ontologies which are already used by the demo, to using the
> technology the demo is using too, to getting some simple result.
>
> Thanks in advance for any help.
>
> --
>
> ===============================================================================
> Marco Brandizi <brandizi@ebi.ac.uk>
>
> NET Project - Software Engineer
> http://www.ebi.ac.uk/net-project
>
> European Bioinformatics Institute
> Hinxton, CB10 1SD, United Kingdom
> Tel.: +44 (0)1223 49 2613
> Fax: +44 (0)1223 49 4468
>
> http://www.ebi.ac.uk/~brandizi
Received on Tuesday, 14 August 2007 02:16:08 UTC