Re: RDF from Gene Atlas

Hi Scott,

Thank you for forwarding this thread. We have developed a repository for
stem cell microarray data and are currently in the process of exporting
RDF data. We are planning to export only the metadata of the experiment,
not the expression data or fold change, although that data also exists in
a structured format and could be exported as RDF as well.

It would be a nice exercise to check the interoperability of the two
data repositories. We talked to Helen P and James M about using EFO to
describe the biomaterial characteristics and submitting term requests
for missing ones.


Sudeshna



On Aug 11, 2010, at 3:32 PM, M. Scott Marshall wrote:

> FYI
>
> ---------- Forwarded message ----------
> From: M. Scott Marshall <mscottmarshall@gmail.com>
> Date: Mon, Aug 9, 2010 at 10:26 AM
> Subject: Fwd: RDF from Atlas
> To: Christoph Grabmueller <grabmuel@ebi.ac.uk>, Kei Cheung
> <kei.cheung@yale.edu>, Satya Sahoo <satyasahoo@gmail.com>, Matthias
> Samwald <samwald@gmx.at>, crockey@io-informatics.com
> Cc: Jun Zhao <jun.zhao@zoo.ox.ac.uk>, Helena Deus
> <helenadeus@gmail.com>, Rebholz <rebholz@ebi.ac.uk>
>
>
> Note to BioRDF members: Christoph Grabmueller provided us with example
> microarray RDF from Rebholz's text mining group at EBI using EFO (see
> below). Notice that Christoph would like feedback and guidance. It
> could be informative to compare our approaches.
>
> Christoph,
>
> [Would you mind if I CC the HCLS mailing list "HCLS"
> <public-semweb-lifesci@w3.org> ? There are many others in HCLS that
> would like to know about this work and could contribute
> advice/opinions.]
>
> Thanks for your example RDF. It looks like a good start. The
> teleconference that Jun and Lena and I had with James was very useful,
> but several in the BioRDF task force couldn't attend (I organized it
> during my vacation but several others including Kei were travelling).
> I'm looking forward to helping each other find a satisfying approach
> to microarray data in RDF and hopefully arriving at a consensus that
> results in similar RDF being served directly from ArrayExpress and
> GEO.  It would then be possible to perform some basic bioinformatics
> work in SPARQL without having to create special ontologies and
> namespaces.
>
> Maybe this link will help you to understand what we are doing:
> http://esw.w3.org/HCLSIG_BioRDF_Subgroup/QueryFederation2
> Any questions you may have will help us to improve our wiki page. Some
> of our latest work is being 'staged' in DropBox at the moment but
> should be available from the wiki soon.
>
> Cheers,
> Scott
>
> --
> M. Scott Marshall, W3C HCLS IG co-chair
> Leiden University Medical Center / University of Amsterdam
> http://staff.science.uva.nl/~marshall
>
> ---------- Forwarded message ----------
> From: Christoph Grabmueller <grabmuel@ebi.ac.uk>
> Date: Mon, Jul 19, 2010 at 10:34 AM
> Subject: Re: RDF from Atlas
> To: James Malone <malone@ebi.ac.uk>
> Cc: Dietrich Rebholz-Schuhmann <rebholz@ebi.ac.uk>, "M. Scott
> Marshall" <mscottmarshall@gmail.com>, Jun Zhao
> <jun.zhao@zoo.ox.ac.uk>, Helena Deus <helenadeus@gmail.com>, Helen
> Parkinson <parkinson@ebi.ac.uk>, Misha Kapushesky <ostolop@ebi.ac.uk>
>
>
> One subtask of SESL is the integration of Gene Expression Atlas data
> into the RDF-based, so-called information brokering system.
> I created a very simple representation that covers the needs of the
> project (differential gene expression under disease conditions); it
> by no means covers all existing information.
>
> Here is one example in Notation3 (or rather Turtle). Please don't slam
> me too hard for the relations and namespaces, I'm pretty sure they
> are all wrong :) Any input as to how to do it properly is welcome.
>
> @prefix ae: <http://www.ebi.ac.uk/gxa/> .
> @prefix aeExp: <http://www.ebi.ac.uk/gxa/experiment/> .
> @prefix efo: <http://www.ebi.ac.uk/efo/> .
> @prefix skos: <http://www.w3.org/2008/05/skos#> .
> @prefix uniprot: <http://purl.uniprot.org/uniprot/> .
>
> aeExp:E-GEOD-1869 ae:hasGeneExpression [ae:condition efo:EFO_0000319;
> skos:exactMatch uniprot:P30542; ae:expression "DOWN"; ae:pval
> 0.00503674] .
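
A minimal sketch of how a statement of this shape could be generated from one record. The helper name `to_turtle` and its argument order are hypothetical; the prefixes and property names are simply the ones used in the example above, and the prefix declarations are assumed to be written elsewhere in the output.

```python
def to_turtle(experiment, efo_id, uniprot_acc, direction, pval):
    """Format one differential-expression record in the Turtle shape above.

    Hypothetical helper: assumes the ae:, aeExp:, efo:, skos: and uniprot:
    prefixes are declared at the top of the output file. The p-value is
    passed as a string so its digits are written out unchanged.
    """
    return (
        f"aeExp:{experiment} ae:hasGeneExpression [ "
        f"ae:condition efo:{efo_id}; "
        f"skos:exactMatch uniprot:{uniprot_acc}; "
        f'ae:expression "{direction}"; '
        f"ae:pval {pval} ] ."
    )


# The record from the example above.
line = to_turtle("E-GEOD-1869", "EFO_0000319", "P30542", "DOWN", "0.00503674")
print(line)
```

Using a blank node per expression value keeps the condition, gene, direction and p-value grouped, which is what the later query relies on when it joins them through `?expression`.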
>
> Doing this for all 343 disease factors (under EFO_0000408) produces
> around 180k triples. So far, so good. Now I only have to link to UMLS,
> which is the basis for disease named entity recognition in our group.
> Should be simple since the disease bits of EFO are based on the
> Disease Ontology, and DO references UMLS.
>
> But the efo.owl looks like this:
>   <efo:definition_citation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">DOID:5485</efo:definition_citation>
> and to be able to enjoy any semanticness, I have to convert those
> citations to this (probably not necessary for future versions of EFO)
>   <efo:definition_citation rdf:resource="http://purl.org/obo/owl/DOID#DOID_5485"/>
>  The do.owl can be used directly to get to the UMLS CUIs, and after
> creating my own RDF version of UMLS which also includes references in
> the format DO decided to use for UMLS
> (http://purl.org/obo/owl/UMLS_CUI#UMLS_CUI_C0010674), I can query for
> differentially expressed genes via UMLS CUIs and disease strings.
> Federated query example using Jena's ARQ engine further below.
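
The citation rewrite described above (string literal `DOID:5485` to the resource `http://purl.org/obo/owl/DOID#DOID_5485`) could be sketched as follows. This is only an illustration under assumptions: the function name `citation_to_uri` is hypothetical, and it handles only citations of the exact form `DOID:<digits>`, leaving every other citation untouched.

```python
import re

# Base URI taken from the rdf:resource form shown above.
DOID_BASE = "http://purl.org/obo/owl/DOID#"

# Assumption: DO citations look exactly like "DOID:" followed by digits.
_DOID_PATTERN = re.compile(r"^DOID:(\d+)$")


def citation_to_uri(citation):
    """Return the DO resource URI for a 'DOID:<n>' literal, else None."""
    match = _DOID_PATTERN.match(citation.strip())
    if match is None:
        return None  # not a DO citation; keep it as a plain literal
    return f"{DOID_BASE}DOID_{match.group(1)}"


print(citation_to_uri("DOID:5485"))
```

Returning `None` for anything that is not a DO identifier lets the caller leave other `definition_citation` values as plain literals instead of guessing a URI for them.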
>
>
> In this whole process only the DO could be used as it was, but only
> after adapting my UMLS representation to fit its needs; and I had to
> create two RDF representations from scratch. The life sciences
> semantic web certainly has room for improvement...
>
> Christoph
>
>
> PREFIX dc:<http://purl.org/dc/elements/1.1/>
> PREFIX owl:<http://www.w3.org/2002/07/owl#>
> PREFIX oboInOwl:<http://www.geneontology.org/formats/oboInOwl#>
> PREFIX efo:<http://www.ebi.ac.uk/efo/>
> PREFIX umls:<http://umlsks.nlm.nih.gov/>
> PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
> PREFIX pdo:<http://purl.org/obo/owl/DOID#>
> PREFIX ae:<http://www.ebi.ac.uk/gxa/>
> PREFIX skos:<http://www.w3.org/2008/05/skos#>
>
> select distinct ?experiment ?uniprot ?updown ?pval where {
>  service <http://jweb-2b:21380/Rebholz-srv/openrdf-sesame/repositories/umls> {
>   ?umls dc:name "Cystic Fibrosis" . #umls:C0010674
>   ?umls owl:sameAs ?umlsuri .
>  }
>  service <http://jweb-2b:21380/Rebholz-srv/openrdf-sesame/repositories/do> {
>   ?do oboInOwl:hasDbXref ?refblank .
>   ?refblank oboInOwl:hasURI ?umlsuri .
>  }
>  service <http://jweb-2b:21380/Rebholz-srv/openrdf-sesame/repositories/efo> {
>   ?efo efo:definition_citation ?do .
>  }
>  service <http://jweb-2b:21380/Rebholz-srv/openrdf-sesame/repositories/arrayexpress> {
>   ?expression ae:condition ?efo .
>   ?expression skos:exactMatch ?uniprot .
>   ?expression ae:expression ?updown .
>   ?expression ae:pval ?pval .
>   ?experiment ae:hasGeneExpression ?expression .
>  }
> }
>
> James Malone wrote:
>>
>> Hi Dietrich, Christoph,
>>
>> On an HCLS [1] call today we discussed an RDF representation of  
>> some of the Gene Expression Atlas. Scott, Jun and Lena were very  
>> interested to hear you had been working on producing some of this  
>> already in one of your other projects and since this represents the  
>> most RDF we have about the Atlas right now I thought I would put  
>> you guys in touch with one another. Their use cases can be found  
>> here if you are interested [2]. They are particularly interested in  
>> obtaining any rdf that you may have extracted using EFO.
>>
>> Many thanks,
>>
>> James
>>
>> [1] http://www.w3.org/blog/hcls
>> [2] http://esw.w3.org/HCLSIG_BioRDF_Subgroup/QueryFederation2
>>
>

Received on Wednesday, 11 August 2010 21:23:44 UTC