Fwd: RDF from Gene Atlas

FYI

---------- Forwarded message ----------
From: M. Scott Marshall <mscottmarshall@gmail.com>
Date: Mon, Aug 9, 2010 at 10:26 AM
Subject: Fwd: RDF from Atlas
To: Christoph Grabmueller <grabmuel@ebi.ac.uk>, Kei Cheung
<kei.cheung@yale.edu>, Satya Sahoo <satyasahoo@gmail.com>, Matthias
Samwald <samwald@gmx.at>, crockey@io-informatics.com
Cc: Jun Zhao <jun.zhao@zoo.ox.ac.uk>, Helena Deus
<helenadeus@gmail.com>, Rebholz <rebholz@ebi.ac.uk>


Note to BioRDF members: Christoph Grabmueller provided us with example
microarray RDF from Rebholz's text mining group at EBI using EFO (see
below). Notice that Christoph would like feedback and guidance. It
could be informative to compare our approaches.

Christoph,

[Would you mind if I CC the HCLS mailing list "HCLS"
<public-semweb-lifesci@w3.org> ? There are many others in HCLS that
would like to know about this work and could contribute
advice/opinions.]

Thanks for your example RDF. It looks like a good start. The
teleconference that Jun and Lena and I had with James was very useful,
but several in the BioRDF task force couldn't attend (I organized it
during my vacation but several others including Kei were travelling).
I look forward to working together toward a satisfying approach to
microarray data in RDF, and hopefully to a consensus that results in
similar RDF being served directly from ArrayExpress and GEO. It would
then be possible to do some basic bioinformatics work in SPARQL
without having to create special ontologies and namespaces.

Maybe this link will help you to understand what we are doing:
http://esw.w3.org/HCLSIG_BioRDF_Subgroup/QueryFederation2
Any questions you may have will help us to improve our wiki page. Some
of our latest work is being 'staged' in DropBox at the moment but
should be available from the wiki soon.

Cheers,
Scott

--
M. Scott Marshall, W3C HCLS IG co-chair
Leiden University Medical Center / University of Amsterdam
http://staff.science.uva.nl/~marshall

---------- Forwarded message ----------
From: Christoph Grabmueller <grabmuel@ebi.ac.uk>
Date: Mon, Jul 19, 2010 at 10:34 AM
Subject: Re: RDF from Atlas
To: James Malone <malone@ebi.ac.uk>
Cc: Dietrich Rebholz-Schuhmann <rebholz@ebi.ac.uk>, "M. Scott
Marshall" <mscottmarshall@gmail.com>, Jun Zhao
<jun.zhao@zoo.ox.ac.uk>, Helena Deus <helenadeus@gmail.com>, Helen
Parkinson <parkinson@ebi.ac.uk>, Misha Kapushesky <ostolop@ebi.ac.uk>


One subtask of SESL is to integrate Gene Expression Atlas data into
the project's RDF-based "information brokering system".
I created a very simple representation that covers only what the
project needs, namely differential gene expression under disease
conditions; it is far from covering all the information in the Atlas.

Here is one example in Notation3 (or rather Turtle). Please don't slam
me too hard for the relations and namespaces, I'm pretty sure they
are all wrong :) Any input on how to do it properly is welcome.

@prefix ae: <http://www.ebi.ac.uk/gxa/> .
@prefix aeExp: <http://www.ebi.ac.uk/gxa/experiment/> .
@prefix efo: <http://www.ebi.ac.uk/efo/> .
@prefix skos: <http://www.w3.org/2008/05/skos#> .
@prefix uniprot: <http://purl.uniprot.org/uniprot/> .

aeExp:E-GEOD-1869 ae:hasGeneExpression [
    ae:condition efo:EFO_0000319 ;
    skos:exactMatch uniprot:P30542 ;
    ae:expression "DOWN" ;
    ae:pval 0.00503674
] .
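
For anyone who wants to generate records of this shape, here is a minimal Python sketch that serialises one differential-expression record into the Turtle layout shown above. The GeneExpression class and the to_turtle function are hypothetical names made up for illustration, not part of the Atlas or SESL code; the prefixes and property names are copied from the example as they stand, without any claim that they are the right ones.

```python
from dataclasses import dataclass

# Prefix declarations copied from the example above.
PREFIXES = """\
@prefix ae: <http://www.ebi.ac.uk/gxa/> .
@prefix aeExp: <http://www.ebi.ac.uk/gxa/experiment/> .
@prefix efo: <http://www.ebi.ac.uk/efo/> .
@prefix skos: <http://www.w3.org/2008/05/skos#> .
@prefix uniprot: <http://purl.uniprot.org/uniprot/> .
"""


@dataclass
class GeneExpression:
    """Hypothetical record type for one differential-expression finding."""
    experiment: str   # experiment accession, e.g. "E-GEOD-1869"
    efo_id: str       # EFO term for the condition, e.g. "EFO_0000319"
    uniprot_acc: str  # UniProt accession, e.g. "P30542"
    direction: str    # expression direction as stored in the data, e.g. "DOWN"
    pval: str         # kept as text so the decimal is written exactly as given


def to_turtle(rec: GeneExpression) -> str:
    """Serialise one record as a blank node attached to its experiment."""
    return (
        f"aeExp:{rec.experiment} ae:hasGeneExpression [\n"
        f"    ae:condition efo:{rec.efo_id} ;\n"
        f"    skos:exactMatch uniprot:{rec.uniprot_acc} ;\n"
        f'    ae:expression "{rec.direction}" ;\n'
        f"    ae:pval {rec.pval}\n"
        f"] .\n"
    )


if __name__ == "__main__":
    rec = GeneExpression("E-GEOD-1869", "EFO_0000319", "P30542",
                         "DOWN", "0.00503674")
    print(PREFIXES)
    print(to_turtle(rec))
```

The p-value is passed through as a string on purpose: formatting a float could switch to exponent notation, which Turtle would read as an xsd:double rather than the xsd:decimal in the example.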

Doing this for all 343 disease factors (under EFO_0000408) produces
around 180k triples. So far, so good. Now I only have to link to UMLS,
which is the basis for disease named entity recognition in our group.
Should be simple since the disease bits of EFO are based on the
Disease Ontology, and DO references UMLS.

But efo.owl stores the DO reference as a plain string:
  <efo:definition_citation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">DOID:5485</efo:definition_citation>
To get any semantic benefit, I have to convert those citations into
resource references (probably not necessary for future versions of EFO):
  <efo:definition_citation rdf:resource="http://purl.org/obo/owl/DOID#DOID_5485"/>
The do.owl can be used directly to get to the UMLS CUIs. After
creating my own RDF version of UMLS, which also includes references in
the format DO decided to use for UMLS
(http://purl.org/obo/owl/UMLS_CUI#UMLS_CUI_C0010674), I can query for
differentially expressed genes via UMLS CUIs and disease strings.
A federated query example using Jena's ARQ engine is further below.
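
The definition_citation conversion described above could be scripted roughly as follows. This is only a sketch under assumptions: it works on the raw text of efo.owl with a regular expression, it assumes the string citations always look like the DOID:5485 example, and the function name link_doid_citations is made up for illustration. A real conversion would probably be safer with an XML or RDF parser.

```python
import re

# Namespace DO uses for its terms, taken from the converted example above.
DO_BASE = "http://purl.org/obo/owl/DOID#"

# Matches a string-typed definition_citation whose content is "DOID:<digits>".
# Whitespace (including a line break) between tag name and attribute is allowed.
_CITATION = re.compile(
    r'<efo:definition_citation\s+'
    r'rdf:datatype="http://www\.w3\.org/2001/XMLSchema#string"\s*>'
    r'\s*DOID:(\d+)\s*'
    r'</efo:definition_citation>'
)


def link_doid_citations(owl_text: str) -> str:
    """Rewrite DOID string citations as rdf:resource references to DO terms.

    Citations that do not start with "DOID:" are left untouched.
    """
    def _replace(match: re.Match) -> str:
        doid_number = match.group(1)
        return (f'<efo:definition_citation '
                f'rdf:resource="{DO_BASE}DOID_{doid_number}"/>')

    return _CITATION.sub(_replace, owl_text)


if __name__ == "__main__":
    before = ('<efo:definition_citation '
              'rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'
              'DOID:5485</efo:definition_citation>')
    print(link_doid_citations(before))
```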


In this whole process only the DO could be used as it was, and even
then only after adapting my UMLS representation to fit its needs; and
I had to create two RDF representations from scratch. The life
sciences semantic web certainly has room for improvement...

Christoph


PREFIX dc:<http://purl.org/dc/elements/1.1/>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl:<http://www.geneontology.org/formats/oboInOwl#>
PREFIX efo:<http://www.ebi.ac.uk/efo/>
PREFIX umls:<http://umlsks.nlm.nih.gov/>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX pdo:<http://purl.org/obo/owl/DOID#>
PREFIX ae:<http://www.ebi.ac.uk/gxa/>
PREFIX skos:<http://www.w3.org/2008/05/skos#>

select distinct ?experiment ?uniprot ?updown ?pval where {
 service <http://jweb-2b:21380/Rebholz-srv/openrdf-sesame/repositories/umls> {
  ?umls dc:name "Cystic Fibrosis" . #umls:C0010674
  ?umls owl:sameAs ?umlsuri .
 }
 service <http://jweb-2b:21380/Rebholz-srv/openrdf-sesame/repositories/do> {
  ?do oboInOwl:hasDbXref ?refblank .
  ?refblank oboInOwl:hasURI ?umlsuri .
 }
 service <http://jweb-2b:21380/Rebholz-srv/openrdf-sesame/repositories/efo> {
  ?efo efo:definition_citation ?do .
 }
 service <http://jweb-2b:21380/Rebholz-srv/openrdf-sesame/repositories/arrayexpress> {
  ?expression ae:condition ?efo .
  ?expression skos:exactMatch ?uniprot .
  ?expression ae:expression ?updown .
  ?expression ae:pval ?pval .
  ?experiment ae:hasGeneExpression ?expression .
 }
}
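
To make the chain of joins in the query above easier to follow, here is a small in-memory Python sketch of the same lookup: disease name to UMLS URI, UMLS URI to DO term, DO term to EFO term, EFO term to expression records. Everything in it is a toy stand-in: the four dictionaries only mimic the four repositories, and every identifier containing "PLACEHOLDER" is invented for illustration. Only the pairing of "Cystic Fibrosis" with C0010674 comes from the query comment.

```python
# Toy stand-ins for the four repositories. All PLACEHOLDER values are
# invented for illustration and do not correspond to real terms or data.
UMLS_NAMES = {"Cystic Fibrosis": "UMLS_CUI_C0010674"}       # dc:name -> UMLS URI
DO_XREFS = {"UMLS_CUI_C0010674": ["DOID_PLACEHOLDER"]}      # UMLS URI -> DO terms
EFO_CITATIONS = {"DOID_PLACEHOLDER": ["EFO_PLACEHOLDER"]}   # DO term -> EFO terms
EXPRESSIONS = [                                             # expression records
    {"experiment": "E-PLACEHOLDER-1", "condition": "EFO_PLACEHOLDER",
     "uniprot": "P_PLACEHOLDER", "updown": "DOWN", "pval": "0.01"},
    {"experiment": "E-PLACEHOLDER-2", "condition": "EFO_OTHER_PLACEHOLDER",
     "uniprot": "Q_PLACEHOLDER", "updown": "UP", "pval": "0.02"},
]


def find_expressions(disease_name):
    """Follow name -> UMLS -> DO -> EFO -> expression, like the four
    service blocks of the query, and return distinct result rows."""
    umls_uri = UMLS_NAMES.get(disease_name)
    if umls_uri is None:
        return []
    rows = set()  # a set mirrors "select distinct"
    for do_term in DO_XREFS.get(umls_uri, []):
        for efo_term in EFO_CITATIONS.get(do_term, []):
            for expr in EXPRESSIONS:
                if expr["condition"] == efo_term:
                    rows.add((expr["experiment"], expr["uniprot"],
                              expr["updown"], expr["pval"]))
    return sorted(rows)


if __name__ == "__main__":
    for row in find_expressions("Cystic Fibrosis"):
        print(row)
```

In the real setup each of these lookups is a triple pattern evaluated at a separate endpoint, and the query engine performs the joins on the shared variables (?umlsuri, ?do, ?efo).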

James Malone wrote:
>
> Hi Dietrich, Christoph,
>
> On an HCLS [1] call today we discussed an RDF representation of some of the Gene Expression Atlas. Scott, Jun and Lena were very interested to hear that you had already been working on producing some of this in one of your other projects. Since this represents the most RDF we have about the Atlas right now, I thought I would put you in touch with one another. Their use cases can be found here if you are interested [2]. They are particularly interested in obtaining any RDF that you may have extracted using EFO.
>
> Many thanks,
>
> James
>
> [1] http://www.w3.org/blog/hcls
> [2] http://esw.w3.org/HCLSIG_BioRDF_Subgroup/QueryFederation2
>

Received on Wednesday, 11 August 2010 19:33:19 UTC