RE: magetab2magerdf from Michael Miller on 2010-01-06 (public-semweb-lifesci@w3.org from January 2010)

From: Michael Miller <mmiller@teranode.com>
Date: Wed, 6 Jan 2010 11:55:31 -0500
To: "Helena Deus" <helenadeus@gmail.com>, "Jim McCusker" <james.mccusker@yale.edu>
Cc: "w3c semweb HCLS" <public-semweb-lifesci@w3.org>
Message-ID: <6401DB16544A5B4AA279B921F43547EC066FCB2B@MI8NYCMAIL16.Mi8.com>
hi helena and jim,

 

this is what i see in E-MEXP-986.rdf which seems a nice way to capture
this information:

...

    <j.0:Scan rdf:about=".#scanname/ebi.ac.uk:MIAMExpress:Hybridization:24902">
      <j.0:has_derivative>
        <j.0:ArrayDataMatrix rdf:about=".#arraydatamatrixfile/E-MEXP-986-raw-data-1321832734.txt">
          <j.0:has_comment>
            <j.0:Comment rdf:about=".#arraydatamatrixfile/E-MEXP-986-raw-data-1321832734.txt/comments/1">
              <j.1:has_value rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/experiment/MEXP/E-MEXP-986/E-MEXP-986.raw.zip</j.1:has_value>
              <j.1:has_name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ArrayExpress FTP file</j.1:has_name>

...

 

it has the name of the file then a reference to the name of the
ArrayExpress FTP file.

 

but the problem seems to be that ArrayDataMatrix is referencing the
'Derived Array Data Matrix File' column (which is different from an
'Array Data Matrix File' column) when it should be referencing the
'Array Data File' column.  there is a nested DerivedArrayDataMatrix
element which does look correct.  from the above, it looks like
E-MEXP-986-raw-data-1321832734 should be an ArrayDataFile element with a
value of '2d1S15.txt.txt'.  i'm not sure how the name of the file was
derived: there is no mention in the SDRF of a file of that name, but it
is similar to the name of the file in the 'Derived Array Data Matrix
File' column.  there is no mention of '2d1S15.txt.txt', the correct
name, anywhere in the file.

 

plus there also seems to be unnecessary duplication, i.e. there's a
nested repeat of the ArrayDataMatrix and DerivedArrayDataMatrix
elements.  but this might be an artifact of RDF/XML?

 

does RDF/XML allow referencing an element that is fully defined
elsewhere?  that would make things a lot clearer and more concise.
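
for what it's worth, RDF/XML does provide an rdf:resource attribute that
lets a property point at a node described elsewhere in the document, so a
nested description can be replaced by a reference.  a minimal sketch in
Python (standard library only) of what that could look like; the
namespace URI http://example.org/mage-om# and the helper name
referenced_nodes are placeholders for illustration, not what
E-MEXP-986.rdf actually uses:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MAGE_NS = "http://example.org/mage-om#"  # hypothetical namespace, not the real one

# The Scan refers to the ArrayDataMatrix by URI instead of nesting it.
RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:m="{MAGE_NS}">
  <m:Scan rdf:about=".#scanname/ebi.ac.uk:MIAMExpress:Hybridization:24902">
    <m:has_derivative rdf:resource=".#arraydatamatrixfile/E-MEXP-986-raw-data-1321832734.txt"/>
  </m:Scan>
  <m:ArrayDataMatrix rdf:about=".#arraydatamatrixfile/E-MEXP-986-raw-data-1321832734.txt"/>
</rdf:RDF>"""


def referenced_nodes(rdf_xml):
    """Return (uri, is_described_at_top_level) for every rdf:resource reference."""
    root = ET.fromstring(rdf_xml)
    about = f"{{{RDF_NS}}}about"
    resource = f"{{{RDF_NS}}}resource"
    described = {el.get(about) for el in root}
    refs = [el.get(resource) for el in root.iter() if el.get(resource)]
    return [(ref, ref in described) for ref in refs]
```

calling referenced_nodes(RDF_XML) reports each reference together with
whether the target is described once at the top level, which is the
"defined elsewhere" layout asked about above.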

 

but this is a great start.

 

cheers,

michael

 

 

 

From: public-semweb-lifesci-request@w3.org
[mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Helena Deus
Sent: Wednesday, January 06, 2010 8:09 AM
To: Jim McCusker
Cc: w3c semweb HCLS
Subject: Re: magetab2magerdf

 

Hi Jim,

 

This is great! I noticed you already add the links to both the raw data
files and the processed data files. Am I right in assuming this data
comes from the SDRF?

I see you integrated the MGED ontology with the data nicely. Have you
attempted a few SPARQL queries, for example, to retrieve all raw data
files from "mged:arabidopsis_thaliana"?

 

Also, I noticed that in your ontology you don't separate each sample
hybridization raw file, probably because they are all distributed on the
FTP site as a compressed folder. For example, I see that inside the raw
data file archive "E-MEXP-986.raw.1.zip" there are 4 text files:
1d1S15.txt.txt, 2d1S15.txt.txt, 2d1S22.txt.txt and 4d1S22.txt.txt. Since
it's possible to add a link from a Sample to each of these .txt files,
do you think it would be useful to add this information in the raw RDF
file?
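
One possible way to get at the individual files is to list the members
of the raw archive and emit one link per file. A rough sketch in Python,
assuming nothing about how magetab2magerdf is actually written: the
archive is built in memory from the four file names above so the example
is self-contained, and list_raw_files and build_demo_archive are made-up
helper names.

```python
import io
import zipfile


def list_raw_files(archive_bytes):
    """Return the sorted names of the .txt members of a zip archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(name for name in archive.namelist() if name.endswith(".txt"))


def build_demo_archive(names):
    """Build an in-memory zip with empty members, standing in for the real download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, "")
    return buffer.getvalue()


demo = build_demo_archive(
    ["1d1S15.txt.txt", "2d1S15.txt.txt", "2d1S22.txt.txt", "4d1S22.txt.txt"]
)
raw_files = list_raw_files(demo)
```

Each name in raw_files could then become its own data-file node linked
from the matching hybridization, provided the SDRF gives the file name
for that row.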

 

Thanks!
Lena

On Tue, Dec 8, 2009 at 8:05 AM, Jim McCusker <james.mccusker@yale.edu>
wrote:

I'm distinguishing between magetab2rdf (raw conversion of magetab into
an RDF structure) and magetab2magerdf (conversion of magetab into an
RDF-based MAGE-OM structure) here. My purposes and goals require a
magetab2magerdf approach, so that's what I've been working on.

I have checked in code for magetab2magerdf at the googlecode project
http://magetab2rdf.googlecode.com. The code can be checked out from:

http://magetab2rdf.googlecode.com/svn/trunk/magetab2magerdf/

and example RDF is in:

http://magetab2rdf.googlecode.com/svn/trunk/magetab2magerdf/examples/E-MEXP-986/

I currently load the IDF-related entities into the RDF. I'm beginning
work on SDRF next.

http://magetab2rdf.googlecode.com/svn/trunk/ontologies/mage-om.owl
contains the additional properties and classes needed to support an
RDF-based MAGE-OM on top of the MGED Ontology.

A few notes on E-MEXP-986:

The URI for the MGED Ontology is
http://mged.sourceforge.net/ontologies/MGEDontology.owl, but has been
set to http://mged.sourceforge.net/ontologies/MGEDontology.php in the
IDF. The actual Term Source name is "The MGED Ontology".
A common practice seems to be to refer to "MGED Ontology" without
reference to its URI.

Since I have to import the MGED ontology already for its classes and
properties, I have already imported it under the correct URI. I have
added a kludge where if the term source name contains the string "MGED
Ontology", the code assumes you mean the MGED Ontology, and sets the
URI appropriately. However, this is a one-off solution.
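
In outline, the kludge amounts to something like the following. This is
a sketch in Python for brevity, not the code checked into the
repository; the function name resolve_term_source_uri is invented for
illustration.

```python
MGED_URI = "http://mged.sourceforge.net/ontologies/MGEDontology.owl"


def resolve_term_source_uri(name, declared_uri):
    """Return the canonical MGED URI when the term source name mentions
    "MGED Ontology"; otherwise keep the URI declared in the IDF."""
    if "MGED Ontology" in name:
        return MGED_URI
    return declared_uri
```

With this, "The MGED Ontology" declared with the .php address maps to
the .owl URI, while a term source such as "ArrayExpress" keeps whatever
URI the IDF gives it.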

I went back and forth about importing the Term Source ontologies.
However, this particular experiment has used the "ArrayExpress" term
source using the URI "http://www.ebi.ac.uk/arrayexpress/" which
doesn't correspond to an available ontology, but is technically a term
source.

I'm considering attempting to import each term source ontology when
it's available and validating terms against it; if the URI fails to
resolve to a document, validation will be skipped for that term source.
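
The import-if-available idea could be sketched roughly as below, again
in Python and purely as an illustration. The fetch argument is a
hypothetical callable that downloads a URI and raises OSError on
failure, and the substring check is only a stand-in for real ontology
validation.

```python
def load_term_source(uri, fetch):
    """Return the fetched document, or None if the URI does not resolve."""
    try:
        return fetch(uri)
    except OSError:
        return None


def validate_terms(terms, uri, fetch):
    """Check each term against the term source document.

    Returns None when the source could not be loaded (validation skipped),
    otherwise a dict mapping each term to True or False.
    """
    document = load_term_source(uri, fetch)
    if document is None:
        return None
    # Placeholder check: a real validator would parse the ontology.
    return {term: term in document for term in terms}
```

A term source like "ArrayExpress", whose URI does not lead to an
ontology document, would then fall into the skipped case rather than
causing the whole conversion to fail.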

A note on Limpopo:

The IDF Comment didn't seem to import on this experiment. I'm not sure
if it's a format problem or something else.

Thoughts and feedback are greatly appreciated.

Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker@yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu

PhD Student
Tetherless World Constellation
Rensselaer Polytechnic Institute
mccusj@cs.rpi.edu
http://tw.rpi.edu
Received on Wednesday, 6 January 2010 16:56:04 UTC