Re: magetab2magerdf from Jim McCusker on 2010-01-06 (public-semweb-lifesci@w3.org from January 2010)

From: Jim McCusker <james.mccusker@yale.edu>
Date: Wed, 6 Jan 2010 11:54:57 -0500
To: Helena Deus <helenadeus@gmail.com>
Cc: w3c semweb HCLS <public-semweb-lifesci@w3.org>
Message-ID: <68084f3e1001060854h4d1a1f3dj88b66b05614c1187@mail.gmail.com>

On Wed, Jan 6, 2010 at 11:08 AM, Helena Deus <helenadeus@gmail.com> wrote:

> Hi Jim,
>
> This is great! I noticed you already add the links both to the raw data
> files and the processed data files, am I right in assuming this data comes
> from the SDRF?
>

Yes, these are comments embedded in SDRF, and the nodes for those files are
explicitly mentioned in SDRF too.

> I see you integrated the MGED ontology with the data nicely; have you
> attempted a few SPARQL queries, for example, retrieving all raw data files
> from "mged:arabidopsis_thaliana"?
>

I haven't yet tried any SPARQL queries like that, but that was the goal of
handling the Terms and Term Sources the way I did.

> Also, I noticed that in your ontology you don't list each sample
> hybridization's raw file separately, probably because they are all distributed
> on the FTP site as a compressed folder. For example, I see that inside the raw
> data file archive "E-MEXP-986.raw.1.zip" there are 4 text files:
> 1d1S15.txt.txt, 2d1S15.txt.txt, 2d1S22.txt.txt and 4d1S22.txt.txt. Since
> it's possible to add a link from a Sample to each of these .txt files, do
> you think it would be useful to add this information to the raw RDF file?
>
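Helena's suggestion could be approximated by opening the archive and emitting one triple per member file. The following is a minimal sketch, not part of magetab2magerdf: the namespace, the `hasMemberFile` predicate and the "archive URI + / + file name" URI pattern are all hypothetical, chosen only for illustration. It links the archive node to each file it contains; attaching each file to the right Sample would still require the SDRF.

```python
import io
import zipfile

# Hypothetical namespace and predicate, for illustration only; these are not
# terms used by magetab2magerdf or the MGED ontology.
EX = "http://example.org/magetab#"
HAS_MEMBER_FILE = EX + "hasMemberFile"


def archive_member_triples(archive_uri, zip_bytes):
    """Return N-Triples lines linking an archive node to each file it contains."""
    triples = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith("/"):
                continue  # skip directory entries, keep only files
            # Hypothetical URI scheme: archive URI, a slash, then the member name.
            member_uri = archive_uri + "/" + name
            triples.append(f"<{archive_uri}> <{HAS_MEMBER_FILE}> <{member_uri}> .")
    return triples
```

Given the bytes of E-MEXP-986.raw.1.zip and some URI for it, this would yield four triples, one for each of the .txt files listed above.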

Other SDRF files may link directly to a file (the ones that I've written
do), so in my mind it's a matter of GIGO (garbage in, garbage out). I don't
currently go beyond what is in the IDF and SDRF (in other words, what's being
parsed by Limpopo), and I'm trying to keep second-guessing to a minimum. One
thing I hope this tool exposes is the effect that certain kinds of curation
have on the available data structures, and maybe some best practices can come
out of it.

Jim
--
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker@yale.edu | (203) 785-6330
http://krauthammerlab.med.yale.edu

PhD Student
Tetherless World Constellation
Rensselaer Polytechnic Institute
mccusj@cs.rpi.edu
http://tw.rpi.edu

Received on Wednesday, 6 January 2010 16:55:52 UTC