
Re: magetab2magerdf

From: Jim McCusker <james.mccusker@yale.edu>
Date: Wed, 6 Jan 2010 11:54:57 -0500
Message-ID: <68084f3e1001060854h4d1a1f3dj88b66b05614c1187@mail.gmail.com>
To: Helena Deus <helenadeus@gmail.com>
Cc: w3c semweb HCLS <public-semweb-lifesci@w3.org>
On Wed, Jan 6, 2010 at 11:08 AM, Helena Deus <helenadeus@gmail.com> wrote:

> Hi Jim,
> This is great! I noticed you already add the links both to the raw data
> files and the processed data files, am I right in assuming this data comes
> from the SDRF?

Yes, these are comments embedded in SDRF, and the nodes for those files are
explicitly mentioned in SDRF too.

> I see you integrated the MGED ontology with the data nicely. Have you
> attempted a few SPARQL queries, for example, retrieving all raw data files
> from "mged:arabidopsis_thaliana"?

I haven't yet tried any SPARQL queries like that, but that was the goal of
handling the Terms and Term Sources the way I did.
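
A query along those lines might look like the sketch below. Only the term
mged:arabidopsis_thaliana comes from the question above. The ex: namespace and
the ex:organism and ex:rawDataFile properties are hypothetical placeholders,
not the vocabulary the converter actually emits, and the mged: namespace URI
is assumed. The code only builds the query string; it does not run it.

```python
# Sketch only: builds a SPARQL query string and does not execute it.
# ex:organism and ex:rawDataFile are hypothetical property names.
MGED_NS = "http://mged.sourceforge.net/ontologies/MGEDOntology.owl#"  # assumed
EX_NS = "http://example.org/magerdf#"  # hypothetical namespace


def raw_files_query(organism_term: str) -> str:
    """Return a SPARQL SELECT for raw data files of samples tagged with a term."""
    return f"""PREFIX mged: <{MGED_NS}>
PREFIX ex: <{EX_NS}>
SELECT DISTINCT ?file WHERE {{
  ?sample ex:organism {organism_term} .
  ?sample ex:rawDataFile ?file .
}}"""


if __name__ == "__main__":
    print(raw_files_query("mged:arabidopsis_thaliana"))
```

The real property names would have to be read off the generated RDF before a
query like this could return anything.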

> Also, I noticed that in your ontology you don't separate each sample
> hybridization raw file, probably because they are all distributed on the FTP
> site as a compressed folder. For example, I see that inside raw data file
> archive "E-MEXP-986.raw.1.zip" there are 4 text files:
> 1d1S15.txt.txt, 2d1S15.txt.txt, 2d1S22.txt.txt and 4d1S22.txt.txt. Since
> it's possible to add a link from a Sample to each of these .txt files, do
> you think it would be useful to add this information in the raw RDF file?

Other SDRF files may link directly to a file (the ones that I've written
do), so in my mind it's a matter of GIGO. I don't currently go beyond what
is in the IDF and SDRF (in other words, what's being parsed by Limpopo), and
I'm trying to keep second-guessing to a minimum. One thing I hope this tool
exposes is the effects of certain kinds of curation on the available data
structures, and maybe some best practices can come out of it.
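
For illustration only, going one step beyond the SDRF and listing the members
of an archive could be sketched as follows. This is not part of
magetab2magerdf. The ex:hasMember predicate, the example.org URIs, and the
fragment-style member identifiers are all hypothetical, and the demo builds an
in-memory archive with the four file names quoted above rather than
downloading anything.

```python
import io
import zipfile

EX = "http://example.org/magerdf#"  # hypothetical namespace


def member_links(archive: zipfile.ZipFile, archive_uri: str) -> list[str]:
    """Return one N-Triples-style line per .txt member of the archive."""
    lines = []
    for name in archive.namelist():
        if name.endswith(".txt"):
            # ex:hasMember is a hypothetical predicate used for illustration.
            lines.append(f"<{archive_uri}> <{EX}hasMember> <{archive_uri}#{name}> .")
    return lines


def demo_archive() -> zipfile.ZipFile:
    """Build an in-memory zip holding empty files with the quoted names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in ["1d1S15.txt.txt", "2d1S15.txt.txt",
                     "2d1S22.txt.txt", "4d1S22.txt.txt"]:
            zf.writestr(name, "")
    buf.seek(0)
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    for line in member_links(demo_archive(), "http://example.org/E-MEXP-986.raw.1.zip"):
        print(line)
```

Note that this only relates the archive to its members; deciding which Sample
each file belongs to would still need information that the archive listing
alone does not provide.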

Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker@yale.edu | (203) 785-6330

PhD Student
Tetherless World Constellation
Rensselaer Polytechnic Institute
Received on Wednesday, 6 January 2010 16:55:52 UTC
