Re: BioRDF Telcon from Kei Cheung on 2009-12-07 (public-semweb-lifesci@w3.org from December 2009)

From: Kei Cheung <kei.cheung@yale.edu>
Date: Mon, 07 Dec 2009 10:12:06 -0500
To: Helena Deus <helenadeus@gmail.com>
Cc: mdmiller <mdmiller53@comcast.net>, HCLS <public-semweb-lifesci@w3.org>, Helen Parkinson <parkinson@ebi.ac.uk>
Message-id: <4B1D1B46.7020202@yale.edu>
Hi Lena,

Thanks for finding the IDF and SDRF files corresponding to the 
experiment (E-GEOD-4757). It looks like the SDRF file contains richer 
metadata that can support richer semantic queries across experiments 
(e.g., finding experiments that involve the same cell types for the 
same/related brain regions for the same species).
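
To make the kind of cross-experiment query I have in mind concrete, here is a minimal sketch in Python. It assumes each experiment's SDRF annotations have already been reduced to one record per experiment. The field names (species, cell_type, brain_region) and the E-HYPO-* accessions are made up for illustration, and it only matches identical values, so "related" brain regions would need an ontology lookup that is not shown.

```python
from collections import defaultdict

# Hypothetical per-experiment annotations of the kind an SDRF file carries.
# Only E-GEOD-4757 is a real accession; the E-HYPO-* records are made up.
EXPERIMENTS = [
    {"accession": "E-GEOD-4757", "species": "Homo sapiens",
     "cell_type": "neuron", "brain_region": "entorhinal cortex"},
    {"accession": "E-HYPO-1", "species": "Homo sapiens",
     "cell_type": "neuron", "brain_region": "entorhinal cortex"},
    {"accession": "E-HYPO-2", "species": "Mus musculus",
     "cell_type": "astrocyte", "brain_region": "hippocampus"},
]


def group_experiments(records, keys=("species", "cell_type", "brain_region")):
    """Map each combination of annotation values to the accessions sharing it."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record["accession"])
    return dict(groups)


def shared_groups(records):
    """Keep only the combinations found in more than one experiment."""
    return {k: v for k, v in group_experiments(records).items() if len(v) > 1}


shared = shared_groups(EXPERIMENTS)
print(shared)
```

With the annotations in RDF, the same grouping could presumably be expressed as a query over the triples instead.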

I noticed in the SDRF file that there are 20 samples (10 normal and 10
AD with neurofibrillary tangles). The abstract of the paper
(http://www.ncbi.nlm.nih.gov/pubmed/16242812?dopt=Abstract) says the
following:

" ... we compared gene expression profiles of NFT-bearing entorhinal 
cortex neurons from 19 AD patients, adjacent non-NFT-bearing entorhinal 
cortex neurons from the same patients, and non-NFT-bearing entorhinal 
cortex neurons from 14 non-demented, histopathologically normal controls 
(ND). "

If I understand it correctly, there should be a total of 33 samples (19 
AD and 14 normal). This may be more of a curation question for the 
ArrayExpress team. Maybe I missed something.
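
For anyone who wants to repeat the count, here is a rough sketch of how the samples could be tallied from a tab-delimited SDRF file. The column name "Characteristics[DiseaseState]" and the three sample rows are assumptions for illustration, not copied from the real E-GEOD-4757 file, so the header would need to be checked against the actual SDRF.

```python
import csv
import io
from collections import Counter

# Hypothetical SDRF excerpt (tab-delimited). The column names and values are
# illustrative assumptions, not copied from the real E-GEOD-4757 file.
SDRF_TEXT = (
    "Source Name\tCharacteristics[DiseaseState]\n"
    "GSM1\tnormal\n"
    "GSM2\tAlzheimer's disease\n"
    "GSM3\tnormal\n"
)


def count_by_column(sdrf_file, column):
    """Count distinct source names per value of the given SDRF column."""
    reader = csv.DictReader(sdrf_file, delimiter="\t")
    seen = set()
    counts = Counter()
    for row in reader:
        source = row["Source Name"]
        if source in seen:  # an SDRF may repeat a source across several rows
            continue
        seen.add(source)
        counts[row[column]] += 1
    return counts


counts = count_by_column(io.StringIO(SDRF_TEXT), "Characteristics[DiseaseState]")
print(counts)
```

Replacing the io.StringIO object with an open file handle on the downloaded SDRF would run the same tally on the real data.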

Cheers,

-Kei

Helena Deus wrote:
> Hi Kei,
>
> Fortunately, ArrayExpress provides both the IDF and SDRF for that
> accession number at
> http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=E-GEOD-4757
>
> I have a small RDF document of that IDF 
> at http://magetab2rdf.googlecode.com/svn/trunk/E-GEOD-4757.idf.rdf
>
> Thanks
> Lena
>
>  
>
> On Tue, Dec 1, 2009 at 9:20 PM, Kei Cheung <kei.cheung@yale.edu> wrote:
>
>     Hi Lena,
>
>
>     Helena Deus wrote:
>
>         @Kei,
>
>            When you said data structure, did you mean the RDF structure?
>
>
>         For now, all I have is the Java object returned by the parser.
>         I've been using Limpopo, which creates an object that I can
>         then convert to RDF using Jena. The challenge, though, has been
>         coming up with the predicates to formalize the relationships
>         between the various elements. I'm using the XML structures for
>         IDF/SDRF etc. at http://magetab-om.sourceforge.net to
>         automatically generate the structure that will contain the
>         data. My plan is to then create the RDF triples that use the
>         attributes described in those documents and populate them with
>         the data from the MAGE-TAB Java object created by Limpopo.
>
>
>
>     Thanks for the pointer and explaining your strategy. We might not
>     need to convert everything from mage-tab for our purposes.
>
>
>
>         Right now all I have is a very raw RDF/XML document describing
>         the relationships in the IDF structure:
>         http://magetab2rdf.googlecode.com/svn/trunk/magetabpredicates.rdf
>         The triples for that had to be encoded manually using Jena by
>         reading the model.
>
>
>     I think IDF is a good start. For a real example for our use case,
>     I wonder if any MAGE-TAB file is available for experiment
>     E-GEOD-4757 (transcription profiling of human neurons with and
>     without neurofibrillary tangles from Alzheimer's patients). Helen
>     may know.
>
>     Cheers,
>
>     -Kei
>
>
>         @Satya and Jun
>
>         I would very much like to be involved in that effort, do you
>         already have a URL that I can look at?
>
>         Thanks
>         Lena
>
>         On Tue, Nov 24, 2009 at 2:19 PM, Kei Cheung
>         <kei.cheung@yale.edu> wrote:
>
>            Hi Lena et al,
>
>            When you said data structure, did you mean the RDF
>            structure? If so, is there a pointer to the structure that we
>            can look at?
>
>            As discussed during yesterday's call, Jun and Satya will help
>            create a wiki page for listing some of the requirements for
>            provenance/workflow in the context of gene lists. Perhaps we
>            should also use it to help coordinate some of the future
>            activities (people also brought up Taverna during the call
>            yesterday). Please coordinate with Satya and Jun.
>
>            Cheers,
>
>            -Kei
>
>            Helena Deus wrote:
>
>                Hi all,
>
>                I apologize for missing the call yesterday! It seems you
>                had a pretty interesting discussion! :-)
>                If I understand Michael's statement, parsing the
>                MAGE-TAB/MAGE-ML into RDF would result in obtaining only
>                the raw and processed data files but not the mechanism
>                used to process them nor the resulting gene list. That's
>                also what I concluded after looking at the data structure
>                created by Tony Burdett's Limpopo parser. However, having
>                the raw data as linked data is already a great start! Kei,
>                should I be looking into Taverna in order to reprocess the
>                raw files with a traceable analysis workflow?
>
>                Thanks!
>                Lena
>
>
>
>                On Tue, Nov 24, 2009 at 9:59 AM, mdmiller
>                <mdmiller53@comcast.net> wrote:
>
>                   hi all,
>
>                   (from the minutes)
>
>                   "Yolanda/Kei/Scott: semantic annotation/description of
>                   workflow would enable the retrieval of data relevant to
>                   that workflow (i.e. data that could be used to populate
>                   that workflow for a different experimental scenario)"
>
>                   what is typically in a MAGE-TAB/MAGE-ML document are the
>                   protocols for how the source was processed into the
>                   extract then how the hybridization, feature extraction,
>                   error and normalization were performed.  these are
>                   interesting and different protocols can cause differences
>                   at this level but it is pretty much a known art and
>                   usually not of too much interest or variability.
>
>                   what is usually missing from those documents, along with
>                   the final gene list, is how that gene list was obtained,
>                   what higher level analysis was used, that is generally
>                   only in the paper unfortunately.
>
>                   cheers,
>                   michael
>                   .
>                   ----- Original Message -----
>                   From: "Kei Cheung" <kei.cheung@yale.edu>
>                   To: "HCLS" <public-semweb-lifesci@w3.org>
>
>                   Sent: Monday, November 23, 2009 1:27 PM
>                   Subject: Re: BioRDF Telcon
>
>
>
>                       Today's BioRDF minutes are available at the following:
>
>                       http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009/11-23_Conference_Call
>
>                       Thanks to Rob for scribing.
>
>                       Cheers,
>
>                       -Kei
>
>                       Kei Cheung wrote:
>
>                           This is a reminder that the next BioRDF telcon
>                           call will be held at 11 am Eastern Time (5 pm CET)
>                           on Monday, November 23 (see details below).
>
>                           Cheers,
>
>                           -Kei
>
>                           == Conference Details ==
>                           * Date of Call: Monday November 23, 2009
>                           * Time of Call: 11:00 am Eastern Time
>                           * Dial-In #: +1.617.761.6200 (Cambridge, MA)
>                           * Dial-In #: +33.4.89.06.34.99 (Nice, France)
>                           * Dial-In #: +44.117.370.6152 (Bristol, UK)
>                           * Participant Access Code: 4257 ("HCLS")
>                           * IRC Channel: irc.w3.org port 6665, channel #HCLS
>                             (see W3C IRC page for details, or see Web IRC).
>                             Quick Start: Use
>                             http://www.mibbit.com/chat/?server=irc.w3.org:6665&channel=%23hcls
>                             for IRC access.
>                           * Duration: ~1 hour
>                           * Frequency: bi-weekly
>                           * Convener: Kei Cheung
>                           * Scribe: to-be-determined
>
>                           == Agenda ==
>                           * Roll call & introduction (Kei)
>                           * RDF representation of microarray experiment and
>                             data (All)
>                           * Provenance and workflow (All)
>
Received on Monday, 7 December 2009 15:20:54 UTC