
Re: BioRDF Telcon

From: mdmiller <mdmiller53@comcast.net>
Date: Fri, 4 Dec 2009 06:45:59 -0800
Message-ID: <6E4EB343DAF84312AC57CB3D34FEF35C@mmPC>
To: "Helena Deus" <helenadeus@gmail.com>, "Kei Cheung" <kei.cheung@yale.edu>
Cc: "HCLS" <public-semweb-lifesci@w3.org>
hi lena and kei,

you're pretty well assured that identifiers of the form E-GEOD-4757 are experiment accessions in ArrayExpress and will resolve to an IDF and SDRF [1].


[1]: http://www.ebi.ac.uk/microarray/doc/help/accession_codes.html
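As a rough illustration of the accession form described above, here is a minimal Python sketch that splits an identifier such as E-GEOD-4757 into its parts. The pattern (one type letter, a four-letter source code, a number) is an assumption based on the accession-code help page [1], and the function names `parse_accession` and `is_experiment_accession` are hypothetical; this only checks the shape of the string, not whether the accession actually resolves in ArrayExpress.

```python
import re

# Assumed shape: <type letter>-<four-letter source code>-<number>,
# e.g. E-GEOD-4757. This is a sketch, not an official grammar.
ACCESSION_RE = re.compile(r"^([A-Z])-([A-Z]{4})-(\d+)$")


def parse_accession(accession):
    """Return the parts of an accession string, or None if it does not match."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    kind, source, number = match.groups()
    return {"kind": kind, "source": source, "number": int(number)}


def is_experiment_accession(accession):
    """True when the string has the assumed form and its type letter is 'E'."""
    parsed = parse_accession(accession)
    return parsed is not None and parsed["kind"] == "E"
```

For example, `is_experiment_accession("E-GEOD-4757")` is true under these assumptions, while a bare GEO series id such as `GSE4757` does not match.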
  ----- Original Message ----- 
  From: Helena Deus 
  To: Kei Cheung 
  Cc: mdmiller ; HCLS 
  Sent: Wednesday, December 02, 2009 3:27 PM
  Subject: Re: BioRDF Telcon

  Hi Kei,

  Fortunately, ArrayExpress provides both the IDF and SDRF for that accession number, at http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=E-GEOD-4757

  I have a small RDF document of that IDF at http://magetab2rdf.googlecode.com/svn/trunk/E-GEOD-4757.idf.rdf


  On Tue, Dec 1, 2009 at 9:20 PM, Kei Cheung <kei.cheung@yale.edu> wrote:

    Hi Lena,

    Helena Deus wrote:


         When you said data structure, did you mean the RDF structure?

      For now, all I have is the Java object returned by the parser. I've been using Limpopo, which creates an object that I can then convert to RDF using Jena. The challenge, though, has been coming up with the predicates to formalize the relationships between the various elements. I'm using the XML structures for IDF/SDRF etc. at http://magetab-om.sourceforge.net to automatically generate the structure that will contain the data. My plan is to then create the RDF triples that use the attributes described in those documents and populate them with the data from the MAGE-TAB Java object created by Limpopo.
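The general idea of turning IDF rows into triples can be sketched without Limpopo or Jena. The Python below is only an illustration under stated assumptions, not the approach used in magetab2rdf: it treats the IDF as tab-delimited text whose first column is a field name and whose remaining columns are values, and it emits N-Triples lines. The predicate namespace `PRED_NS`, the subject URI, and all function names are hypothetical placeholders.

```python
import csv
import io
import re

# Hypothetical predicate namespace, for illustration only.
PRED_NS = "http://example.org/magetab/idf#"


def idf_rows(text):
    """Yield (field_name, values) for each non-empty row of tab-delimited IDF text."""
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        values = [cell.strip() for cell in row[1:] if cell.strip()]
        yield row[0].strip(), values


def predicate_uri(field_name):
    """Build a predicate URI by replacing non-alphanumeric runs with '_'."""
    local = re.sub(r"[^A-Za-z0-9]+", "_", field_name).strip("_")
    return PRED_NS + local


def escape_literal(value):
    """Escape the characters that N-Triples string literals require escaping."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def idf_to_ntriples(text, subject_uri):
    """Return one N-Triples line per (field, value) pair in the IDF text."""
    lines = []
    for field, values in idf_rows(text):
        predicate = predicate_uri(field)
        for value in values:
            lines.append('<%s> <%s> "%s" .'
                         % (subject_uri, predicate, escape_literal(value)))
    return lines
```

A field with several values (for example several people) simply yields several triples with the same predicate. Choosing real predicates, rather than names derived mechanically from the field labels, is exactly the open question described above.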

    Thanks for the pointer and explaining your strategy. We might not need to convert everything from mage-tab for our purposes.

      Right now all I have is a very raw RDF/XML document describing the relationships in the IDF structure: http://magetab2rdf.googlecode.com/svn/trunk/magetabpredicates.rdf
      The triples for that had to be encoded manually using Jena by reading the model.

    I think IDF is a good start. For a real example for our use case, I wonder if any MAGE-TAB file is available for experiment E-GEOD-4757 (transcription profiling of human neurons with and without neurofibrillary tangles from Alzheimer's patients). Helen may know.



      @Satya and Jun

      I would very much like to be involved in that effort, do you already have a URL that I can look at?


      On Tue, Nov 24, 2009 at 2:19 PM, Kei Cheung <kei.cheung@yale.edu> wrote:

         Hi Lena et al,

         When you said data structure, did you mean the RDF structure? If
         so, is there a pointer to the structure that we can look at?

         As discussed during yesterday's call, Jun and Satya will help
         create a wiki page for listing some of the requirements for
         provenance/workflow in the context of gene lists, perhaps we
         should also use it to help coordinate some of the future
         activities (people also brought up Taverna during the call
         yesterday). Please coordinate with Satya and Jun.



         Helena Deus wrote:

             Hi all,

             I apologize for missing the call yesterday! It seems you had a
             pretty interesting discussion! :-)
             If I understand Michael's statement, parsing the
             MAGE-TAB/MAGE-ML into RDF would result in obtaining only the
             raw and processed data files but not the mechanism used to
             process it nor the resulting gene list. That's also what I
             concluded after looking at the data structure created by Tony
             Burdett's Limpopo parser. However, having the raw data as
             linked data is already a great start! Kei, should I be looking
             into Taverna in order to reprocess the raw files with a
             traceable analysis workflow?


             On Tue, Nov 24, 2009 at 9:59 AM, mdmiller
             <mdmiller53@comcast.net> wrote:

                hi all,

                (from the minutes)

                "Yolanda/Kei/Scott: semantic annotation/description of workflow
                would enable the retrieval of data relevant to that workflow
                (i.e. data that could be used to populate that workflow for a
                experimental scenario)"

                what is typically in a MAGE-TAB/MAGE-ML document are the
                protocols for how the source was processed into the extract,
                then how the hybridization, feature extraction, error and
                normalization were performed.  these are interesting, and
                different protocols can cause differences at this level, but it
                is pretty much a known art and usually not of too much interest
                or variability.

                what is usually missing from those documents, along with the
                final gene list, is how that gene list was obtained, what higher
                analysis was used, that is generally only in the paper

                ----- Original Message -----
                From: "Kei Cheung" <kei.cheung@yale.edu>
                To: "HCLS" <public-semweb-lifesci@w3.org>
                Sent: Monday, November 23, 2009 1:27 PM
                Subject: Re: BioRDF Telcon

                    Today's BioRDF minutes are available at the following:


                    Thanks to Rob for scribing.



                    Kei Cheung wrote:

                        This is a reminder that the next BioRDF telcon call will
                        be held at 11 am EST (5 pm CET) on Monday, November 23
                        (see details below).



                        == Conference Details ==
                        * Date of Call: Monday November 23, 2009
                        * Time of Call: 11:00 am Eastern Time
                        * Dial-In #: +1.617.761.6200 (Cambridge, MA)
                        * Dial-In #: + (Nice, France)
                        * Dial-In #: +44.117.370.6152 (Bristol, UK)
                        * Participant Access Code: 4257 ("HCLS")
                        * IRC Channel: irc.w3.org port 6665 channel #HCLS (see
                        W3C IRC page for details, or see Web IRC), Quick Start:
                        Use for IRC access.
                        * Duration: ~1 hour
                        * Frequency: bi-weekly
                        * Convener: Kei Cheung
                        * Scribe: to-be-determined

                        == Agenda ==
                        * Roll call & introduction (Kei)
                        * RDF representation of microarray experiment and data (All)
                        * Provenance and workflow (All)
Received on Friday, 4 December 2009 14:47:44 UTC
