RE: BioRDF Telcon

From: Miller, Michael D (Rosetta) <Michael_Miller@Rosettabio.com>
Date: Fri, 24 Jul 2009 15:28:19 -0700
Message-ID: <C9EDB84D403E654CB78E37A506E406AF02457D63@ussemx1101.merck.com>
To: "Kei Cheung" <kei.cheung@yale.edu>
Cc: "Helen Parkinson" <parkinson@ebi.ac.uk>, "HCLS" <public-semweb-lifesci@w3.org>, "James Malone" <malone@ebi.ac.uk>
hi kei,

good luck on your endeavors.

> I can relay your comments about the validity of mageml to the
> consortium, although I don't know whether they can address them.

understandable if they can't address the changes but if they have a
chance, i've attached what i've done to make sabba-affy-rat-168529 valid
MAGE-ML.

cheers,
michael

Michael Miller
Lead Software Developer
Rosetta Biosoftware Business Unit
www.rosettabio.com


> -----Original Message-----
> From: Kei Cheung [mailto:kei.cheung@yale.edu] 
> Sent: Friday, July 24, 2009 11:34 AM
> To: Miller, Michael D (Rosetta)
> Cc: Helen Parkinson; HCLS; James Malone
> Subject: Re: BioRDF Telcon
> 
> Hi Michael,
> 
> Thanks for your detailed description of mageml. For our use case, we
> probably don't need to use all the information captured in mageml. The
> types of information we are currently focusing on include
> experiment/sample annotation (including some provenance as you
> indicated) and gene lists and how they are linked to existing
> ontologies. A couple of convincing examples may be enough to start. I
> can relay your comments about the validity of mageml to the
> consortium, although I don't know whether they can address them.
> 
> Cheers,
> 
> -Kei
> 
> Miller, Michael D (Rosetta) wrote:
> 
> >hi kei and helen,
> >
> >like helen, i've been following the HCLS working groups with great
> >interest.  as one of the designers, with helen, of the MAGE-ML and
> >MAGE-TAB specs i might be able to provide a little technical insight
> >into the formats.
> >
> >(from helen)
> >"This is probably as we don't have data - here's a list of human
> >experiments with the term neuron - if any of these are useful, then I
> >can prioritize their curation and inclusion in an atlas release"
> >
> >kei, are the NIH Neuroscience Microarray Consortium experiments you've
> >cited and others like them in GEO or ArrayExpress?  a set of those could
> >be a good starting point for helen.
> >  
> >
> 
> My understanding is that the publicly visible microarray projects in the
> neuroscience microarray consortium should also be in geo and/or
> arrayexpress, although I don't know whether all the annotations are
> preserved.
> 
> 
> >first, MAGE-ML is based on a DTD[1], not an XSD.  in early 2002 as the
> >OMG Gene Expression specification[1] was being finalized, XSD was still
> >in its infancy so we weren't comfortable at that point generating an XSD.
> >the MAGE-OM UML[2], in a very early XMI format from Rational Rose and
> >UniSys, was used to generate the DTD with code we wrote ourselves[3].
> >
> >the UML model was designed to capture the flow of a microarray
> >experiment and how the resulting arrays were organized in the experiment
> >based on how the samples were treated and/or on the samples' phenotypes
> >for the purpose of a reviewer understanding the methodology and for a
> >researcher replicating and/or re-analyzing the results.
> >
> >some of the details of the flow may not be of much interest, i.e. it
> >might be worth simply connecting the BioSource elements with their gene
> >expression data and not worrying about how the hybridization was
> >performed.  but that depends on what you want to do and you know that
> >better than i.
> >
> >also, the data itself are specified in external files, typically in a
> >white-space delimited format where the column headers are specified in
> >the MAGE-ML file in the QuantitationTypeDimension element and the
> >identifiers of the rows are specified in one of the three
> >DesignElementDimension elements (Feature, Reporter, CompositeSequence),
> >depending on how derived the data is.  also the data can be in a vendor
> >specific format such as the Affymetrix CEL (since the CEL file
> >internally specifies the dimensions often they are left out of the
> >MAGE-ML document).
> >
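A minimal sketch of how such an external data file could be paired with its two dimensions. It assumes the file holds only numeric values, one row per design element in the order the DesignElementDimension lists them, and one column per quantitation type. The function name, the quantitation type names and the probe set identifiers are all hypothetical and are not taken from the MAGE-ML specification or from any consortium file.

```python
import io


def read_data_matrix(stream, quantitation_types, design_elements):
    """Map design element id -> {quantitation type: value}.

    Assumes a whitespace-delimited file with no header row and no id
    column: both come from the MAGE-ML document instead.
    """
    rows = [line.split() for line in stream if line.strip()]
    if len(rows) != len(design_elements):
        raise ValueError("row count does not match the DesignElementDimension")
    matrix = {}
    for element_id, fields in zip(design_elements, rows):
        if len(fields) != len(quantitation_types):
            raise ValueError(f"wrong number of columns for {element_id}")
        # Values are assumed to be numeric here; real files may mix types.
        matrix[element_id] = dict(
            zip(quantitation_types, (float(field) for field in fields))
        )
    return matrix


# Hypothetical dimensions, as they might be read from the MAGE-ML document.
quantitation_types = ["Signal", "DetectionPValue"]
design_elements = ["probeset_1", "probeset_2"]

# Stand-in for the external file referenced by the MAGE-ML document.
external_file = io.StringIO("120.5 0.01\n98.2 0.20\n")

matrix = read_data_matrix(external_file, quantitation_types, design_elements)
print(matrix["probeset_2"]["Signal"])
```

The point of the sketch is only that the data file cannot be interpreted on its own: the column and row labels live in the MAGE-ML document.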
> >the ExperimentalFactor elements are certainly relevant and if you've
> >looked at some of the examples you will notice that the BioSource
> >elements, in particular, and other elements are annotated by
> >OntologyEntry elements.  from the gene expression specification:
> >
> >"OntologyEntry
> >A single entry from an ontology or a controlled vocabulary. For
> >instance, category could be 'species name,' value could be 'homo
> >sapiens' and ontology would be taxonomy database, NCBI."
> >
> >for the element an ontology entry element is annotating, we looked at it
> >as a way of specifying something like "the object identified by the
> >element is an instance of the class/individual specified by the
> >OntologyEntry"
> >
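One way to picture the "instance of" reading is as a simple subject/predicate/object statement. The sketch below is only an illustration of that idea: the class, the function and the subject identifier are hypothetical, and the example values are the ones from the specification text quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyEntry:
    """The three parts named in the specification's description."""
    category: str
    value: str
    ontology: str


def as_instance_statement(subject_id, entry):
    """Read an annotation as '<subject> is an instance of <value>'."""
    return (subject_id, "instance of", entry.value)


entry = OntologyEntry(
    category="species name",
    value="homo sapiens",
    ontology="taxonomy database, NCBI",
)

# "biosource.example" is a made-up identifier for the annotated element.
statement = as_instance_statement("biosource.example", entry)
print(statement)
```

A statement in this shape is close to what an RDF triple would need, although choosing real URIs for the subject, predicate and object is a separate design question.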
> >so from "kitm-affy-droso-176167" one sees that the BioSource is an
> >"instance of" Drosophila, whole animal, whole head and an age of 3 days:
> >
> ><BioSource identifier="arrayconsortium.tgen.org::biosource.181527" name="Oregon R head 3d">
> >  <Characteristics_assnlist>
> >    <OntologyEntry category="Organism" value="Drosophila" description="Drosophila">
> >      <OntologyReference_assn>
> >        <DatabaseEntry accession="#Organism" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Organism">
> >          <Database_assnref>
> >            <Database_ref identifier="MO"/>
> >          </Database_assnref>
> >        </DatabaseEntry>
> ><!-- snip -->
> >      </OntologyReference_assn>
> >    </OntologyEntry>
> >    <OntologyEntry category="OrganismPart" value="whole animal" description="">
> >      <OntologyReference_assn>
> >        <DatabaseEntry accession="#OrganismPart" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#OrganismPart">
> >          <Database_assnref>
> >            <Database_ref identifier="MO"/>
> >          </Database_assnref>
> >        </DatabaseEntry>
> >      </OntologyReference_assn>
> ><!-- snip -->
> >    </OntologyEntry>
> >    <OntologyEntry category="OrganismPartRegion" value="whole head" description="">
> ><!-- snip -->
> >    </OntologyEntry>
> ><!-- snip -->
> >    <OntologyEntry category="Age" value="Age">
> >      <OntologyReference_assn>
> >        <DatabaseEntry accession="#Age" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Age">
> >          <Database_assnref>
> >            <Database_ref identifier="MO"/>
> >          </Database_assnref>
> >        </DatabaseEntry>
> >      </OntologyReference_assn>
> >      <Associations_assnlist>
> >        <OntologyEntry category="has_measurement" value="has_measurement">
> >          <OntologyReference_assn>
> >            <DatabaseEntry accession="#has_measurement" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_measurement">
> >              <Database_assnref>
> >                <Database_ref identifier="MO"/>
> >              </Database_assnref>
> >            </DatabaseEntry>
> >          </OntologyReference_assn>
> >          <Associations_assnlist>
> >            <OntologyEntry category="Measurement" value="Measurement">
> >              <OntologyReference_assn>
> >                <DatabaseEntry accession="#Measurement" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Measurement">
> >                  <Database_assnref>
> >                    <Database_ref identifier="MO"/>
> >                  </Database_assnref>
> >                </DatabaseEntry>
> >              </OntologyReference_assn>
> >              <Associations_assnlist>
> >                <OntologyEntry category="has_value" value="has_value">
> >                  <OntologyReference_assn>
> >                    <DatabaseEntry accession="#has_value" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_value">
> >                      <Database_assnref>
> >                        <Database_ref identifier="MO"/>
> >                      </Database_assnref>
> >                    </DatabaseEntry>
> >                  </OntologyReference_assn>
> >                  <Associations_assnlist>
> >                    <OntologyEntry category="has_value" value="3"/>
> >                  </Associations_assnlist>
> >                </OntologyEntry>
> >                <OntologyEntry category="has_units" value="has_units">
> >                  <OntologyReference_assn>
> >                    <DatabaseEntry accession="#has_units" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_units">
> >                      <Database_assnref>
> >                        <Database_ref identifier="MO"/>
> >                      </Database_assnref>
> >                    </DatabaseEntry>
> >                  </OntologyReference_assn>
> >                  <Associations_assnlist>
> >                    <OntologyEntry category="TimeUnit" value="days" description="24 hours, time unit">
> >                      <OntologyReference_assn>
> >                        <DatabaseEntry accession="#days" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#days">
> >                          <Database_assnref>
> >                            <Database_ref identifier="MO"/>
> >                          </Database_assnref>
> >                        </DatabaseEntry>
> >                      </OntologyReference_assn>
> >                    </OntologyEntry>
> >                  </Associations_assnlist>
> >                </OntologyEntry>
> >              </Associations_assnlist>
> >            </OntologyEntry>
> >          </Associations_assnlist>
> >        </OntologyEntry>
> >      </Associations_assnlist>
> >    </OntologyEntry>
> ><!-- snip -->
> >  </Characteristics_assnlist>
> ><!-- snip -->
> ></BioSource>
> >
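For a use case that only needs the sample annotation, the nested OntologyEntry structure can be flattened into category/value pairs. The following is a rough sketch using Python's standard `xml.etree.ElementTree`; the fragment is a cut-down copy of the BioSource above with the OntologyReference_assn elements omitted, and the helper name `leaf_values` is made up for this example. It is not part of MAGEstk or any MAGE-ML tooling.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the BioSource example (ontology references omitted).
FRAGMENT = """
<BioSource identifier="arrayconsortium.tgen.org::biosource.181527" name="Oregon R head 3d">
  <Characteristics_assnlist>
    <OntologyEntry category="Organism" value="Drosophila"/>
    <OntologyEntry category="OrganismPart" value="whole animal"/>
    <OntologyEntry category="Age" value="Age">
      <Associations_assnlist>
        <OntologyEntry category="has_measurement" value="has_measurement">
          <Associations_assnlist>
            <OntologyEntry category="Measurement" value="Measurement">
              <Associations_assnlist>
                <OntologyEntry category="has_value" value="has_value">
                  <Associations_assnlist>
                    <OntologyEntry category="has_value" value="3"/>
                  </Associations_assnlist>
                </OntologyEntry>
                <OntologyEntry category="has_units" value="has_units">
                  <Associations_assnlist>
                    <OntologyEntry category="TimeUnit" value="days"/>
                  </Associations_assnlist>
                </OntologyEntry>
              </Associations_assnlist>
            </OntologyEntry>
          </Associations_assnlist>
        </OntologyEntry>
      </Associations_assnlist>
    </OntologyEntry>
  </Characteristics_assnlist>
</BioSource>
"""


def leaf_values(entry):
    """Return (category, value) pairs for the innermost OntologyEntry elements."""
    children = entry.findall("Associations_assnlist/OntologyEntry")
    if not children:
        return [(entry.get("category"), entry.get("value"))]
    pairs = []
    for child in children:
        pairs.extend(leaf_values(child))
    return pairs


biosource = ET.fromstring(FRAGMENT.strip())
summary = {}
for entry in biosource.findall("Characteristics_assnlist/OntologyEntry"):
    # Keep only the leaf values, e.g. "3" and "days" for the Age entry.
    summary[entry.get("category")] = [value for _, value in leaf_values(entry)]

print(biosource.get("name"), summary)
```

Walking down to the leaves discards the intermediate has_measurement and Measurement steps, which is a deliberate simplification: it keeps "age of 3 days" but loses the ontology terms that explain how the value is structured.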
> >by the by, the MAGE-ML examples i've looked at from the NIH Neuroscience
> >Microarray Consortium are not in a valid MAGE-ML.dtd format.  i'll send a
> >follow-up e-mail dealing with the problems i see.  they are not far off
> >but are invalid in a number of places.
> >
> >cheers,
> >michael
> >
> >Michael Miller
> >Lead Software Developer
> >Rosetta Biosoftware Business Unit
> >www.rosettabio.com
> >
> >[1] http://www.omg.org/spec/GENE/1.1/
> >
> >(sadly, the original links to the MAGEstk appear to be broken, this
> >mirror site still has the MAGE related files built up over the years,
> >here's my best guess as to the most helpful for the references)
> >[2]
> >http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/
> >	v1.0:
> >http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE-2002-01-07.xmi.gz/MAGE-2002-01-07.xmi
> >	v1.1:
> >http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE.xmi.gz[peek]
> >[3]
> >http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE%20Java%20API/20010911/
> >
> >
> >  
> >
> >>-----Original Message-----
> >>From: public-semweb-lifesci-request@w3.org 
> >>[mailto:public-semweb-lifesci-request@w3.org] On Behalf Of 
> >>Helen Parkinson
> >>Sent: Wednesday, July 22, 2009 2:55 AM
> >>To: Kei Cheung
> >>Cc: HCLS; James Malone
> >>Subject: Re: BioRDF Telcon
> >>
> >>Responses in line.
> >>
> >>>>1. We have text mined much of the Affymetrix GEO data, curated it and
> >>>>imported it into ArrayExpress - there is now much better sample
> >>>>annotation than the native data in GEO. We also are running QC across
> >>>>all the data files so we know which should be excluded for future
> >>>>analyses.
> >>>>
> >>>I think it's the right thing to do both to enrich data annotation and
> >>>to enhance data quality. This will help data integration a lot.
> >>>
> >>>Currently, we are exploring query federation in the neuroscience
> >>>context. It'd be great if we can use the neuroscience use case(s) to
> >>>help drive your ontology development for text mining and data
> >>>visualization. In addition to the NIH neuroscience microarray
> >>>consortium, it may be possible to collaborate with the Neuroscience
> >>>Information Framework (NIF) to see if we can utilize some of its
> >>>resources (e.g., neuron ontology).
> >>>
> >>Re-use of the neuron ontology is possible, but it depends on whether
> >>there is available data to annotate either in ArrayExpress or GEO. If
> >>you can get me a list of experiments accessions or pubmed ids I can see
> >>if this is feasible
> >>
> >>>>3. We have summary level data of genes x conditions for ~30,000 hybs
> >>>>worth of data in our gene expression atlas with p values indicating
> >>>>relative under/over-expression. We are planning to export these as
> >>>>triples as soon as we publish the atlas - these may be of interest.
> >>>>www.ebi.ac.uk/gxa - there's an API at present, but it will be
> >>>>improved in the next month or so.
> >>>>
> >>>It fits well with what we're currently exploring in terms of gene list
> >>>representation and linking genes and samples to existing ontologies.
> >>>It'd be great if we can download or fetch RDF triples from EBI atlas.
> >>>
> >>We have a student starting work on this in a month, if you can produce
> >>concrete use cases for how you want to access these data we can do
> >>something.
> >>
> >>>>4. If neuroscience data is of specific interest we could do a themed
> >>>>atlas release where we add datasets for a given community or project
> >>>>and make these available. These can be identified by ArrayExpress or
> >>>>GEO accession or pubmed and we can re-annotate the genes vs
> >>>>Uniprot/Ensembl, add GO terms, etc and curate the sample attributes
> >>>>and experimental variables. These pipelines are already in place as
> >>>>part of our production workflow.
> >>>>
> >>>I think it's a great idea to do a themed atlas (e.g., neuro-atlas). I
> >>>just played with gxa a little bit. It's nice! For example, I could
> >>>find genes that are over-expressed in the hippocampus brain region
> >>>across different experiments. However, when I tried to do the same
> >>>thing for neurons, there are only a few neuron types that I can
> >>>select. It'd be nice if we can have more neuron types, for instance.
> >>>
> >>This is probably as we don't have data - here's a list of human
> >>experiments with the term neuron - if any of these are useful, then I
> >>can prioritise their curation and inclusion in an atlas release
> >>
> >>http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=neuron&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
> >>
> >>and brain
> >>
> >>http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=brain&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
> >>
> >>>>I'd be very happy to collaborate, and for this group to use our data,
> >>>>we spend a lot of time adding semantic value to it, so please let me
> >>>>know if this is of interest
> >>>>
> >>>We are also looking into the possibility of establishing collaboration
> >>>with the scientific discourse task force based on the microarray use
> >>>case. We're planning to have a microarray-related presentation and
> >>>discussion on Aug. 31 (Monday, 11 am EDT/5 pm CET). Details will be
> >>>announced later. It'd be great if you can join the BioRDF call to
> >>>participate in the discussion.
> >>>
> >>>Cheers,
> >>>
> >>>-Kei
> >>>
> >>>>best regards
> >>>>
> >>>>Helen
> >>>>
> >>>>Kei Cheung wrote:
> >>>>
> >>>>>The minutes for yesterday's BioRDF call are available at:
> >>>>>
> >>>>>http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009-07-20_Conference_Call
> >>>>>
> >>>>>Thanks to Lena for scribing and Eric for retrieving the transcript
> >>>>>from the IRC log.
> >>>>>
> >>>>>Cheers,
> >>>>>
> >>>>>-Kei
> >>>>>
> >>>>>Kei Cheung wrote:
> >>>>>
> >>>>>>This is a reminder that the next BioRDF teleconf. will be held at
> >>>>>>11 am EDT (5 pm CET) on Monday, July 20 (see details below).
> >>>>>>
> >>>>>>I created the following wiki page for discussing the microarray use
> >>>>>>case:
> >>>>>>
> >>>>>>http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/QueryFederation2
> >>>>>>
> >>>>>>Cheers,
> >>>>>>
> >>>>>>-Kei
> >>>>>>
> >>>>>>== Conference Details ==
> >>>>>>* Date of Call: Monday July 20, 2009
> >>>>>>* Time of Call: 11:00 am Eastern Time
> >>>>>>* Dial-In #: +1.617.761.6200 (Cambridge, MA)
> >>>>>>* Dial-In #: +33.4.89.06.34.99 (Nice, France)
> >>>>>>* Dial-In #: +44.117.370.6152 (Bristol, UK)
> >>>>>>* Participant Access Code: 4257 ("HCLS")
> >>>>>>* IRC Channel: irc.w3.org port 6665 channel #hcls (see
> >>>>>>[http://www.w3.org/Project/IRC/ W3C IRC page] for details, or see
> >>>>>>[http://cgi.w3.org/member-bin/irc/irc.cgi Web IRC])
> >>>>>>* Duration: ~1 hour
> >>>>>>* Frequency: bi-weekly
> >>>>>>* Convener: Kei Cheung
> >>>>>>
> >>>>>>== Agenda ==
> >>>>>>* Roll call and introduction (Kei)
> >>>>>>* TCM data quick update (Jun, Kei)
> >>>>>>* Query federation use case expansion (microarray) (All)
> >>>>>>
> >>-- 
> >>Helen Parkinson, PhD
> >>ArrayExpress Production Coordinator,
> >>Microarray Informatics Team, 
> >>EBI
> >>
> >>EBI 01223 494672
> >>Skype: helen.parkinson.ebi
> >>
> >>
> >>
> >>    
> >>
> >
> >  
> >
> 
> 


Received on Friday, 24 July 2009 22:29:04 GMT
