Re: BioRDF Telcon from Helen Parkinson on 2009-07-23 (public-semweb-lifesci@w3.org from July 2009)

From: Helen Parkinson <parkinso@ebi.ac.uk>
Date: Thu, 23 Jul 2009 19:10:43 +0100
To: "Miller, Michael D (Rosetta)" <Michael_Miller@rosettabio.com>
CC: Kei Cheung <kei.cheung@yale.edu>, HCLS <public-semweb-lifesci@w3.org>, James Malone <malone@ebi.ac.uk>
Message-ID: <4A68A7A3.9070705@ebi.ac.uk>
Hi

I meant to comment on this: I would not attempt a MAGE-ML->RDF 
transform. I can probably do something more quickly with an RDF export 
of transformed data analysed for over/under-expression, plus factor 
values and genes, and we'll have a student to work on this, I hope.

Helen

Miller, Michael D (Rosetta) wrote:
> hi kei and helen,
>
> like helen, i've been following the HCLS working groups with great
> interest.  as one of the designers, with helen, of the MAGE-ML and
> MAGE-TAB specs i might be able to provide a little technical insight
> into the formats.
>
> (from helen)
> "This is probably as we don't have data - here's a list of human 
> experiments with the term neuron - if any of these are useful, then I 
> can prioritize their curation and inclusion in an atlas release"
>
> kei, are the NIH Neuroscience Microarray Consortium experiments you've
> cited and others like them in GEO or ArrayExpress?  a set of those could
> be a good starting point for helen.
>
> first, MAGE-ML is based on a DTD[1], not an XSD.  in early 2002 as the
> OMG Gene Expression specification[1] was being finalized, XSD was still
> in its infancy so we weren't comfortable at that point generating an XSD.
> the MAGE-OM UML[2], in a very early XMI format from Rational Rose and
> UniSys, was used to generate the DTD with code we wrote ourselves[3]. 
>
> the UML model was designed to capture the flow of a microarray
> experiment and how the resulting arrays were organized in the experiment
> based on how the samples were treated and/or on the samples' phenotypes
> for the purpose of a reviewer understanding the methodology and for a
> researcher replicating and/or re-analyzing the results.  
>
> some of the details of the flow may not be of much interest, i.e. it
> might be worth simply connecting the BioSource elements with their gene
> expression data and not worrying about how the hybridization was
> performed.  but that depends on what you want to do and you know that
> better than i.
>
> also, the data itself are specified in external files, typically in a
> white-space delimited format where the column headers are specified in
> the MAGE-ML file in the QuantitationTypeDimension element and the
> identifiers of the row specified in one of the three
> DesignElementDimension elements, Feature, Reporter, CompositeSequence,
> depending on how derived the data is.  Also the data can be in a vendor
> specific format such as the Affymetrix CEL (since the CEL file
> internally specifies the dimensions often they are left out of the
> MAGE-ML document).
>
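The external-data layout described above can be sketched roughly as follows. This is a minimal illustration, not MAGE-ML tooling: it assumes a single bioassay, a file with no header row, one row per design element in the order given by the DesignElementDimension, and one column per entry in the QuantitationTypeDimension. The function name, identifiers and values are all hypothetical.

```python
def read_external_data(text, quantitation_types, design_elements):
    """Label a whitespace-delimited data matrix with names taken from the MAGE-ML document.

    quantitation_types: column names, in QuantitationTypeDimension order (assumed).
    design_elements:    row identifiers, in DesignElementDimension order (assumed).
    """
    # Split every non-empty line on any run of whitespace (tabs or spaces).
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != len(design_elements):
        raise ValueError("row count does not match DesignElementDimension")
    table = {}
    for element_id, fields in zip(design_elements, rows):
        if len(fields) != len(quantitation_types):
            raise ValueError(f"column count mismatch for {element_id}")
        # Map each column name to its numeric value for this design element.
        table[element_id] = dict(zip(quantitation_types, (float(f) for f in fields)))
    return table


# Hypothetical dimensions and data, for illustration only.
quantitation_types = ["Signal", "DetectionPValue"]
design_elements = ["reporter:A", "reporter:B"]
data_text = "120.5\t0.01\n88.0\t0.20\n"

table = read_external_data(data_text, quantitation_types, design_elements)
print(table["reporter:A"]["Signal"])
```

Vendor formats such as CEL carry their own dimensions, so a reader like this would only apply to the plain delimited case.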
> the ExperimentalFactor elements are certainly relevant and if you've
> looked at some of the examples you will have noticed that the BioSource
> elements, in particular, and other elements are annotated by
> OntologyEntry elements.  from the gene expression specification:
>
> "OntologyEntry
> A single entry from an ontology or a controlled vocabulary. For
> instance, category
> could be 'species name,' value could be 'homo sapiens' and ontology
> would be
> taxonomy database, NCBI."
>
> for the element an ontology entry element is annotating, we looked at it
> as a way of specifying something like "the object identified by the
> element is an instance of the class/individual specified by the
> OntologyEntry"
>
> so from "kitm-affy-droso-176167" one sees that the BioSource is an
> "instance of" Drosophila, whole animal, whole head and an age of 3 days:
>
>          <BioSource
> identifier="arrayconsortium.tgen.org::biosource.181527"
> name="Oregon R head 3d">
>             <Characteristics_assnlist>
>                <OntologyEntry category="Organism" value="Drosophila"
> description="Drosophila">
>                   <OntologyReference_assn>
>                      <DatabaseEntry accession="#Organism"
> URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Organism">
>                         <Database_assnref>
>                            <Database_ref identifier="MO"/>
>                         </Database_assnref>
>                      </DatabaseEntry>
> <!-- snip -->
>                   </OntologyReference_assn>
>                </OntologyEntry>
>                <OntologyEntry category="OrganismPart"
> value="whole animal" description="">
>                   <OntologyReference_assn>
>                      <DatabaseEntry accession="#OrganismPart"
> URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#OrganismPart">
>                         <Database_assnref>
>                            <Database_ref identifier="MO"/>
>                         </Database_assnref>
>                      </DatabaseEntry>
>                   </OntologyReference_assn>
> <!-- snip -->
>                </OntologyEntry>
>                <OntologyEntry category="OrganismPartRegion"
> value="whole head" description="">
> <!-- snip -->
>                </OntologyEntry>
> <!-- snip -->
>                <OntologyEntry category="Age" value="Age">
>                   <OntologyReference_assn>
>                      <DatabaseEntry accession="#Age"
> URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Age">
>                         <Database_assnref>
>                            <Database_ref identifier="MO"/>
>                         </Database_assnref>
>                      </DatabaseEntry>
>                   </OntologyReference_assn>
>                   <Associations_assnlist>
>                      <OntologyEntry category="has_measurement"
> value="has_measurement">
>                         <OntologyReference_assn>
>                            <DatabaseEntry accession="#has_measurement"
> URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_measurement">
>                               <Database_assnref>
>                                  <Database_ref identifier="MO"/>
>                               </Database_assnref>
>                            </DatabaseEntry>
>                         </OntologyReference_assn>
>                         <Associations_assnlist>
>                            <OntologyEntry category="Measurement"
> value="Measurement">
>                               <OntologyReference_assn>
>                                  <DatabaseEntry accession="#Measurement"
> URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Measurement">
>                                     <Database_assnref>
>                                        <Database_ref identifier="MO"/>
>                                     </Database_assnref>
>                                  </DatabaseEntry>
>                               </OntologyReference_assn>
>                               <Associations_assnlist>
>                                  <OntologyEntry category="has_value"
> value="has_value">
>                                     <OntologyReference_assn>
>                                        <DatabaseEntry
> accession="#has_value"
> URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_value">
>                                           <Database_assnref>
>                                              <Database_ref
> identifier="MO"/>
>                                           </Database_assnref>
>                                        </DatabaseEntry>
>                                     </OntologyReference_assn>
>                                     <Associations_assnlist>
>                                        <OntologyEntry
> category="has_value" value="3"/>
>                                     </Associations_assnlist>
>                                  </OntologyEntry>
>                                  <OntologyEntry category="has_units"
> value="has_units">
>                                     <OntologyReference_assn>
>                                        <DatabaseEntry
> accession="#has_units"
> URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_units">
>                                           <Database_assnref>
>                                              <Database_ref
> identifier="MO"/>
>                                           </Database_assnref>
>                                        </DatabaseEntry>
>                                     </OntologyReference_assn>
>                                     <Associations_assnlist>
>                                        <OntologyEntry
> category="TimeUnit" value="days" description="24 hours, time unit">
>                                           <OntologyReference_assn>
>                                              <DatabaseEntry
> accession="#days"
> URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#days">
>                                                 <Database_assnref>
>                                                    <Database_ref
> identifier="MO"/>
>                                                 </Database_assnref>
>                                              </DatabaseEntry>
>                                           </OntologyReference_assn>
>                                        </OntologyEntry>
>                                     </Associations_assnlist>
>                                  </OntologyEntry>
>                               </Associations_assnlist>
>                            </OntologyEntry>
>                         </Associations_assnlist>
>                      </OntologyEntry>
>                   </Associations_assnlist>
>                </OntologyEntry>
> <!-- snip -->
>             </Characteristics_assnlist>
> <!-- snip -->
>          </BioSource>
>
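The "instance of" reading of the OntologyEntry annotations above can be sketched as simple (subject, category, value) statements. This is only an illustration under stated assumptions: the fragment is a trimmed, re-indented copy of the example above, the function name is hypothetical, and only the top-level characteristics are read (nested Associations_assnlist entries such as the age measurement are ignored).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the BioSource example; nested associations are left out.
FRAGMENT = """
<BioSource identifier="arrayconsortium.tgen.org::biosource.181527"
           name="Oregon R head 3d">
  <Characteristics_assnlist>
    <OntologyEntry category="Organism" value="Drosophila"/>
    <OntologyEntry category="OrganismPart" value="whole animal"/>
    <OntologyEntry category="OrganismPartRegion" value="whole head"/>
  </Characteristics_assnlist>
</BioSource>
"""


def characteristics(biosource):
    """Yield (subject, category, value) for each top-level characteristic."""
    subject = biosource.get("identifier")
    # Only direct children of Characteristics_assnlist are visited.
    for entry in biosource.findall("./Characteristics_assnlist/OntologyEntry"):
        yield (subject, entry.get("category"), entry.get("value"))


triples = list(characteristics(ET.fromstring(FRAGMENT)))
for triple in triples:
    print(triple)
```

Turning these tuples into real RDF would still need agreed URIs for the subject and for each category, which the sketch does not attempt.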
> by the by, the MAGE-ML examples i've looked at from the NIH Neuroscience
> Microarray Consortium are not in a valid MAGE-ML.dtd format.  i'll send a
> follow-up e-mail dealing with the problems i see.  they are not far off
> but are invalid in a number of places.
>
> cheers,
> michael
>
> Michael Miller
> Lead Software Developer
> Rosetta Biosoftware Business Unit
> www.rosettabio.com
>
> [1] http://www.omg.org/spec/GENE/1.1/
>
> (sadly, the original links to the MAGEstk appear to be broken, this
> mirror site still has the MAGE related files built up over the years,
> here's my best guess as to the most helpful for the references)
> [2]
> http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/
> 	v1.0:
> http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE-2002-01-07.xmi.gz/MAGE-2002-01-07.xmi
> 	v1.1:
> http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE.xmi.gz
> [3]
> http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE%20Java%20API/20010911/
>
>
>   
>> -----Original Message-----
>> From: public-semweb-lifesci-request@w3.org 
>> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of 
>> Helen Parkinson
>> Sent: Wednesday, July 22, 2009 2:55 AM
>> To: Kei Cheung
>> Cc: HCLS; James Malone
>> Subject: Re: BioRDF Telcon
>>
>> Responses in line.
>>
>>
>>     
>>>> 1. We have text mined much of the Affymetrix GEO data, curated it and
>>>> imported it into ArrayExpress - there is now much better sample
>>>> annotation than the native data in GEO. We also are running QC across
>>>> all the data files so we know which should be excluded for future
>>>> analyses.
>>>>
>>> I think it's the right thing to do both to enrich data annotation and
>>> to enhance data quality. This will help data integration a lot.
>>>
>>> Currently, we are exploring query federation in the neuroscience
>>> context. It'd be great if we can use the neuroscience use case(s) to
>>> help drive your ontology development for text mining and data
>>> visualization. In addition to the NIH neuroscience microarray
>>> consortium, it may be possible to collaborate with the Neuroscience
>>> Information Framework (NIF) to see if we can utilize some of its
>>> resources (e.g., neuron ontology).
>>>
>> Re-use of the neuron ontology is possible, but it depends on whether
>> there is available data to annotate either in ArrayExpress or GEO. If
>> you can get me a list of experiment accessions or PubMed IDs I can see
>> if this is feasible.
>>
>>>> 3. We have summary level data of genes x conditions for ~30,000 hybs
>>>> worth of data in our gene expression atlas with p values indicating
>>>> relative under/over-expression. We are planning to export these as
>>>> triples as soon as we publish the atlas - these may be of interest.
>>>> www.ebi.ac.uk/gxa - there's an API at present, but it will be
>>>> improved in the next month or so.
>>>>
>>> It fits well with what we're currently exploring in terms of gene list
>>> representation and linking genes and samples to existing ontologies.
>>> It'd be great if we can download or fetch RDF triples from EBI atlas.
>>>
>> We have a student starting work on this in a month. If you can produce
>> concrete use cases for how you want to access these data we can do
>> something.
>>
>>>> 4. If neuroscience data is of specific interest we could do a themed
>>>> atlas release where we add datasets for a given community or project
>>>> and make these available. These can be identified by ArrayExpress or
>>>> GEO accession or pubmed and we can re-annotate the genes vs
>>>> Uniprot/Ensembl, add GO terms, etc and curate the sample attributes
>>>> and experimental variables. These pipelines are already in place as
>>>> part of our production workflow.
>>>>
>>> I think it's a great idea to do a themed atlas (e.g., neuro-atlas). I
>>> just played with gxa a little bit. It's nice! For example, I could
>>> find genes that are over-expressed in the hippocampus brain region
>>> across different experiments. However, when I tried to do the same
>>> thing for neurons, there are only a few neuron types that I can
>>> select. It'd be nice if we can have more neuron types, for instance.
>>>
>> This is probably as we don't have data - here's a list of human 
>> experiments with the term neuron - if any of these are useful, then I 
>> can prioritise their curation and inclusion in an atlas release
>>
>>  
>> http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=neuron&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
>>
>> and brain
>>
>> http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=brain&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
>>
>>>> I'd be very happy to collaborate, and for this group to use our data,
>>>> we spend a lot of time adding semantic value to it, so please let me
>>>> know if this is of interest
>>>>
>>> We are also looking into the possibility of establishing collaboration
>>> with the scientific discourse task force based on the microarray use
>>> case. We're planning to have a microarray-related presentation and 
>>> discussion on Aug. 31 (Monday, 11 am EDT/5 pm CET). Details will be 
>>> announced later. It'd be great if you can join the BioRDF call to 
>>> participate in the discussion.
>>>
>>> Cheers,
>>>
>>> -Kei
>>>       
>>>> best regards
>>>>
>>>> Helen
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Kei Cheung wrote:
>>>>         
>>>>> The minutes for yesterday's BioRDF call are available at:
>>>>>
>>>>>
>>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009-07-20_Conference_Call
>>>>>
>>>>> Thanks to Lena for scribing and Eric for retrieving the transcript
>>>>> from the IRC log.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> -Kei
>>>>>
>>>>> Kei Cheung wrote:
>>>>>           
>>>>>> This is a reminder that the next BioRDF teleconf. will be held at
>>>>>> 11 am EDT (5 pm CET) on Monday, July 20 (see details below).
>>>>>>
>>>>>> I created the following wiki page for discussing the microarray use
>>>>>> case:
>>>>>>
>>>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/QueryFederation2
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> -Kei
>>>>>>
>>>>>> == Conference Details ==
>>>>>> * Date of Call: Monday July 20, 2009
>>>>>> * Time of Call: 11:00 am Eastern Time
>>>>>> * Dial-In #: +1.617.761.6200 (Cambridge, MA)
>>>>>> * Dial-In #: +33.4.89.06.34.99 (Nice, France)
>>>>>> * Dial-In #: +44.117.370.6152 (Bristol, UK)
>>>>>> * Participant Access Code: 4257 ("HCLS")
>>>>>> * IRC Channel: irc.w3.org port 6665 channel #hcls (see
>>>>>> [http://www.w3.org/Project/IRC/ W3C IRC page] for details, or see
>>>>>> [http://cgi.w3.org/member-bin/irc/irc.cgi Web IRC])
>>>>>> * Duration: ~1 hour
>>>>>> * Frequency: bi-weekly
>>>>>> * Convener: Kei Cheung
>>>>>>
>>>>>> == Agenda ==
>>>>>> * Roll call and introduction (Kei)
>>>>>> * TCM data quick update (Jun, Kei)
>>>>>> * Query federation use case expansion (microarray) (All)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>           
>> -- 
>> Helen Parkinson, PhD
>> ArrayExpress Production Coordinator,
>> Microarray Informatics Team, 
>> EBI
>>
>> EBI 01223 494672
>> Skype: helen.parkinson.ebi
>>
>>
>>
>>
Received on Friday, 24 July 2009 21:44:38 UTC