RE: BioRDF Telcon

hi helen,

> I can probably do something more quickly with an rdf export n
> of transformed data analysed for over/under expressions ...

the clinical genomics group at HL7 is looking for a good model to
represent gene-expression-based biomarkers, i.e. a set of genes and one
or more expression profiles for that set of genes, where each profile
maps to a phenotype or factor.  although the HL7 representation is more
UMLish, your export sounds like it would be helpful.
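
as a rough sketch of the shape such a biomarker might take (the class
names, field names and gene ids below are illustrative assumptions, not
the HL7 model and not anything taken from MAGE-OM):

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionProfile:
    """One expression pattern over the biomarker's genes, tied to a phenotype or factor."""
    phenotype: str
    direction: dict  # gene id -> "over" | "under" | "unchanged" (assumed vocabulary)


@dataclass
class GeneExpressionBiomarker:
    """A set of genes plus one or more profiles, each mapping to a phenotype."""
    name: str
    genes: tuple
    profiles: list = field(default_factory=list)

    def add_profile(self, profile):
        # Every profile must cover exactly the biomarker's gene set.
        if set(profile.direction) != set(self.genes):
            raise ValueError("profile genes do not match the biomarker gene set")
        self.profiles.append(profile)

    def phenotypes_for(self, observed):
        """Return the phenotypes whose profile matches the observed directions exactly."""
        return [p.phenotype for p in self.profiles if p.direction == observed]


# Hypothetical usage with made-up gene ids and phenotype labels.
marker = GeneExpressionBiomarker("demo-marker", ("geneA", "geneB"))
marker.add_profile(ExpressionProfile("treated", {"geneA": "over", "geneB": "under"}))
marker.add_profile(ExpressionProfile("control", {"geneA": "unchanged", "geneB": "unchanged"}))
print(marker.phenotypes_for({"geneA": "over", "geneB": "under"}))
```

an exact-match lookup is only the simplest possible choice here; a real
model would presumably carry p values or thresholds rather than three
fixed labels.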

> ... plus factor 
> values and genes and we'll have a student to work on this I hope

would this also be able to include the originating BioSource annotations
that aren't necessarily factor values?
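
to make the question concrete, here is a minimal sketch of pulling the
top-level BioSource characteristics out of MAGE-ML and separating the
ones that are factor values from the ones that are not.  the function
name, the tuple layout and the idea of passing in a set of factor
categories are assumptions for illustration; the sample is a trimmed
version of the BioSource quoted further down, not a valid MAGE-ML
document:

```python
import xml.etree.ElementTree as ET

# Trimmed fragment modeled on the quoted BioSource; ontology references omitted.
SAMPLE = """
<BioSource identifier="arrayconsortium.tgen.org::biosource.181527" name="Oregon R head 3d">
  <Characteristics_assnlist>
    <OntologyEntry category="Organism" value="Drosophila"/>
    <OntologyEntry category="OrganismPart" value="whole animal"/>
    <OntologyEntry category="OrganismPartRegion" value="whole head"/>
  </Characteristics_assnlist>
</BioSource>
"""


def biosource_annotations(xml_text, factor_categories):
    """Return (factor, other) lists of (identifier, category, value) tuples.

    Only the top-level OntologyEntry elements under Characteristics_assnlist
    are read; nested Associations_assnlist entries are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    factor, other = [], []
    # Element.iter() also yields the root itself when its tag matches.
    for source in root.iter("BioSource"):
        for entry in source.findall("Characteristics_assnlist/OntologyEntry"):
            triple = (source.get("identifier"), entry.get("category"), entry.get("value"))
            if entry.get("category") in factor_categories:
                factor.append(triple)
            else:
                other.append(triple)
    return factor, other


# Assumption: OrganismPartRegion is the experimental factor in this experiment.
factor, other = biosource_annotations(SAMPLE, {"OrganismPartRegion"})
for triple in other:
    print(triple)
```

the "other" list is the part i'm asking about: annotations that describe
the sample but were not varied in the experiment.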

cheers,
michael

> -----Original Message-----
> From: Helen Parkinson [mailto:parkinso@ebi.ac.uk] 
> Sent: Thursday, July 23, 2009 11:11 AM
> To: Miller, Michael D (Rosetta)
> Cc: Kei Cheung; HCLS; James Malone
> Subject: Re: BioRDF Telcon
> 
> Hi
> 
> I meant to comment on this, I would not attempt a mage-ml->RDF
> transform, I can probably do something more quickly with an rdf export n
> of transformed data analysed for over/under expressions plus factor
> values and genes and we'll have a student to work on this I hope
> 
> Helen
> 
> Miller, Michael D (Rosetta) wrote:
> > hi kei and helen,
> >
> > like helen, i've been following the HCLS working groups with great
> > interest.  as one of the designers, with helen, of the MAGE-ML and
> > MAGE-TAB specs i might be able to provide a little technical insight
> > into the formats.
> >
> > (from helen)
> > "This is probably as we don't have data - here's a list of human
> > experiments with the term neuron - if any of these are useful, then I
> > can prioritize their curation and inclusion in an atlas release"
> >
> > kei, are the NIH Neuroscience Microarray Consortium experiments you've
> > cited and others like them in GEO or ArrayExpress?  a set of those could
> > be a good starting point for helen.
> >
> > first, MAGE-ML is based on a DTD[1], not an XSD.  in early 2002, as the
> > OMG Gene Expression specification[1] was being finalized, XSD was still
> > in its infancy so we weren't comfortable at that point generating an XSD.
> > the MAGE-OM UML[2], in a very early XMI format from Rational Rose and
> > UniSys, was used to generate the DTD with code we wrote ourselves[3].
> >
> > the UML model was designed to capture the flow of a microarray
> > experiment and how the resulting arrays were organized in the experiment
> > based on how the samples were treated and/or on the samples' phenotypes
> > for the purpose of a reviewer understanding the methodology and for a
> > researcher replicating and/or re-analyzing the results.
> >
> > some of the details of the flow may not be of much interest, i.e. it
> > might be worth simply connecting the BioSource elements with their gene
> > expression data and not worrying about how the hybridization was
> > performed.  but that depends on what you want to do and you know that
> > better than i.
> >
> > also, the data itself are specified in external files, typically in a
> > white-space delimited format where the column headers are specified in
> > the MAGE-ML file in the QuantitationTypeDimension element and the
> > identifiers of the rows are specified in one of the three
> > DesignElementDimension elements (Feature, Reporter, CompositeSequence),
> > depending on how derived the data is.  the data can also be in a vendor
> > specific format such as the Affymetrix CEL (since the CEL file
> > internally specifies the dimensions, they are often left out of the
> > MAGE-ML document).
> >
> > the ExperimentalFactor elements are certainly relevant and if you've
> > looked at some of the examples you will have noticed that the BioSource
> > elements, in particular, and other elements are annotated by
> > OntologyEntry elements.  from the gene expression specification:
> >
> > "OntologyEntry
> > A single entry from an ontology or a controlled vocabulary. For
> > instance, category
> > could be 'species name,' value could be 'homo sapiens' and ontology
> > would be
> > taxonomy database, NCBI."
> >
> > for the element an ontology entry element is annotating, we looked at it
> > as a way of specifying something like "the object identified by the
> > element is an instance of the class/individual specified by the
> > OntologyEntry"
> >
> > so from "kitm-affy-droso-176167" one sees that the BioSource is an
> > "instance of" Drosophila, whole animal, whole head and an age of 3 days:
> >
> >          <BioSource identifier="arrayconsortium.tgen.org::biosource.181527" name="Oregon R head 3d">
> >             <Characteristics_assnlist>
> >                <OntologyEntry category="Organism" value="Drosophila" description="Drosophila">
> >                   <OntologyReference_assn>
> >                      <DatabaseEntry accession="#Organism" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Organism">
> >                         <Database_assnref>
> >                            <Database_ref identifier="MO"/>
> >                         </Database_assnref>
> >                      </DatabaseEntry>
> > <!-- snip -->
> >                   </OntologyReference_assn>
> >                </OntologyEntry>
> >                <OntologyEntry category="OrganismPart" value="whole animal" description="">
> >                   <OntologyReference_assn>
> >                      <DatabaseEntry accession="#OrganismPart" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#OrganismPart">
> >                         <Database_assnref>
> >                            <Database_ref identifier="MO"/>
> >                         </Database_assnref>
> >                      </DatabaseEntry>
> >                   </OntologyReference_assn>
> > <!-- snip -->
> >                </OntologyEntry>
> >                <OntologyEntry category="OrganismPartRegion" value="whole head" description="">
> > <!-- snip -->
> >                </OntologyEntry>
> > <!-- snip -->
> >                <OntologyEntry category="Age" value="Age">
> >                   <OntologyReference_assn>
> >                      <DatabaseEntry accession="#Age" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Age">
> >                         <Database_assnref>
> >                            <Database_ref identifier="MO"/>
> >                         </Database_assnref>
> >                      </DatabaseEntry>
> >                   </OntologyReference_assn>
> >                   <Associations_assnlist>
> >                      <OntologyEntry category="has_measurement" value="has_measurement">
> >                         <OntologyReference_assn>
> >                            <DatabaseEntry accession="#has_measurement" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_measurement">
> >                               <Database_assnref>
> >                                  <Database_ref identifier="MO"/>
> >                               </Database_assnref>
> >                            </DatabaseEntry>
> >                         </OntologyReference_assn>
> >                         <Associations_assnlist>
> >                            <OntologyEntry category="Measurement" value="Measurement">
> >                               <OntologyReference_assn>
> >                                  <DatabaseEntry accession="#Measurement" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Measurement">
> >                                     <Database_assnref>
> >                                        <Database_ref identifier="MO"/>
> >                                     </Database_assnref>
> >                                  </DatabaseEntry>
> >                               </OntologyReference_assn>
> >                               <Associations_assnlist>
> >                                  <OntologyEntry category="has_value" value="has_value">
> >                                     <OntologyReference_assn>
> >                                        <DatabaseEntry accession="#has_value" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_value">
> >                                           <Database_assnref>
> >                                              <Database_ref identifier="MO"/>
> >                                           </Database_assnref>
> >                                        </DatabaseEntry>
> >                                     </OntologyReference_assn>
> >                                     <Associations_assnlist>
> >                                        <OntologyEntry category="has_value" value="3"/>
> >                                     </Associations_assnlist>
> >                                  </OntologyEntry>
> >                                  <OntologyEntry category="has_units" value="has_units">
> >                                     <OntologyReference_assn>
> >                                        <DatabaseEntry accession="#has_units" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_units">
> >                                           <Database_assnref>
> >                                              <Database_ref identifier="MO"/>
> >                                           </Database_assnref>
> >                                        </DatabaseEntry>
> >                                     </OntologyReference_assn>
> >                                     <Associations_assnlist>
> >                                        <OntologyEntry category="TimeUnit" value="days" description="24 hours, time unit">
> >                                           <OntologyReference_assn>
> >                                              <DatabaseEntry accession="#days" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#days">
> >                                                 <Database_assnref>
> >                                                    <Database_ref identifier="MO"/>
> >                                                 </Database_assnref>
> >                                              </DatabaseEntry>
> >                                           </OntologyReference_assn>
> >                                        </OntologyEntry>
> >                                     </Associations_assnlist>
> >                                  </OntologyEntry>
> >                               </Associations_assnlist>
> >                            </OntologyEntry>
> >                         </Associations_assnlist>
> >                      </OntologyEntry>
> >                   </Associations_assnlist>
> >                </OntologyEntry>
> > <!-- snip -->
> >             </Characteristics_assnlist>
> > <!-- snip -->
> >          </BioSource>
> >
> > by the by, the MAGE-ML examples i've looked at from the NIH Neuroscience
> > Microarray Consortium do not validate against the MAGE-ML DTD.  i'll send
> > a follow-up e-mail dealing with the problems i see.  they are not far off
> > but are invalid in a number of places.
> >
> > cheers,
> > michael
> >
> > Michael Miller
> > Lead Software Developer
> > Rosetta Biosoftware Business Unit
> > www.rosettabio.com
> >
> > [1] http://www.omg.org/spec/GENE/1.1/
> >
> > (sadly, the original links to the MAGEstk appear to be broken, this
> > mirror site still has the MAGE related files built up over the years,
> > here's my best guess as to the most helpful for the references)
> > [2] http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/
> > 	v1.0: http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE-2002-01-07.xmi.gz/MAGE-2002-01-07.xmi
> > 	v1.1: http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE.xmi.gz[peek]
> > [3] http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE%20Java%20API/20010911/
> >
> >
> >   
> >> -----Original Message-----
> >> From: public-semweb-lifesci-request@w3.org 
> >> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of 
> >> Helen Parkinson
> >> Sent: Wednesday, July 22, 2009 2:55 AM
> >> To: Kei Cheung
> >> Cc: HCLS; James Malone
> >> Subject: Re: BioRDF Telcon
> >>
> >> Responses in line.
> >>
> >>
> >>     
> >>>> 1. We have text mined much of the Affymetrix GEO data, curated it and
> >>>> imported it into ArrayExpress - there is now much better sample
> >>>> annotation than the native data in GEO. We also are running QC across
> >>>> all the data files so we know which should be excluded for future
> >>>> analyses.
> >>>>         
> >>> I think it's the right thing to do both to enrich data annotation and
> >>> to enhance data quality. This will help data integration a lot.
> >>>       
> >>> Currently, we are exploring query federation in the neuroscience
> >>> context. It'd be great if we can use the neuroscience use case(s) to
> >>> help drive your ontology development for text mining and data
> >>> visualization. In addition to the NIH neuroscience microarray
> >>> consortium, it may be possible to collaborate with the Neuroscience
> >>> Information Framework (NIF) to see if we can utilize some of its
> >>> resources (e.g., neuron ontology).
> >>>       
> >> Re-use of the neuron ontology is possible, but it depends on whether
> >> there is available data to annotate either in ArrayExpress or GEO. If
> >> you can get me a list of experiment accessions or pubmed ids I can see
> >> if this is feasible
> >>     
> >>>> 3. We have summary level data of genes x conditions for ~30,000 hybs
> >>>> worth of data in our gene expression atlas with p values indicating
> >>>> relative under/over-expression. We are planning to export these as
> >>>> triples as soon as we publish the atlas - these may be of interest.
> >>>> www.ebi.ac.uk/gxa - there's an API at present, but it will be
> >>>> improved in the next month or so.
> >>>>         
> >>> It fits well with what we're currently exploring in terms of gene list
> >>> representation and linking genes and samples to existing ontologies.
> >>> It'd be great if we can download or fetch RDF triples from EBI atlas.
> >>>
> >> We have a student starting work on this in a month, if you can produce
> >> concrete use cases for how you want to access these data we can do
> >> something.
> >>     
> >>>> 4. If neuroscience data is of specific interest we could do a themed
> >>>> atlas release where we add datasets for a given community or project
> >>>> and make these available. These can be identified by ArrayExpress or
> >>>> GEO accession or pubmed and we can re-annotate the genes vs
> >>>> Uniprot/Ensembl, add GO terms, etc and curate the sample attributes
> >>>> and experimental variables. These pipelines are already in place as
> >>>> part of our production workflow.
> >>>>         
> >>> I think it's a great idea to do a themed atlas (e.g., neuro-atlas). I
> >>> just played with gxa a little bit. It's nice! For example, I could
> >>> find genes that are over-expressed in the hippocampus brain region
> >>> across different experiments. However, when I tried to do the same
> >>> thing for neurons, there are only a few neuron types that I can
> >>> select. It'd be nice if we can have more neuron types, for instance.
> >>>       
> >> This is probably as we don't have data - here's a list of human
> >> experiments with the term neuron - if any of these are useful, then I
> >> can prioritise their curation and inclusion in an atlas release
> >>
> >>  
> >> http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=neuron&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
> >>
> >> and brain
> >>
> >> http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=brain&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
> >   
> >>>> I'd be very happy to collaborate, and for this group to use our data,
> >>>> we spend a lot of time adding semantic value to it, so please let me
> >>>> know if this is of interest
> >>>>         
> >>> We are also looking into the possibility of establishing collaboration
> >>> with the scientific discourse task force based on the microarray use
> >>> case. We're planning to have a microarray-related presentation and
> >>> discussion on Aug. 31 (Monday, 11 am EDT/5 pm CET). Details will be
> >>> announced later. It'd be great if you can join the BioRDF call to
> >>> participate in the discussion.
> >>>
> >>> Cheers,
> >>>
> >>> -Kei
> >>>       
> >>>> best regards
> >>>>
> >>>> Helen
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Kei Cheung wrote:
> >>>>         
> >>>>> The minutes for yesterday's BioRDF call are available at:
> >>>>>
> >>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009-07-20_Conference_Call
> >>>>>
> >>>>> Thanks to Lena for scribing and Eric for retrieving the transcript
> >>>>> from the IRC log.
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> -Kei
> >>>>>
> >>>>> Kei Cheung wrote:
> >>>>>           
> >>>>>> This is a reminder that the next BioRDF teleconf. will be held at
> >>>>>> 11 am EDT (5 pm CET) on Monday, July 20 (see details below).
> >>>>>>
> >>>>>> I created the following wiki page for discussing the microarray use
> >>>>>> case:
> >>>>>>
> >>>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/QueryFederation2
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> -Kei
> >>>>>>
> >>>>>> == Conference Details ==
> >>>>>> * Date of Call: Monday July 20, 2009
> >>>>>> * Time of Call: 11:00 am Eastern Time
> >>>>>> * Dial-In #: +1.617.761.6200 (Cambridge, MA)
> >>>>>> * Dial-In #: +33.4.89.06.34.99 (Nice, France)
> >>>>>> * Dial-In #: +44.117.370.6152 (Bristol, UK)
> >>>>>> * Participant Access Code: 4257 ("HCLS")
> >>>>>> * IRC Channel: irc.w3.org port 6665 channel #hcls (see
> >>>>>> [http://www.w3.org/Project/IRC/ W3C IRC page] for details, or see
> >>>>>> [http://cgi.w3.org/member-bin/irc/irc.cgi Web IRC])
> >>>>>> * Duration: ~1 hour
> >>>>>> * Frequency: bi-weekly
> >>>>>> * Convener: Kei Cheung
> >>>>>>
> >>>>>> == Agenda ==
> >>>>>> * Roll call and introduction (Kei)
> >>>>>> * TCM data quick update (Jun, Kei)
> >>>>>> * Query federation use case expansion (microarray) (All)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>             
> >>>>>           
> >> -- 
> >> Helen Parkinson, PhD
> >> ArrayExpress Production Coordinator,
> >> Microarray Informatics Team, 
> >> EBI
> >>
> >> EBI 01223 494672
> >> Skype: helen.parkinson.ebi
> >>
> >>
> >>
> >>     
> 

Received on Thursday, 23 July 2009 19:12:08 UTC