- From: Helen Parkinson <parkinso@ebi.ac.uk>
- Date: Thu, 23 Jul 2009 19:10:43 +0100
- To: "Miller, Michael D (Rosetta)" <Michael_Miller@rosettabio.com>
- CC: Kei Cheung <kei.cheung@yale.edu>, HCLS <public-semweb-lifesci@w3.org>, James Malone <malone@ebi.ac.uk>
Hi,

I meant to comment on this. I would not attempt a MAGE-ML->RDF transform; I can probably do something more quickly with an RDF export of transformed data analysed for over/under expression, plus factor values and genes, and we'll have a student to work on this, I hope.

Helen

Miller, Michael D (Rosetta) wrote:
> hi kei and helen,
>
> like helen, i've been following the HCLS working groups with great
> interest. as one of the designers, with helen, of the MAGE-ML and
> MAGE-TAB specs i might be able to provide a little technical insight
> into the formats.
>
> (from helen)
> "This is probably as we don't have data - here's a list of human
> experiments with the term neuron - if any of these are useful, then I
> can prioritize their curation and inclusion in an atlas release"
>
> kei, are the NIH Neuroscience Microarray Consortium experiments you've
> cited and others like them in GEO or ArrayExpress? a set of those could
> be a good starting point for helen.
>
> first, MAGE-ML is based on a DTD[1], not an XSD. in early 2002 as the
> OMG Gene Expression specification[1] was being finalized, XSD was still
> in its infancy so we weren't comfortable at that point generating an XSD.
> the MAGE-OM UML[2], in a very early XMI format from Rational Rose and
> UniSys, was used to generate the DTD with code we wrote ourselves[3].
>
> the UML model was designed to capture the flow of a microarray
> experiment and how the resulting arrays were organized in the experiment
> based on how the samples were treated and/or on the samples' phenotypes
> for the purpose of a reviewer understanding the methodology and for a
> researcher replicating and/or re-analyzing the results.
>
> some of the details of the flow may not be of much interest, i.e. it
> might be worth simply connecting the BioSource elements with their gene
> expression data and not worrying about how the hybridization was
> performed. but that depends on what you want to do and you know that
> better than i.
>
> also, the data itself are specified in external files, typically in a
> white-space delimited format where the column headers are specified in
> the MAGE-ML file in the QuantitationTypeDimension element and the
> identifiers of the rows are specified in one of the three
> DesignElementDimension elements, Feature, Reporter, CompositeSequence,
> depending on how derived the data is. Also the data can be in a vendor
> specific format such as the Affymetrix CEL (since the CEL file
> internally specifies the dimensions often they are left out of the
> MAGE-ML document).
>
> the ExperimentalFactor elements are certainly relevant and if you've
> looked at some of the examples you will notice that the BioSource
> elements, in particular, and other elements are annotated by
> OntologyEntry elements. from the gene expression specification:
>
> "OntologyEntry
> A single entry from an ontology or a controlled vocabulary. For
> instance, category could be 'species name,' value could be
> 'homo sapiens' and ontology would be taxonomy database, NCBI."
>
> for the element an ontology entry element is annotating, we looked at it
> as a way of specifying something like "the object identified by the
> element is an instance of the class/individual specified by the
> OntologyEntry"
>
> so from "kitm-affy-droso-176167" one sees that the BioSource is an
> "instance of" Drosophila, whole animal, whole head and an age of 3 days:
>
> <BioSource identifier="arrayconsortium.tgen.org::biosource.181527" name="Oregon R head 3d">
>   <Characteristics_assnlist>
>     <OntologyEntry category="Organism" value="Drosophila" description="Drosophila">
>       <OntologyReference_assn>
>         <DatabaseEntry accession="#Organism" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Organism">
>           <Database_assnref>
>             <Database_ref identifier="MO"/>
>           </Database_assnref>
>         </DatabaseEntry>
>         <!-- snip -->
>       </OntologyReference_assn>
>     </OntologyEntry>
>     <OntologyEntry category="OrganismPart" value="whole animal" description="">
>       <OntologyReference_assn>
>         <DatabaseEntry accession="#OrganismPart" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#OrganismPart">
>           <Database_assnref>
>             <Database_ref identifier="MO"/>
>           </Database_assnref>
>         </DatabaseEntry>
>       </OntologyReference_assn>
>       <!-- snip -->
>     </OntologyEntry>
>     <OntologyEntry category="OrganismPartRegion" value="whole head" description="">
>       <!-- snip -->
>     </OntologyEntry>
>     <!-- snip -->
>     <OntologyEntry category="Age" value="Age">
>       <OntologyReference_assn>
>         <DatabaseEntry accession="#Age" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Age">
>           <Database_assnref>
>             <Database_ref identifier="MO"/>
>           </Database_assnref>
>         </DatabaseEntry>
>       </OntologyReference_assn>
>       <Associations_assnlist>
>         <OntologyEntry category="has_measurement" value="has_measurement">
>           <OntologyReference_assn>
>             <DatabaseEntry accession="#has_measurement" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_measurement">
>               <Database_assnref>
>                 <Database_ref identifier="MO"/>
>               </Database_assnref>
>             </DatabaseEntry>
>           </OntologyReference_assn>
>           <Associations_assnlist>
>             <OntologyEntry category="Measurement" value="Measurement">
>               <OntologyReference_assn>
>                 <DatabaseEntry accession="#Measurement" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Measurement">
>                   <Database_assnref>
>                     <Database_ref identifier="MO"/>
>                   </Database_assnref>
>                 </DatabaseEntry>
>               </OntologyReference_assn>
>               <Associations_assnlist>
>                 <OntologyEntry category="has_value" value="has_value">
>                   <OntologyReference_assn>
>                     <DatabaseEntry accession="#has_value" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_value">
>                       <Database_assnref>
>                         <Database_ref identifier="MO"/>
>                       </Database_assnref>
>                     </DatabaseEntry>
>                   </OntologyReference_assn>
>                   <Associations_assnlist>
>                     <OntologyEntry category="has_value" value="3"/>
>                   </Associations_assnlist>
>                 </OntologyEntry>
>                 <OntologyEntry category="has_units" value="has_units">
>                   <OntologyReference_assn>
>                     <DatabaseEntry accession="#has_units" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_units">
>                       <Database_assnref>
>                         <Database_ref identifier="MO"/>
>                       </Database_assnref>
>                     </DatabaseEntry>
>                   </OntologyReference_assn>
>                   <Associations_assnlist>
>                     <OntologyEntry category="TimeUnit" value="days" description="24 hours, time unit">
>                       <OntologyReference_assn>
>                         <DatabaseEntry accession="#days" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#days">
>                           <Database_assnref>
>                             <Database_ref identifier="MO"/>
>                           </Database_assnref>
>                         </DatabaseEntry>
>                       </OntologyReference_assn>
>                     </OntologyEntry>
>                   </Associations_assnlist>
>                 </OntologyEntry>
>               </Associations_assnlist>
>             </OntologyEntry>
>           </Associations_assnlist>
>         </OntologyEntry>
>       </Associations_assnlist>
>     </OntologyEntry>
>     <!-- snip -->
>   </Characteristics_assnlist>
>   <!-- snip -->
> </BioSource>
>
> by the by, the MAGE-ML examples i've looked at from the NIH Neuroscience
> Microarray Consortium are not in a valid MAGE-ML.dtd format. i'll send a
> follow-up e-mail dealing with the problems i see. they are not far off
> but are invalid in a number of places.
>
> cheers,
> michael
>
> Michael Miller
> Lead Software Developer
> Rosetta Biosoftware Business Unit
> www.rosettabio.com
>
> [1] http://www.omg.org/spec/GENE/1.1/
>
> (sadly, the original links to the MAGEstk appear to be broken, this
> mirror site still has the MAGE related files built up over the years,
> here's my best guess as to the most helpful for the references)
> [2] http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/
> v1.0: http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE-2002-01-07.xmi.gz/MAGE-2002-01-07.xmi
> v1.1: http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE.xmi.gz[peek]
> [3] http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE%20Java%20API/20010911/
>
>> -----Original Message-----
>> From: public-semweb-lifesci-request@w3.org
>> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Helen Parkinson
>> Sent: Wednesday, July 22, 2009 2:55 AM
>> To: Kei Cheung
>> Cc: HCLS; James Malone
>> Subject: Re: BioRDF Telcon
>>
>> Responses in line.
>>
>>>> 1. We have text mined much of the Affymetrix GEO data, curated it and
>>>> imported it into ArrayExpress - there is now much better sample
>>>> annotation than the native data in GEO. We also are running QC across
>>>> all the data files so we know which should be excluded for future
>>>> analyses.
>>>>
>>> I think it's the right thing to do both to enrich data annotation and
>>> to enhance data quality. This will help data integration a lot.
>>>
>>> Currently, we are exploring query federation in the neuroscience
>>> context.
>>> It'd be great if we can use the neuroscience use case(s) to
>>> help drive your ontology development for text mining and data
>>> visualization. In addition to the NIH neuroscience microarray
>>> consortium, it may be possible to collaborate with the Neuroscience
>>> Information Framework (NIF) to see if we can utilize some of its
>>> resources (e.g., neuron ontology).
>>>
>> Re-use of the neuron ontology is possible, but it depends on whether
>> there is available data to annotate either in ArrayExpress or GEO. If
>> you can get me a list of experiment accessions or PubMed IDs I can see
>> if this is feasible.
>>
>>>> 3. We have summary level data of genes x conditions for ~30,000 hybs
>>>> worth of data in our gene expression atlas with p values indicating
>>>> relative under/over-expression. We are planning to export these as
>>>> triples as soon as we publish the atlas - these may be of interest.
>>>> www.ebi.ac.uk/gxa - there's an API at present, but it will be
>>>> improved in the next month or so.
>>>>
>>> It fits well with what we're currently exploring in terms of gene list
>>> representation and linking genes and samples to existing ontologies.
>>> It'd be great if we can download or fetch RDF triples from EBI atlas.
>>>
>> We have a student starting work on this in a month; if you can produce
>> concrete use cases for how you want to access these data we can do
>> something.
>>
>>>> 4. If neuroscience data is of specific interest we could do a themed
>>>> atlas release where we add datasets for a given community or project
>>>> and make these available. These can be identified by ArrayExpress or
>>>> GEO accession or pubmed and we can re-annotate the genes vs
>>>> Uniprot/Ensembl, add GO terms, etc and curate the sample attributes
>>>> and experimental variables.
>>>> These pipelines are already in place as
>>>> part of our production workflow.
>>>>
>>> I think it's a great idea to do a themed atlas (e.g., neuro-atlas). I
>>> just played with gxa a little bit. It's nice! For example, I could
>>> find genes that are over-expressed in the hippocampus brain region
>>> across different experiments. However, when I tried to do the same
>>> thing for neurons, there are only a few neuron types that I can
>>> select. It'd be nice if we can have more neuron types, for instance.
>>>
>> This is probably as we don't have data - here's a list of human
>> experiments with the term neuron - if any of these are useful, then I
>> can prioritise their curation and inclusion in an atlas release
>>
>> http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=neuron&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
>>
>> and brain
>>
>> http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=brain&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
>>
>>>> I'd be very happy to collaborate, and for this group to use our data,
>>>> we spend a lot of time adding semantic value to it, so please let me
>>>> know if this is of interest
>>>>
>>> We are also looking into the possibility of establishing collaboration
>>> with the scientific discourse task force based on the microarray use
>>> case. We're planning to have a microarray-related presentation and
>>> discussion on Aug. 31 (Monday, 11 am EDT/5 pm CET). Details will be
>>> announced later. It'd be great if you can join the BioRDF call to
>>> participate in the discussion.
>>>
>>> Cheers,
>>>
>>> -Kei
>>>
>>>> best regards
>>>>
>>>> Helen
>>>>
>>>> Kei Cheung wrote:
>>>>
>>>>> The minutes for yesterday's BioRDF call are available at:
>>>>>
>>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009-07-20_Conference_Call
>>>>>
>>>>> Thanks to Lena for scribing and Eric for retrieving the transcript
>>>>> from the IRC log.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> -Kei
>>>>>
>>>>> Kei Cheung wrote:
>>>>>
>>>>>> This is a reminder that the next BioRDF teleconf. will be held at
>>>>>> 11 am EDT (5 pm CET) on Monday, July 20 (see details below).
>>>>>>
>>>>>> I created the following wiki page for discussing the microarray use
>>>>>> case:
>>>>>>
>>>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/QueryFederation2
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> -Kei
>>>>>>
>>>>>> == Conference Details ==
>>>>>> * Date of Call: Monday July 20, 2009
>>>>>> * Time of Call: 11:00 am Eastern Time
>>>>>> * Dial-In #: +1.617.761.6200 (Cambridge, MA)
>>>>>> * Dial-In #: +33.4.89.06.34.99 (Nice, France)
>>>>>> * Dial-In #: +44.117.370.6152 (Bristol, UK)
>>>>>> * Participant Access Code: 4257 ("HCLS")
>>>>>> * IRC Channel: irc.w3.org port 6665 channel #hcls (see
>>>>>> [http://www.w3.org/Project/IRC/ W3C IRC page] for details, or see
>>>>>> [http://cgi.w3.org/member-bin/irc/irc.cgi Web IRC])
>>>>>> * Duration: ~1 hour
>>>>>> * Frequency: bi-weekly
>>>>>> * Convener: Kei Cheung
>>>>>>
>>>>>> == Agenda ==
>>>>>> * Roll call and introduction (Kei)
>>>>>> * TCM data quick update (Jun, Kei)
>>>>>> * Query federation use case expansion (microarray) (All)
>>>>>>
>>
>> --
>> Helen Parkinson, PhD
>> ArrayExpress Production Coordinator,
>> Microarray Informatics Team,
>> EBI
>>
>> EBI 01223 494672
>> Skype: helen.parkinson.ebi
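
As an illustration of Michael's point that an OntologyEntry reads as "the object identified by the element is an instance of ..." a category/value pair, here is a minimal Python sketch that walks a trimmed copy of the quoted BioSource and flattens each characteristic into a (subject, predicate, object) statement. It is not the export Helen describes and not a proposed MAGE-ML to RDF mapping. The names MAGE_FRAGMENT, leaf_values and biosource_statements are hypothetical, the OntologyReference_assn blocks are left out of the fragment for brevity, and building the predicate by appending the category to the MGED Ontology base URI is an assumption made only for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment modelled on the BioSource quoted in the thread.
# OntologyReference_assn blocks are omitted (assumption: not needed here).
MAGE_FRAGMENT = """
<BioSource identifier="arrayconsortium.tgen.org::biosource.181527" name="Oregon R head 3d">
  <Characteristics_assnlist>
    <OntologyEntry category="Organism" value="Drosophila"/>
    <OntologyEntry category="OrganismPart" value="whole animal"/>
    <OntologyEntry category="OrganismPartRegion" value="whole head"/>
    <OntologyEntry category="Age" value="Age">
      <Associations_assnlist>
        <OntologyEntry category="has_measurement" value="has_measurement">
          <Associations_assnlist>
            <OntologyEntry category="Measurement" value="Measurement">
              <Associations_assnlist>
                <OntologyEntry category="has_value" value="has_value">
                  <Associations_assnlist>
                    <OntologyEntry category="has_value" value="3"/>
                  </Associations_assnlist>
                </OntologyEntry>
                <OntologyEntry category="has_units" value="has_units">
                  <Associations_assnlist>
                    <OntologyEntry category="TimeUnit" value="days"/>
                  </Associations_assnlist>
                </OntologyEntry>
              </Associations_assnlist>
            </OntologyEntry>
          </Associations_assnlist>
        </OntologyEntry>
      </Associations_assnlist>
    </OntologyEntry>
  </Characteristics_assnlist>
</BioSource>
"""

# Base URI taken from the DatabaseEntry URIs in the quoted example.
MO = "http://mged.sourceforge.net/ontologies/MGEDontology.php#"


def leaf_values(entry):
    """Yield (category, value) for nested entries with no further associations."""
    for child in entry.findall("Associations_assnlist/OntologyEntry"):
        if child.find("Associations_assnlist") is None:
            yield child.get("category"), child.get("value")
        else:
            yield from leaf_values(child)


def biosource_statements(xml_text):
    """Return (subject, predicate, object) tuples for one BioSource element."""
    source = ET.fromstring(xml_text)
    subject = source.get("identifier")
    statements = []
    for entry in source.findall("Characteristics_assnlist/OntologyEntry"):
        # Assumption for this sketch: predicate = MO base URI + category.
        predicate = MO + entry.get("category")
        leaves = list(leaf_values(entry))
        if leaves:
            # Measurement-style entry: join the leaf values, e.g. value + unit.
            obj = " ".join(value for _, value in leaves)
        else:
            obj = entry.get("value")
        statements.append((subject, predicate, obj))
    return statements


for statement in biosource_statements(MAGE_FRAGMENT):
    print(statement)
```

The nested Age entry is collapsed to its leaf value and unit, which loses the intermediate has_measurement/Measurement structure; whether that is acceptable would depend on the use cases Helen asks for.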
Received on Friday, 24 July 2009 21:44:38 UTC