- From: Miller, Michael D (Rosetta) <Michael_Miller@Rosettabio.com>
- Date: Fri, 24 Jul 2009 15:28:19 -0700
- To: "Kei Cheung" <kei.cheung@yale.edu>
- Cc: "Helen Parkinson" <parkinson@ebi.ac.uk>, "HCLS" <public-semweb-lifesci@w3.org>, "James Malone" <malone@ebi.ac.uk>
- Message-ID: <C9EDB84D403E654CB78E37A506E406AF02457D63@ussemx1101.merck.com>
hi kei,

good luck on your endeavors.

> I can relay your comments about the validity of mageml to the
> consortium, although I don't know whether they can address them.

understandable if they can't address the changes but if they have a
chance, i've attached what i've done to make sabba-affy-rat-168529 valid
MAGE-ML.

cheers,
michael

Michael Miller
Lead Software Developer
Rosetta Biosoftware Business Unit
www.rosettabio.com

> -----Original Message-----
> From: Kei Cheung [mailto:kei.cheung@yale.edu]
> Sent: Friday, July 24, 2009 11:34 AM
> To: Miller, Michael D (Rosetta)
> Cc: Helen Parkinson; HCLS; James Malone
> Subject: Re: BioRDF Telcon
>
> Hi Michael,
>
> Thanks for your detailed description of mageml. For our use case, we
> probably don't need to use all the information captured in mageml. The
> types of information we are currently focusing on include
> experiment/sample annotation (including some provenance as you
> indicated) and gene lists and how they are linked to existing
> ontologies. A couple of convincing examples may be enough to start. I
> can relay your comments about the validity of mageml to the consortium,
>
> Cheers,
>
> -Kei
>
> Miller, Michael D (Rosetta) wrote:
>
> >hi kei and helen,
> >
> >like helen, i've been following the HCLS working groups with great
> >interest. as one of the designers, with helen, of the MAGE-ML and
> >MAGE-TAB specs i might be able to provide a little technical insight
> >into the formats.
> >
> >(from helen)
> >"This is probably as we don't have data - here's a list of human
> >experiments with the term neuron - if any of these are useful, then I
> >can prioritize their curation and inclusion in an atlas release"
> >
> >kei, are the NIH Neuroscience Microarray Consortium experiments you've
> >cited and others like them in GEO or ArrayExpress? a set of those could
> >be a good starting point for helen.
>
> My understanding is that the publicly visible microarray projects in the
> neuroscience microarray consortium should also be in geo and/or
> arrayexpress, although I don't know whether all the annotations are
> preserved.
>
> >first, MAGE-ML is based on a DTD[1], not an XSD. in early 2002 as the
> >OMG Gene Expression specification[1] was being finalized, XSD was still
> >in its infancy so we weren't comfortable at that point generating an XSD.
> >the MAGE-OM UML[2], in a very early XMI format from Rational Rose and
> >UniSys, was used to generate the DTD with code we wrote ourselves[3].
> >
> >the UML model was designed to capture the flow of a microarray
> >experiment and how the resulting arrays were organized in the experiment
> >based on how the samples were treated and/or on the samples' phenotypes
> >for the purpose of a reviewer understanding the methodology and for a
> >researcher replicating and/or re-analyzing the results.
> >
> >some of the details of the flow may not be of much interest, i.e. it
> >might be worth simply connecting the BioSource elements with their gene
> >expression data and not worrying about how the hybridization was
> >performed. but that depends on what you want to do and you know that
> >better than i.
> >
> >also, the data itself are specified in external files, typically in a
> >white-space delimited format where the column headers are specified in
> >the MAGE-ML file in the QuantitationTypeDimension element and the
> >identifiers of the row specified in one of the three
> >DesignElementDimension elements, Feature, Reporter, CompositeSequence,
> >depending on how derived the data is. Also the data can be in a vendor
> >specific format such as the Affymetrix CEL (since the CEL file
> >internally specifies the dimensions often they are left out of the
> >MAGE-ML document).
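As a rough illustration of the layout just described, the sketch below pairs a whitespace-delimited data file with its identifiers. It assumes the row identifiers (from one of the DesignElementDimension elements) and the column identifiers (from the QuantitationTypeDimension element) have already been pulled out of the MAGE-ML document, that the file covers a single bioassay with one row per design element, and that every cell is numeric. The function name `label_matrix` and the sample identifiers are hypothetical and are not part of MAGE-ML or any toolkit.

```python
def label_matrix(data_text, row_ids, column_ids):
    """Attach row and column identifiers to a whitespace-delimited matrix.

    Assumption: the external file carries no header row, so the labels
    must come from the MAGE-ML document, in the same order as the data.
    """
    rows = [line.split() for line in data_text.splitlines() if line.strip()]
    if len(rows) != len(row_ids):
        raise ValueError("row count does not match the design element list")

    table = {}
    for row_id, cells in zip(row_ids, rows):
        if len(cells) != len(column_ids):
            raise ValueError("column count mismatch in row for " + row_id)
        # Map each quantitation type identifier to the value in its column.
        table[row_id] = dict(zip(column_ids, (float(c) for c in cells)))
    return table


# Hypothetical identifiers and values, for illustration only.
example = label_matrix(
    "1.0 0.5\n2.0\t0.25\n",
    row_ids=["Reporter:R1", "Reporter:R2"],
    column_ids=["Signal", "PValue"],
)
```

The length checks are there because nothing in the data file itself says which dimension a column or row belongs to; a mismatch is the only sign that the document and the file have drifted apart.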
> >
> >the ExperimentalFactor elements are certainly relevant and if you've
> >looked at some of the examples you will notice that the BioSource
> >elements, in particular, and other elements are annotated by
> >OntologyEntry elements. from the gene expression specification:
> >
> >"OntologyEntry
> >A single entry from an ontology or a controlled vocabulary. For
> >instance, category could be 'species name,' value could be
> >'homo sapiens' and ontology would be taxonomy database, NCBI."
> >
> >for the element an ontology entry element is annotating, we looked at it
> >as a way of specifying something like "the object identified by the
> >element is an instance of the class/individual specified by the
> >OntologyEntry"
> >
> >so from "kitm-affy-droso-176167" one sees that the BioSource is an
> >"instance of" Drosophila, whole animal, whole head and an age of 3 days:
> >
> >  <BioSource identifier="arrayconsortium.tgen.org::biosource.181527" name="Oregon R head 3d">
> >    <Characteristics_assnlist>
> >      <OntologyEntry category="Organism" value="Drosophila" description="Drosophila">
> >        <OntologyReference_assn>
> >          <DatabaseEntry accession="#Organism" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Organism">
> >            <Database_assnref>
> >              <Database_ref identifier="MO"/>
> >            </Database_assnref>
> >          </DatabaseEntry>
> ><!-- snip -->
> >        </OntologyReference_assn>
> >      </OntologyEntry>
> >      <OntologyEntry category="OrganismPart" value="whole animal" description="">
> >        <OntologyReference_assn>
> >          <DatabaseEntry accession="#OrganismPart" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#OrganismPart">
> >            <Database_assnref>
> >              <Database_ref identifier="MO"/>
> >            </Database_assnref>
> >          </DatabaseEntry>
> >        </OntologyReference_assn>
> ><!-- snip -->
> >      </OntologyEntry>
> >      <OntologyEntry category="OrganismPartRegion" value="whole head" description="">
> ><!-- snip -->
> >      </OntologyEntry>
> ><!-- snip -->
> >      <OntologyEntry category="Age" value="Age">
> >        <OntologyReference_assn>
> >          <DatabaseEntry accession="#Age" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Age">
> >            <Database_assnref>
> >              <Database_ref identifier="MO"/>
> >            </Database_assnref>
> >          </DatabaseEntry>
> >        </OntologyReference_assn>
> >        <Associations_assnlist>
> >          <OntologyEntry category="has_measurement" value="has_measurement">
> >            <OntologyReference_assn>
> >              <DatabaseEntry accession="#has_measurement" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_measurement">
> >                <Database_assnref>
> >                  <Database_ref identifier="MO"/>
> >                </Database_assnref>
> >              </DatabaseEntry>
> >            </OntologyReference_assn>
> >            <Associations_assnlist>
> >              <OntologyEntry category="Measurement" value="Measurement">
> >                <OntologyReference_assn>
> >                  <DatabaseEntry accession="#Measurement" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Measurement">
> >                    <Database_assnref>
> >                      <Database_ref identifier="MO"/>
> >                    </Database_assnref>
> >                  </DatabaseEntry>
> >                </OntologyReference_assn>
> >                <Associations_assnlist>
> >                  <OntologyEntry category="has_value" value="has_value">
> >                    <OntologyReference_assn>
> >                      <DatabaseEntry accession="#has_value" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_value">
> >                        <Database_assnref>
> >                          <Database_ref identifier="MO"/>
> >                        </Database_assnref>
> >                      </DatabaseEntry>
> >                    </OntologyReference_assn>
> >                    <Associations_assnlist>
> >                      <OntologyEntry category="has_value" value="3"/>
> >                    </Associations_assnlist>
> >                  </OntologyEntry>
> >                  <OntologyEntry category="has_units" value="has_units">
> >                    <OntologyReference_assn>
> >                      <DatabaseEntry accession="#has_units" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_units">
> >                        <Database_assnref>
> >                          <Database_ref identifier="MO"/>
> >                        </Database_assnref>
> >                      </DatabaseEntry>
> >                    </OntologyReference_assn>
> >                    <Associations_assnlist>
> >                      <OntologyEntry category="TimeUnit" value="days" description="24 hours, time unit">
> >                        <OntologyReference_assn>
> >                          <DatabaseEntry accession="#days" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#days">
> >                            <Database_assnref>
> >                              <Database_ref identifier="MO"/>
> >                            </Database_assnref>
> >                          </DatabaseEntry>
> >                        </OntologyReference_assn>
> >                      </OntologyEntry>
> >                    </Associations_assnlist>
> >                  </OntologyEntry>
> >                </Associations_assnlist>
> >              </OntologyEntry>
> >            </Associations_assnlist>
> >          </OntologyEntry>
> >        </Associations_assnlist>
> >      </OntologyEntry>
> ><!-- snip -->
> >    </Characteristics_assnlist>
> ><!-- snip -->
> >  </BioSource>
> >
> >by the by, the MAGE-ML examples i've looked at from the NIH Neuroscience
> >Microarray Consortium are not in a valid MAGE-ML.dtd format. i'll send a
> >follow-up e-mail dealing with the problems i see. they are not far off
> >but are invalid in a number of places.
> >
> >cheers,
> >michael
> >
> >Michael Miller
> >Lead Software Developer
> >Rosetta Biosoftware Business Unit
> >www.rosettabio.com
> >
> >[1] http://www.omg.org/spec/GENE/1.1/
> >
> >(sadly, the original links to the MAGEstk appear to be broken, this
> >mirror site still has the MAGE related files built up over the years,
> >here's my best guess as to the most helpful for the references)
> >[2] http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/
> >  v1.0: http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE-2002-01-07.xmi.gz/MAGE-2002-01-07.xmi
> >  v1.1: http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE.xmi.gz[peek]
> >[3] http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE%20Java%20API/20010911/
> >
> >>-----Original Message-----
> >>From: public-semweb-lifesci-request@w3.org
> >>[mailto:public-semweb-lifesci-request@w3.org] On
Behalf Of
> >>Helen Parkinson
> >>Sent: Wednesday, July 22, 2009 2:55 AM
> >>To: Kei Cheung
> >>Cc: HCLS; James Malone
> >>Subject: Re: BioRDF Telcon
> >>
> >>Responses in line.
> >>
> >>>>1. We have text mined much of the Affymetrix GEO data, curated it and
> >>>>imported it into ArrayExpress - there is now much better sample
> >>>>annotation than the native data in GEO. We also are running QC across
> >>>>all the data files so we know which should be excluded for future
> >>>>analyses.
> >>>
> >>>I think it's the right thing to do both to enrich data annotation and
> >>>to enhance data quality. This will help data integration a lot.
> >>>
> >>>Currently, we are exploring query federation in the neuroscience
> >>>context. It'd be great if we can use the neuroscience use case(s) to
> >>>help drive your ontology development for text mining and data
> >>>visualization. In addition to the NIH neuroscience microarray
> >>>consortium, it may be possible to collaborate with the Neuroscience
> >>>Information Framework (NIF) to see if we can utilize some of its
> >>>resources (e.g., neuron ontology).
> >>
> >>Re-use of the neuron ontology is possible, but it depends on whether
> >>there is available data to annotate either in ArrayExpress or GEO. If
> >>you can get me a list of experiment accessions or pubmed ids I can see
> >>if this is feasible
> >>
> >>>>3. We have summary level data of genes x conditions for ~30,000 hybs
> >>>>worth of data in our gene expression atlas with p values indicating
> >>>>relative under/over-expression. We are planning to export these as
> >>>>triples as soon as we publish the atlas - these may be of interest.
> >>>>www.ebi.ac.uk/gxa - there's an API at present, but it will be
> >>>>improved in the next month or so.
> >>>
> >>>It fits well with what we're currently exploring in terms of gene list
> >>>representation and linking genes and samples to existing ontologies.
> >>>It'd be great if we can download or fetch RDF triples from EBI atlas.
> >>
> >>We have a student starting work on this in a month, if you can produce
> >>concrete use cases for how you want to access these data we can do
> >>something.
> >>
> >>>>4. If neuroscience data is of specific interest we could do a themed
> >>>>atlas release where we add datasets for a given community or project
> >>>>and make these available. These can be identified by ArrayExpress or
> >>>>GEO accession or pubmed and we can re-annotate the genes vs
> >>>>Uniprot/Ensembl, add GO terms, etc and curate the sample attributes
> >>>>and experimental variables. These pipelines are already in place as
> >>>>part of our production workflow.
> >>>
> >>>I think it's a great idea to do a themed atlas (e.g., neuro-atlas). I
> >>>just played with gxa a little bit. It's nice! For example, I could
> >>>find genes that are over-expressed in the hippocampus brain region
> >>>across different experiments. However, when I tried to do the same
> >>>thing for neurons, there are only a few neuron types that I can
> >>>select. It'd be nice if we can have more neuron types, for instance.
> >>
> >>This is probably as we don't have data - here's a list of human
> >>experiments with the term neuron - if any of these are useful, then I
> >>can prioritise their curation and inclusion in an atlas release
> >>
> >>http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=neuron&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
> >>
> >>and brain
> >>
> >>http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=brain&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
> >>
> >>>>I'd be very happy to collaborate, and for this group to use our data,
> >>>>we spend a lot of time adding semantic value to it, so please let me
> >>>>know if this is of interest
> >>>
> >>>We are also looking into the possibility of establishing collaboration
> >>>with the scientific discourse task force based on the microarray use
> >>>case. We're planning to have a microarray-related presentation and
> >>>discussion on Aug. 31 (Monday, 11 am EDT/5 pm CET). Details will be
> >>>announced later. It'd be great if you can join the BioRDF call to
> >>>participate in the discussion.
> >>>
> >>>Cheers,
> >>>
> >>>-Kei
> >>>
> >>>>best regards
> >>>>
> >>>>Helen
> >>>>
> >>>>Kei Cheung wrote:
> >>>>
> >>>>>The minutes for yesterday's BioRDF call are available at:
> >>>>>
> >>>>>http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009-07-20_Conference_Call
> >>>>>
> >>>>>Thanks to Lena for scribing and Eric for retrieving the transcript
> >>>>>from the IRC log.
> >>>>>
> >>>>>Cheers,
> >>>>>
> >>>>>-Kei
> >>>>>
> >>>>>Kei Cheung wrote:
> >>>>>
> >>>>>>This is a reminder that the next BioRDF teleconf. will be held at
> >>>>>>11 am EDT (5 pm CET) on Monday, July 20 (see details below).
> >>>>>>
> >>>>>>I created the following wiki page for discussing the microarray use
> >>>>>>case:
> >>>>>>
> >>>>>>http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/QueryFederation2
> >>>>>>
> >>>>>>Cheers,
> >>>>>>
> >>>>>>-Kei
> >>>>>>
> >>>>>>== Conference Details ==
> >>>>>>* Date of Call: Monday July 20, 2009
> >>>>>>* Time of Call: 11:00 am Eastern Time
> >>>>>>* Dial-In #: +1.617.761.6200 (Cambridge, MA)
> >>>>>>* Dial-In #: +33.4.89.06.34.99 (Nice, France)
> >>>>>>* Dial-In #: +44.117.370.6152 (Bristol, UK)
> >>>>>>* Participant Access Code: 4257 ("HCLS")
> >>>>>>* IRC Channel: irc.w3.org port 6665 channel #hcls (see
> >>>>>>[http://www.w3.org/Project/IRC/ W3C IRC page] for details, or see
> >>>>>>[http://cgi.w3.org/member-bin/irc/irc.cgi Web IRC])
> >>>>>>* Duration: ~1 hour
> >>>>>>* Frequency: bi-weekly
> >>>>>>* Convener: Kei Cheung
> >>>>>>
> >>>>>>== Agenda ==
> >>>>>>* Roll call and introduction (Kei)
> >>>>>>* TCM data quick update (Jun, Kei)
> >>>>>>* Query federation use case expansion (microarray) (All)
> >>
> >>--
> >>Helen Parkinson, PhD
> >>ArrayExpress Production Coordinator,
> >>Microarray Informatics Team,
> >>EBI
> >>
> >>EBI 01223 494672
> >>Skype: helen.parkinson.ebi
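
The nested Age annotation in the quoted BioSource example reads as a chain of OntologyEntry elements that ends in a value and a unit. The sketch below flattens such a chain into (path, value) pairs. It is a minimal illustration, assuming Python's standard `xml.etree.ElementTree` module and a trimmed copy of the quoted fragment with the OntologyReference_assn children left out; `leaf_paths` is a hypothetical helper name, not part of any MAGE toolkit.

```python
import xml.etree.ElementTree as ET

# Trimmed from the quoted Age entry; ontology references omitted for brevity.
FRAGMENT = """
<OntologyEntry category="Age" value="Age">
  <Associations_assnlist>
    <OntologyEntry category="has_measurement" value="has_measurement">
      <Associations_assnlist>
        <OntologyEntry category="Measurement" value="Measurement">
          <Associations_assnlist>
            <OntologyEntry category="has_value" value="has_value">
              <Associations_assnlist>
                <OntologyEntry category="has_value" value="3"/>
              </Associations_assnlist>
            </OntologyEntry>
            <OntologyEntry category="has_units" value="has_units">
              <Associations_assnlist>
                <OntologyEntry category="TimeUnit" value="days"/>
              </Associations_assnlist>
            </OntologyEntry>
          </Associations_assnlist>
        </OntologyEntry>
      </Associations_assnlist>
    </OntologyEntry>
  </Associations_assnlist>
</OntologyEntry>
"""


def leaf_paths(entry, prefix=()):
    """Yield (category path, value) for each innermost OntologyEntry."""
    path = prefix + (entry.get("category"),)
    assoc = entry.find("Associations_assnlist")
    children = [] if assoc is None else assoc.findall("OntologyEntry")
    if not children:
        # No nested associations: this entry carries the actual value.
        yield path, entry.get("value")
        return
    for child in children:
        yield from leaf_paths(child, path)


leaves = list(leaf_paths(ET.fromstring(FRAGMENT)))
for path, value in leaves:
    print(" / ".join(path), "=", value)
```

Keeping the full category path, rather than only the last category, preserves the distinction between the measurement's value and its unit when the pairs are later mapped to ontology terms.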
Attachments
- text/xml attachment: mage_ml.xml
Received on Friday, 24 July 2009 22:29:04 UTC