- From: Miller, Michael D (Rosetta) <Michael_Miller@Rosettabio.com>
- Date: Thu, 23 Jul 2009 12:11:24 -0700
- To: "Helen Parkinson" <parkinso@ebi.ac.uk>
- Cc: "Kei Cheung" <kei.cheung@yale.edu>, "HCLS" <public-semweb-lifesci@w3.org>, "James Malone" <malone@ebi.ac.uk>

hi helen,

> I can probably do something more quickly with an rdf export of transformed data analysed for over/under expressions ...

the clinical genomics group at HL7 is looking for a good model to represent gene expression based biomarkers, i.e. a set of genes and one or more expression profiles for the set of genes where each profile maps to a phenotype or factor. although the representation is more UMLish, this sounds like it would be helpful.

> ... plus factor values and genes and we'll have a student to work on this I hope

would this also be able to include the originating BioSource annotations that aren't necessarily factor values?

cheers,
michael

> -----Original Message-----
> From: Helen Parkinson [mailto:parkinso@ebi.ac.uk]
> Sent: Thursday, July 23, 2009 11:11 AM
> To: Miller, Michael D (Rosetta)
> Cc: Kei Cheung; HCLS; James Malone
> Subject: Re: BioRDF Telcon
>
> Hi
>
> I meant to comment on this: I would not attempt a mage-ml->RDF transform. I can probably do something more quickly with an rdf export of transformed data analysed for over/under expressions plus factor values and genes, and we'll have a student to work on this I hope.
>
> Helen
>
> Miller, Michael D (Rosetta) wrote:
> > hi kei and helen,
> >
> > like helen, i've been following the HCLS working groups with great interest. as one of the designers, with helen, of the MAGE-ML and MAGE-TAB specs, i might be able to provide a little technical insight into the formats.
> >
> > (from helen)
> > "This is probably as we don't have data - here's a list of human experiments with the term neuron - if any of these are useful, then I can prioritize their curation and inclusion in an atlas release"
> >
> > kei, are the NIH Neuroscience Microarray Consortium experiments you've cited, and others like them, in GEO or ArrayExpress? a set of those could be a good starting point for helen.
> >
> > first, MAGE-ML is based on a DTD[1], not an XSD. in early 2002, as the OMG Gene Expression specification[1] was being finalized, XSD was still in its infancy, so we weren't comfortable at that point generating an XSD. the MAGE-OM UML[2], in a very early XMI format from Rational Rose and UniSys, was used to generate the DTD with code we wrote ourselves[3].
> >
> > the UML model was designed to capture the flow of a microarray experiment and how the resulting arrays were organized in the experiment, based on how the samples were treated and/or on the samples' phenotypes, so that a reviewer can understand the methodology and a researcher can replicate and/or re-analyze the results.
> >
> > some of the details of the flow may not be of much interest, i.e. it might be worth simply connecting the BioSource elements with their gene expression data and not worrying about how the hybridization was performed. but that depends on what you want to do, and you know that better than i.
> >
> > also, the data itself are specified in external files, typically in a white-space delimited format where the column headers are specified in the MAGE-ML file in the QuantitationTypeDimension element and the row identifiers are specified in one of the three DesignElementDimension elements (Feature, Reporter, CompositeSequence), depending on how far the data has been derived. the data can also be in a vendor-specific format such as the Affymetrix CEL format (since the CEL file internally specifies the dimensions, they are often left out of the MAGE-ML document).
> >
> > the ExperimentalFactor elements are certainly relevant, and if you've looked at some of the examples you will have noticed that the BioSource elements, in particular, and other elements are annotated by OntologyEntry elements. from the gene expression specification:
> >
> > "OntologyEntry
> > A single entry from an ontology or a controlled vocabulary. For instance, category could be 'species name,' value could be 'homo sapiens' and ontology would be taxonomy database, NCBI."
> >
> > for the element that an OntologyEntry annotates, we looked at the annotation as a way of specifying something like "the object identified by the element is an instance of the class/individual specified by the OntologyEntry".
> >
> > so from "kitm-affy-droso-176167" one sees that the BioSource is an "instance of" Drosophila, whole animal, whole head, with an age of 3 days:
> >
> > <BioSource identifier="arrayconsortium.tgen.org::biosource.181527" name="Oregon R head 3d">
> >   <Characteristics_assnlist>
> >     <OntologyEntry category="Organism" value="Drosophila" description="Drosophila">
> >       <OntologyReference_assn>
> >         <DatabaseEntry accession="#Organism" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Organism">
> >           <Database_assnref>
> >             <Database_ref identifier="MO"/>
> >           </Database_assnref>
> >         </DatabaseEntry>
> >         <!-- snip -->
> >       </OntologyReference_assn>
> >     </OntologyEntry>
> >     <OntologyEntry category="OrganismPart" value="whole animal" description="">
> >       <OntologyReference_assn>
> >         <DatabaseEntry accession="#OrganismPart" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#OrganismPart">
> >           <Database_assnref>
> >             <Database_ref identifier="MO"/>
> >           </Database_assnref>
> >         </DatabaseEntry>
> >       </OntologyReference_assn>
> >       <!-- snip -->
> >     </OntologyEntry>
> >     <OntologyEntry category="OrganismPartRegion" value="whole head" description="">
> >       <!-- snip -->
> >     </OntologyEntry>
> >     <!-- snip -->
> >     <OntologyEntry category="Age" value="Age">
> >       <OntologyReference_assn>
> >         <DatabaseEntry accession="#Age" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Age">
> >           <Database_assnref>
> >             <Database_ref identifier="MO"/>
> >           </Database_assnref>
> >         </DatabaseEntry>
> >       </OntologyReference_assn>
> >       <Associations_assnlist>
> >         <OntologyEntry category="has_measurement" value="has_measurement">
> >           <OntologyReference_assn>
> >             <DatabaseEntry accession="#has_measurement" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_measurement">
> >               <Database_assnref>
> >                 <Database_ref identifier="MO"/>
> >               </Database_assnref>
> >             </DatabaseEntry>
> >           </OntologyReference_assn>
> >           <Associations_assnlist>
> >             <OntologyEntry category="Measurement" value="Measurement">
> >               <OntologyReference_assn>
> >                 <DatabaseEntry accession="#Measurement" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#Measurement">
> >                   <Database_assnref>
> >                     <Database_ref identifier="MO"/>
> >                   </Database_assnref>
> >                 </DatabaseEntry>
> >               </OntologyReference_assn>
> >               <Associations_assnlist>
> >                 <OntologyEntry category="has_value" value="has_value">
> >                   <OntologyReference_assn>
> >                     <DatabaseEntry accession="#has_value" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_value">
> >                       <Database_assnref>
> >                         <Database_ref identifier="MO"/>
> >                       </Database_assnref>
> >                     </DatabaseEntry>
> >                   </OntologyReference_assn>
> >                   <Associations_assnlist>
> >                     <OntologyEntry category="has_value" value="3"/>
> >                   </Associations_assnlist>
> >                 </OntologyEntry>
> >                 <OntologyEntry category="has_units" value="has_units">
> >                   <OntologyReference_assn>
> >                     <DatabaseEntry accession="#has_units" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#has_units">
> >                       <Database_assnref>
> >                         <Database_ref identifier="MO"/>
> >                       </Database_assnref>
> >                     </DatabaseEntry>
> >                   </OntologyReference_assn>
> >                   <Associations_assnlist>
> >                     <OntologyEntry category="TimeUnit" value="days" description="24 hours, time unit">
> >                       <OntologyReference_assn>
> >                         <DatabaseEntry accession="#days" URI="http://mged.sourceforge.net/ontologies/MGEDontology.php#days">
> >                           <Database_assnref>
> >                             <Database_ref identifier="MO"/>
> >                           </Database_assnref>
> >                         </DatabaseEntry>
> >                       </OntologyReference_assn>
> >                     </OntologyEntry>
> >                   </Associations_assnlist>
> >                 </OntologyEntry>
> >               </Associations_assnlist>
> >             </OntologyEntry>
> >           </Associations_assnlist>
> >         </OntologyEntry>
> >       </Associations_assnlist>
> >     </OntologyEntry>
> >     <!-- snip -->
> >   </Characteristics_assnlist>
> >   <!-- snip -->
> > </BioSource>
> >
> > by the by, the MAGE-ML examples i've looked at from the NIH Neuroscience Microarray Consortium do not validate against MAGE-ML.dtd. i'll send a follow-up e-mail dealing with the problems i see. they are not far off but are invalid in a number of places.
> >
> > cheers,
> > michael
> >
> > Michael Miller
> > Lead Software Developer
> > Rosetta Biosoftware Business Unit
> > www.rosettabio.com
> >
> > [1] http://www.omg.org/spec/GENE/1.1/
> >
> > (sadly, the original links to the MAGEstk appear to be broken; this mirror site still has the MAGE related files built up over the years. here's my best guess as to the most helpful for the references)
> > [2] http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/
> >     v1.0: http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE-2002-01-07.xmi.gz/MAGE-2002-01-07.xmi
> >     v1.1: http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE.xmi.gz
> > [3] http://www.mirrorservice.org/sites/download.sourceforge.net/pub/sourceforge/m/mg/mged/MAGE%20Java%20API/20010911/
> >
> >> -----Original Message-----
> >> From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Helen Parkinson
> >> Sent: Wednesday, July 22, 2009 2:55 AM
> >> To: Kei Cheung
> >> Cc: HCLS; James Malone
> >> Subject: Re: BioRDF Telcon
> >>
> >> Responses in line.
> >>
> >>>> 1. We have text mined much of the Affymetrix GEO data, curated it and imported it into ArrayExpress - there is now much better sample annotation than the native data in GEO. We are also running QC across all the data files so we know which should be excluded from future analyses.
> >>>>
> >>> I think it's the right thing to do, both to enrich data annotation and to enhance data quality. This will help data integration a lot.
> >>>
> >>> Currently, we are exploring query federation in the neuroscience context. It'd be great if we can use the neuroscience use case(s) to help drive your ontology development for text mining and data visualization. In addition to the NIH neuroscience microarray consortium, it may be possible to collaborate with the Neuroscience Information Framework (NIF) to see if we can utilize some of its resources (e.g., neuron ontology).
> >>>
> >> Re-use of the neuron ontology is possible, but it depends on whether there is available data to annotate either in ArrayExpress or GEO. If you can get me a list of experiment accessions or pubmed ids, I can see if this is feasible.
> >>
> >>>> 3. We have summary level data of genes x conditions for ~30,000 hybs worth of data in our gene expression atlas, with p values indicating relative under/over-expression. We are planning to export these as triples as soon as we publish the atlas - these may be of interest. www.ebi.ac.uk/gxa - there's an API at present, but it will be improved in the next month or so.
> >>>>
> >>> It fits well with what we're currently exploring in terms of gene list representation and linking genes and samples to existing ontologies.
> >>> It'd be great if we can download or fetch RDF triples from EBI atlas.
> >>>
> >> We have a student starting work on this in a month; if you can produce concrete use cases for how you want to access these data, we can do something.
> >>
> >>>> 4. If neuroscience data is of specific interest we could do a themed atlas release where we add datasets for a given community or project and make these available. These can be identified by ArrayExpress or GEO accession or pubmed, and we can re-annotate the genes vs Uniprot/Ensembl, add GO terms, etc. and curate the sample attributes and experimental variables. These pipelines are already in place as part of our production workflow.
> >>>>
> >>> I think it's a great idea to do a themed atlas (e.g., neuro-atlas). I just played with gxa a little bit. It's nice! For example, I could find genes that are over-expressed in the hippocampus brain region across different experiments. However, when I tried to do the same thing for neurons, there are only a few neuron types that I can select. It'd be nice if we can have more neuron types, for instance.
> >>>
> >> This is probably as we don't have data - here's a list of human experiments with the term neuron - if any of these are useful, then I can prioritise their curation and inclusion in an atlas release
> >>
> >> http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=neuron&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
> >>
> >> and brain
> >>
> >> http://www.ebi.ac.uk/microarray-as/ae/browse.html?keywords=brain&species=Homo+sapiens&array=&exptype=&pagesize=25&sortby=releasedate&sortorder=descending
> >>
> >>>> I'd be very happy to collaborate, and for this group to use our data; we spend a lot of time adding semantic value to it, so please let me know if this is of interest.
> >>>>
> >>> We are also looking into the possibility of establishing collaboration with the scientific discourse task force based on the microarray use case. We're planning to have a microarray-related presentation and discussion on Aug. 31 (Monday, 11 am EDT/5 pm CET). Details will be announced later. It'd be great if you can join the BioRDF call to participate in the discussion.
> >>>
> >>> Cheers,
> >>>
> >>> -Kei
> >>>
> >>>> best regards
> >>>>
> >>>> Helen
> >>>>
> >>>> Kei Cheung wrote:
> >>>>
> >>>>> The minutes for yesterday's BioRDF call are available at:
> >>>>>
> >>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Meetings/2009-07-20_Conference_Call
> >>>>>
> >>>>> Thanks to Lena for scribing and Eric for retrieving the transcript from the IRC log.
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> -Kei
> >>>>>
> >>>>> Kei Cheung wrote:
> >>>>>
> >>>>>> This is a reminder that the next BioRDF teleconf. will be held at 11 am EDT (5 pm CET) on Monday, July 20 (see details below).
> >>>>>>
> >>>>>> I created the following wiki page for discussing the microarray use case:
> >>>>>>
> >>>>>> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/QueryFederation2
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> -Kei
> >>>>>>
> >>>>>> == Conference Details ==
> >>>>>> * Date of Call: Monday July 20, 2009
> >>>>>> * Time of Call: 11:00 am Eastern Time
> >>>>>> * Dial-In #: +1.617.761.6200 (Cambridge, MA)
> >>>>>> * Dial-In #: +33.4.89.06.34.99 (Nice, France)
> >>>>>> * Dial-In #: +44.117.370.6152 (Bristol, UK)
> >>>>>> * Participant Access Code: 4257 ("HCLS")
> >>>>>> * IRC Channel: irc.w3.org port 6665 channel #hcls (see [http://www.w3.org/Project/IRC/ W3C IRC page] for details, or see [http://cgi.w3.org/member-bin/irc/irc.cgi Web IRC])
> >>>>>> * Duration: ~1 hour
> >>>>>> * Frequency: bi-weekly
> >>>>>> * Convener: Kei Cheung
> >>>>>>
> >>>>>> == Agenda ==
> >>>>>> * Roll call and introduction (Kei)
> >>>>>> * TCM data quick update (Jun, Kei)
> >>>>>> * Query federation use case expansion (microarray) (All)
> >>
> >> --
> >> Helen Parkinson, PhD
> >> ArrayExpress Production Coordinator,
> >> Microarray Informatics Team,
> >> EBI
> >>
> >> EBI 01223 494672
> >> Skype: helen.parkinson.ebi
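
Michael's note at the top of the thread describes the shape the HL7 clinical genomics group is after: a set of genes plus one or more expression profiles over that set, each profile mapped to a phenotype or factor. As a rough illustration only (this is not the HL7 model, which the message says is more UML-like), that shape could be held in two Python dataclasses. The class and method names, the use of string calls such as "up"/"down" for expression levels, and the gene and phenotype labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpressionProfile:
    """One expression pattern over the biomarker's genes, tied to a phenotype or factor."""
    phenotype: str
    levels: Dict[str, str]  # gene -> expression call, e.g. "up" or "down" (assumed encoding)


@dataclass
class ExpressionBiomarker:
    """A gene set plus one or more profiles defined over exactly that gene set."""
    genes: List[str]
    profiles: List[ExpressionProfile] = field(default_factory=list)

    def add_profile(self, profile: ExpressionProfile) -> None:
        # Reject profiles that leave out a gene or mention one outside the set.
        if set(profile.levels) != set(self.genes):
            raise ValueError(f"profile {profile.phenotype!r} does not cover the gene set")
        self.profiles.append(profile)

    def profile_for(self, phenotype: str) -> ExpressionProfile:
        # Look up the profile mapped to a given phenotype or factor value.
        for profile in self.profiles:
            if profile.phenotype == phenotype:
                return profile
        raise KeyError(phenotype)


# Hypothetical genes and phenotypes, for illustration only.
marker = ExpressionBiomarker(genes=["GENE-1", "GENE-2"])
marker.add_profile(ExpressionProfile("phenotype-A", {"GENE-1": "up", "GENE-2": "down"}))
marker.add_profile(ExpressionProfile("phenotype-B", {"GENE-1": "down", "GENE-2": "down"}))
print(marker.profile_for("phenotype-B").levels["GENE-1"])  # down
```

Keeping the gene set on the biomarker and checking each profile against it is one way to enforce "one or more expression profiles for the set of genes"; a numeric level per gene would work the same way.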
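
Michael explains that the numbers themselves live in external, white-space delimited files. Their column headers come from the QuantitationTypeDimension and their row identifiers come from one of the DesignElementDimension elements, so reading such a file amounts to a simple join. The sketch below assumes several things. Both lists have already been pulled out of the MAGE-ML document (that step is not shown). Every cell is numeric. The file has no header line of its own. The function name, the quantitation type names and the reporter identifiers in the usage lines are hypothetical. Vendor formats such as Affymetrix CEL, which carry their own dimensions, would need a dedicated reader instead.

```python
from typing import Dict, List


def read_data_matrix(text: str,
                     quantitation_types: List[str],
                     design_elements: List[str]) -> Dict[str, Dict[str, float]]:
    """Map each design element (row) to {quantitation type (column): value}.

    quantitation_types: column order, as listed in the QuantitationTypeDimension.
    design_elements: row order, as listed in the Feature, Reporter or
    CompositeSequence DesignElementDimension.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != len(design_elements):
        raise ValueError(f"expected {len(design_elements)} rows, found {len(rows)}")
    matrix: Dict[str, Dict[str, float]] = {}
    for element, cells in zip(design_elements, rows):
        if len(cells) != len(quantitation_types):
            raise ValueError(f"row {element!r} has {len(cells)} cells, "
                             f"expected {len(quantitation_types)}")
        matrix[element] = {qt: float(cell) for qt, cell in zip(quantitation_types, cells)}
    return matrix


# Hypothetical two-column, two-row data file with no header line.
DATA = """
1523.4  0.002
87.1    0.410
"""
matrix = read_data_matrix(DATA, ["Signal", "DetectionPValue"], ["Reporter-A", "Reporter-B"])
print(matrix["Reporter-B"]["Signal"])  # 87.1
```

Failing loudly on a row or column count mismatch is deliberate: since the dimensions live in the MAGE-ML file rather than the data file, a mismatch usually means the wrong dimension was paired with the file.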
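
Michael reads each OntologyEntry as saying "the object identified by the element is an instance of the class/individual specified by the OntologyEntry". That reading maps fairly directly onto subject/category/value statements. The Python sketch below is a minimal illustration, assuming a trimmed copy of the "Oregon R head 3d" BioSource from the thread, with the DatabaseEntry references dropped and no XML namespace. It emits plain tuples rather than real RDF. Collapsing the has_measurement / Measurement / has_value / has_units nesting into a single "3 days" string is an illustrative choice, not something MAGE-ML or the MGED Ontology prescribes. The helper names `associations`, `measurement` and `characteristics` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the BioSource quoted in the thread: DatabaseEntry references
# are dropped, but the nesting of the Age measurement is kept as-is.
BIOSOURCE_XML = """
<BioSource identifier="arrayconsortium.tgen.org::biosource.181527" name="Oregon R head 3d">
  <Characteristics_assnlist>
    <OntologyEntry category="Organism" value="Drosophila"/>
    <OntologyEntry category="OrganismPart" value="whole animal"/>
    <OntologyEntry category="OrganismPartRegion" value="whole head"/>
    <OntologyEntry category="Age" value="Age">
      <Associations_assnlist>
        <OntologyEntry category="has_measurement" value="has_measurement">
          <Associations_assnlist>
            <OntologyEntry category="Measurement" value="Measurement">
              <Associations_assnlist>
                <OntologyEntry category="has_value" value="has_value">
                  <Associations_assnlist>
                    <OntologyEntry category="has_value" value="3"/>
                  </Associations_assnlist>
                </OntologyEntry>
                <OntologyEntry category="has_units" value="has_units">
                  <Associations_assnlist>
                    <OntologyEntry category="TimeUnit" value="days"/>
                  </Associations_assnlist>
                </OntologyEntry>
              </Associations_assnlist>
            </OntologyEntry>
          </Associations_assnlist>
        </OntologyEntry>
      </Associations_assnlist>
    </OntologyEntry>
  </Characteristics_assnlist>
</BioSource>
"""


def associations(entry):
    """OntologyEntry children nested directly under an entry's Associations_assnlist."""
    return entry.findall("./Associations_assnlist/OntologyEntry")


def measurement(entry):
    """Collapse has_measurement -> Measurement -> has_value/has_units into
    "value unit" (hypothetical flattening); None if the pattern is absent."""
    for has_measurement in associations(entry):
        if has_measurement.get("category") != "has_measurement":
            continue
        for measure in associations(has_measurement):
            value = unit = None
            for slot in associations(measure):
                inner = associations(slot)
                if not inner:
                    continue
                if slot.get("category") == "has_value":
                    value = inner[0].get("value")
                elif slot.get("category") == "has_units":
                    unit = inner[0].get("value")
            if value is not None:
                return f"{value} {unit}" if unit else value
    return None


def characteristics(biosource):
    """Yield one (subject, category, value) tuple per top-level OntologyEntry."""
    subject = biosource.get("identifier")
    for entry in biosource.findall("./Characteristics_assnlist/OntologyEntry"):
        value = measurement(entry) or entry.get("value")
        yield subject, entry.get("category"), value


root = ET.fromstring(BIOSOURCE_XML.strip())
for triple in characteristics(root):
    print(triple)
```

Running it prints one tuple per characteristic, ending with the Age as "3 days". A real RDF export would more likely use the MGED Ontology URIs carried in the DatabaseEntry URI attributes (e.g. `MGEDontology.php#Organism`) instead of the bare category strings.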
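
Helen plans to export the atlas's genes x conditions summaries (a p value and an over/under-expression call per gene and condition) as triples. One possible shape for such an export is the N-Triples sketch below. Everything about the vocabulary is an assumption: the `http://example.org/atlas#` namespace, the predicate names, the `AtlasResult` record, and the gene, factor and experiment identifiers are hypothetical placeholders, not the Gene Expression Atlas's actual export format. Only the XML Schema `double` datatype URI is standard.

```python
from dataclasses import dataclass
from typing import List

EX = "http://example.org/atlas#"  # hypothetical namespace, not a real Atlas vocabulary
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


@dataclass
class AtlasResult:
    """One gene x condition summary row (hypothetical shape)."""
    gene: str          # gene identifier
    factor: str        # experimental factor, e.g. "organism part"
    factor_value: str  # e.g. "hippocampus"
    direction: str     # "up" (over-expressed) or "down" (under-expressed)
    p_value: float


def literal(text: str) -> str:
    """Quote a plain N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(experiment: str, results: List[AtlasResult]) -> List[str]:
    """Emit one result node per row, linked to experiment, gene, condition,
    direction and p value."""
    lines = []
    for i, r in enumerate(results):
        node = f"<{EX}{experiment}/result/{i}>"
        lines += [
            f"{node} <{EX}experiment> <{EX}{experiment}> .",
            f"{node} <{EX}gene> <{EX}gene/{r.gene}> .",
            f"{node} <{EX}factor> {literal(r.factor)} .",
            f"{node} <{EX}factorValue> {literal(r.factor_value)} .",
            f"{node} <{EX}direction> {literal(r.direction)} .",
            f'{node} <{EX}pValue> "{r.p_value!r}"^^<{XSD_DOUBLE}> .',
        ]
    return lines


# Hypothetical single row, for illustration only.
rows = [AtlasResult("GENE-1", "organism part", "hippocampus", "up", 0.001)]
for line in to_ntriples("E-EXAMPLE-1", rows):
    print(line)
```

Giving each gene/condition result its own node keeps the p value and direction attached to the specific comparison they describe. Extra annotations, such as the BioSource characteristics Michael asks about, could be attached to the same experiment node.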
Received on Thursday, 23 July 2009 19:12:08 UTC