- From: kc28 <kei.cheung@yale.edu>
- Date: Sat, 09 Sep 2006 22:22:30 -0400
- To: "Miller, Michael D (Rosetta)" <Michael_Miller@Rosettabio.com>
- Cc: William Bug <William.Bug@DrexelMed.edu>, Alan Ruttenberg <alanruttenberg@gmail.com>, Marco Brandizi <brandizi@ebi.ac.uk>, semantic-web@w3.org, public-semweb-lifesci@w3.org
Hi Michael et al,

The following tools, for example, are available for microarray gene annotation:

SOURCE -- http://nar.oxfordjournals.org/cgi/content/full/31/1/219
KARMA -- http://nar.oxfordjournals.org/cgi/content/full/32/suppl_2/W441
RESOURCERER -- http://pga.tigr.org/tigr-scripts/magic/r1.pl
DRAGON -- http://pevsnerlab.kennedykrieger.org/dragon.htm

These tools take a gene list of interest and return annotation collected from multiple sources (e.g., Gene Ontology, UniProt, and KEGG). It might be useful if these tools could be made semantic-web-aware.

Cheers,
-Kei

Miller, Michael D (Rosetta) wrote:

> Hi Bill and Alan,
>
> You misunderstand my use case.
>
> My researcher doesn't much care that the world knows about his/her
> microarray experiment yet--in fact he/she may very well be searching
> for interesting information about the gene set to see whether it is
> worth going further or whether the experiment was just retreading old
> ground or whatever. There's this new tool, the semantic web, so the
> researcher is going to submit this set of genes and hopefully get
> useful information on them as a set.
>
> Now this researcher probably makes the assumption that as long as the
> naming source of the genes is indicated, no further work is required.
> This naming source may very well be GenBank, which, of course, isn't
> likely to be set up for easy access by pure semantic web tools for
> many years, if ever, as many people on this list would like. But it
> had better be supported by the semantic web because, for all its
> faults and all the faults of the current sequence databases, if the
> semantic web can't garner information from them I don't see much hope
> for adoption by the common researcher.
>
> So perhaps the researcher gets back that the genes are part of a
> particular pathway, there were a few papers in PubMed that mention
> them, some microarray experiments in public repositories had them
> significantly up or down regulated or affected by some drug, but the
> conclusions for this experiment appear to be worthy, so hopefully the
> experiment will get annotated (with these semantic web results as
> well), be made part of a submittal, and the experiment itself
> deposited in a public database to now be accessible for others
> searching the semantic web.
>
> "In translating the instance data into OWL, it should then be possible
> to perform the sort of higher level sorting and re-analysis Alan
> describes."
>
> Although this wasn't the use case I was talking about in this thread,
> it is obviously a very interesting use case also. I believe in an
> earlier e-mail I talked about something similar, but one will be
> unlikely to find out, in general, about the individuals in the
> experiment (outside of genes), because they will be truly unique
> instances of things like samples, hybridization and feature extraction
> and data. But if the researcher annotates these individuals from rich
> ontologies, or from not so great sources that tools are developed to
> compensate for, then I agree entirely with you and Alan that much
> useful reasoning can be done on the semantic web.
>
> "The tendency when presenting these results in research articles - and
> often when sharing the data - is to provide the analyzed/reduced view
> of the data"
>
> Actually, I just heard Leroy Hood of ISB and other fame and Eric
> Schadt of Rosetta Inpharmatics (our parent company) give excellent
> talks at the MGED9 meeting where their research is opening up and
> bringing in information and data from a vast number of resources and
> tying it together into big pictures, all without the semantic web.
> I'm sure they would love to have the kind of power envisioned by the
> W3C for the semantic web but they won't touch it until it is
> easy--they are busy doing their core jobs.
>
> So I really think that we need to:
>
> 1) make sure the semantic web allows people to poke at it, i.e. ask
> the question "is there anything interesting about a particular
> object?" without having to say why they are interested
>
> 2) provide tools so that they can annotate their objects well so that
> when they are submitted they can be incorporated into the web (moving
> forward, this is one aim of the MAGEstk for gene expression experiments)
>
> 3) ensure that existing imperfect resources have semantic web tools
> that can overcome those imperfections and get the usefulness from them
> that people are currently getting
>
> 4) most importantly, get a useful semantic web out there now (there's
> plenty of information available), then make it better as time goes along
>
> The resources that are already set up for easy integration into the
> semantic web will come along for free.
>
> cheers,
> Michael
>
>
> -----Original Message-----
> *From:* William Bug [mailto:William.Bug@DrexelMed.edu]
> *Sent:* Friday, September 08, 2006 8:39 PM
> *To:* Alan Ruttenberg
> *Cc:* Miller, Michael D (Rosetta); Marco Brandizi;
> semantic-web@w3.org; public-semweb-lifesci@w3.org
> *Subject:* Re: Playing with sets in OWL...
>
> I think Alan is making a very important general point here.
>
> MAGE-ML/MAGE-OM is perfectly tuned to the needs of:
> a) transferring entire microarray data sets across systems
> b) persisting microarray data sets (at least in certain scenarios)
> c) providing a systematic, normative interface for writing code to
> access specific elements and data collections one typically finds
> in the description of a microarray data set
> This is the sort of functionality data models are particularly
> well suited to supporting.
>
> MAGE-OM/MAGE-ML is also the result of a huge amount of
> deliberation from dozens of experts in the informatics fields
> involved in generating, storing, and manipulating microarray data.
>
> When it comes to manipulating the information associated with a
> microarray experiment - or collection of experiments - in a
> semantically explicit manner, however, RDF is really the
> preferred formalism providing the required explicit semantics,
> while still providing the expressiveness needed to characterize
> the inherent variety, complexity, and granularity in this
> information. When it comes to filling out the assertions to the
> point of being able to reason on them - even simple reasoning such
> as consistency checks - some dialect of OWL will be the formalism
> of choice, I believe.
>
> I think Alan gives a very clear example of how to use OWL in this
> particular situation described by Marco.
>
> I have just a few questions in followup:
> 1) The MAGE-ML XML Schema provides for a great deal of flexibility
> via the use of optional fields. Still, any given use in a
> specific lab for a specific collection of microarray experiments
> is likely to develop its own conventions for which fields to use
> and which not to use - and how to populate the more "open"
> elements. With this in mind, it seems it should be possible under
> those circumstances to create an XSLT to translate the individuals
> contained in a MAGE-ML instance according to the elemental OWL
> classes Alan described -
> Expression_technology, Expression_technology_map, Spot_mapping,
> Expression_profile_experiment, Spot_intensity,
> Gene_expression_computation. The latter can probably be
> reconstituted from the MAGE-ML elements BioAssay, BioAssayData,
> HigherLevelAnalysis, Measurement, and QuantitationType. In
> translating the instance data into OWL, it should then be possible
> to perform the sort of higher level sorting and re-analysis Alan
> describes.
> The translation should probably take the "open world"
> assumption into account, so the resulting OWL statements will
> provide the intended semantic completeness, even if that isn't
> represented in the MAGE-ML instances themselves.
>
> 2) I think the use of OWL Alan describes here is going to be
> critical to performing broad field, large scale re-analysis of
> complex data sets such as microarray experiments and various types
> of neuro-images containing segmented geometric objects (in many
> ways equivalent to the segmentation performed on microarray images
> to determine the location and intensity of spots). The tendency
> when presenting these results in research articles - and often
> when sharing the data - is to provide the analyzed/reduced view of
> the data. In the context of these complex experiments, many forms
> of re-analysis will not be possible without access to the
> originally collected data. Think of how critical BLAST-based
> meta-analysis was for GenBank through the 1990s (and still is).
> There are several underlying assertions making it possible to
> perform such analysis. Primary among them is the acceptance that
> each form of sequencing technology provides a reliable way of
> determining the probability of finding a particular nucleotide at
> a particular location. Many sequences are submitted with the
> simple assertion that at position N in sequence X there is a 100%
> probability (or 95% confidence, to be more specific) of finding
> nucleotide A|T|G|C. To some extent, the statistical analysis
> performed by BLAST (and other position-sensitive,
> cross-correlative statistical algorithms) relied on these "ground
> facts". For the most part, it was safe to assume this level of
> reduced data could be safely pooled with other such sequence
> determinations regardless of the specific sequencing device,
> underlying biochemical protocols, and specific lots of reagents
> used.
> These same assumptions cannot generally be safely made
> for microarray experiments, segmented MRI images - and many other
> types of images such as IHC or in situ based images. As an
> example, just look to the debates in the last year or two
> regarding the sometimes problematic nature of replicating "gene
> expression" level results with different arrays covering the
> "same" genes. If we are to support the same sort of meta analysis
> as was common with BLAST across GenBank sequences, then we will
> often have to supply access to the low level data elements. This
> in fact was a major impetus behind providing the MAGE-OM (and
> FuGE-OM). As I state at the top of this email with points 'a',
> 'b', & 'c', MAGE-OM/MAGE-ML is extremely useful for several
> critical tasks related to the handling of this detailed data.
> When it comes to supporting the semantically-grounded analytical
> requirements of such complex, broad field, meta-analysis, however,
> I think OWL (and sometimes RDF alone) is going to prove a critical
> enabling technology.
>
> 3) Re: anonymous classes/individuals of the type Alan describes:
> These are essentially "blank nodes" in the RDF sense - "unnamed"
> nodes based on a collection of necessary restrictions, if I
> understand things correctly. Please pardon the naive question,
> but aren't there some caveats in terms of processing very large
> RDF and/or OWL graphs containing "blank" or "anonymous" nodes?
> For many OWL ontologies, this might not be a concern, but if one
> were to be tempted to express a large variety of such sets based
> on different groupings of the sequence probes on a collection of
> arrays - groupings relevant to specific types of analysis - I
> could see how these anonymous entities - especially the anonymous
> sets of individuals - could really proliferate.
>
> Many thanks for providing this very helpful exemplar, Alan.
>
> Cheers,
> Bill
>
>
> On Sep 8, 2006, at 9:50 PM, Alan Ruttenberg wrote:
>
>>
>> Yes.
>> However I don't think I would change anything I wrote.
>> Because OWL works in the open world, we can say that all these
>> things exist, but only supply the details that we need. But
>> having the framework which explains the meaning of what is
>> supplied is one of the points of using ontologies. In this case,
>> if all we know is that there was some computation that led to
>> this gene set we could use some arbitrary name for it
>> (remembering that if we decided to represent it later/merge it
>> with the experimental run we can use owl:sameAs to merge our name
>> with the actual name).
>>
>> So, with reference to this ontology (generated by Marco, or
>> imported from some standard) he could simply state:
>>
>> Individual(c1 type(Computation)
>>   value(geneComputedAsExpressed g1)
>>   value(geneComputedAsExpressed g2)
>>   value(geneComputedAsExpressed g3)
>> )
>>
>> If he wanted to state that the source was an array experiment
>> (but he didn't know the details), he could add to c1
>>
>>   value(fromExperiment Individual(
>>     type(ExpressionProfileExperiment)))
>>
>> which uses an anonymous individual (blank node) of the
>> appropriate type. Now you know that the data originally came from
>> an expression profile experiment, though you haven't needed to
>> add any information other than that.
>>
>> The pattern that Marco mentions that is closest to this is
>>
>>>>> set1 isA GeneSet
>>>>> set1 hasMember g1, g2, g3
>>>>
>>
>> in that we are using the property values on an instance to
>> represent the set. But the point I wanted to make was that a gene
>> set isn't some arbitrary set. It is a choice, chosen for a
>> reason/purpose, and the ontology should explicitly represent
>> those reasons/purposes.
>>
>> If there are defined kinds of follow up, then he could define
>> an instance to represent that process too.
>>
>> Finally, I wanted to make the technical point that he
>> doesn't need to use constructs of the form:
>>
>>>>> set1 derivesFromUnionOf set2, set3
>>>>
>>
>> OWL provides the ability to say these things, even when the "set"
>> is the property values of an instance. For example, given
>>
>> Individual(c1 type(Computation)
>>   value(geneComputedAsExpressed g1)
>> )
>>
>> Individual(c2 type(Computation)
>>   value(geneComputedAsExpressed g2)
>>   value(geneComputedAsExpressed g3)
>> )
>>
>> suppose that he wanted to represent a followup list to be
>> verified by RT PCR, represented by the class RTPCRFollowup.
>> Let's say that he wanted to call the property geneToFollowUp, with
>> inverse geneFollowedUpIn
>>
>> Individual(RTPCRFollowup1 type(RTPCRFollowup))
>>
>> EquivalentClasses(
>>   unionOf(
>>     restriction(GeneExpressedAccordingTo hasValue(c1))
>>     restriction(GeneExpressedAccordingTo hasValue(c2)))
>>   restriction(geneFollowedUpIn hasValue(RTPCRFollowup1)))
>>
>> Now, e.g. Pellet, will conclude that the values of the property
>> geneToFollowUp of instance RTPCRFollowup1 are exactly g1, g2, g3
>>
>> Of course that's not the only way to do it, but it does show that
>> OWL reasoning can make it economical to represent and work with
>> sets without having to go off and recapitulate set theory.
>>
>> -Alan
>>
>> On Sep 8, 2006, at 7:41 PM, Miller, Michael D (Rosetta) wrote:
>>
>>>
>>> Hi Alan,
>>>
>>> What you are describing is described in MAGE-OM/MAGE-ML, as a
>>> UML model to capture the real world aspects of running a
>>> microarray experiment.
>>>
>>> Typically at the end of this process a set of genes is identified as
>>> being interesting for some reason and one wants to know more
>>> about this set of genes beyond the microarray experiment that has
>>> been performed.
>>>
>>> I might be wrong but I think that is where Marco is starting, at
>>> the end of the experiment for follow-up.
>>>
>>> cheers,
>>> Michael
>>>
>>>> -----Original Message-----
>>>> From: public-semweb-lifesci-request@w3.org
>>>> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of
>>>> Alan Ruttenberg
>>>> Sent: Friday, September 08, 2006 3:07 PM
>>>> To: Marco Brandizi
>>>> Cc: semantic-web@w3.org; public-semweb-lifesci@w3.org
>>>> Subject: Re: Playing with sets in OWL...
>>>>
>>>>
>>>> Hi Marco,
>>>>
>>>> There are a number of ways to work with sets, but I don't think I'd
>>>> approach this problem from that point of view.
>>>> Rather, I would start by thinking about what my domain instances
>>>> are, what their properties are, and what kinds of questions I
>>>> want to be able to ask based on the representation. I'll sketch
>>>> this out a bit, though the fact that I name an object or property
>>>> doesn't mean that you have to supply it (remember OWL is
>>>> open-world) - still, listing these makes your intentions clearer
>>>> and the ontology easier for others to work with.
>>>>
>>>> The heading in each of these is a class, of which you would
>>>> make one or more instances to represent your results.
>>>> The indented names are properties on instances of that class.
>>>>
>>>> An expression technology:
>>>>     Vendor:
>>>>     Product: e.g. array name
>>>>     Name of spots on the array
>>>>     Mappings: (maps of spot to gene - you might use e.g.
>>>>       Affymetrix, or you might compute your own)
>>>>
>>>> ExpressionTechnologyMap
>>>>     SpotMapping: (each value a spot mapping)
>>>>
>>>> Spot mapping:
>>>>     SpotID:
>>>>     GeneID:
>>>>
>>>> An expression profile experiment (call yours exp0)
>>>>     When done:
>>>>     Who did it:
>>>>     What technology was used: (an expression technology)
>>>>     Sample: (a sample)
>>>>     Treatment: ...
>>>>     Levels: A bunch of pairs of spot name, intensity
>>>>
>>>> Spot intensity
>>>>     SpotID:
>>>>     Intensity:
>>>>
>>>> A computation of which spots/genes are "expressed" (call yours c1)
>>>>     Name of the method: e.g. mas5 above threshold
>>>>     Parameter of the method: e.g. the threshold
>>>>     Experiment: exp0
>>>>     Spot Expressed: spots that were over threshold
>>>>     Gene Computed As Expressed: genes that were over threshold
>>>>
>>>> And maybe:
>>>>
>>>> Conclusion
>>>>     What was concluded:
>>>>     By who:
>>>>     Based on: c1
>>>>
>>>> All of what you enter for your experiment are instances (so
>>>> there are no issues of OWL Full)
>>>>
>>>> Now, the gene set you wanted can be expressed as a class:
>>>>
>>>> Let's define an inverse property of
>>>> "GeneComputedAsExpressed", call it "GeneExpressedAccordingTo"
>>>>
>>>> Class(Set1 partial restriction(GeneExpressedAccordingTo
>>>>   hasValue(c1)))
>>>>
>>>> Instances of Set1 will be those genes. You may or may not want to
>>>> actually define this class. However I don't think that you need
>>>> to add any properties to it. Everything you would want to say
>>>> probably wants to be said on one of the instances - the experiment,
>>>> the computation, the conclusion, etc.
>>>>
>>>> Let me know if this helps/hurts - glad to discuss this some more
>>>>
>>>> -Alan
>>>>
>>>>
>>>> On Sep 8, 2006, at 11:58 AM, Marco Brandizi wrote:
>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> sorry for the possible triviality of my questions, or the
>>>>> messed-up mind I am possibly showing...
>>>>>
>>>>> I am trying to model the grouping of individuals into sets. In my
>>>>> application domain, gene expression, people put together, let's
>>>>> say, genes, associating a meaning with the sets.
>>>>>
>>>>> For instance:
>>>>>
>>>>> Set1 := { gene1, gene2, gene3 }
>>>>>
>>>>> is the set of genes that are expressed in experiment0
>>>>>
>>>>> (genei and exp0 are OWL individuals)
>>>>>
>>>>>
>>>>> I understand that this may be formalized in OWL by:
>>>>>
>>>>> - declaring Set1 as rdfs:subClassOf Gene
>>>>> - using oneOf to declare the membership of g1,2,3
>>>>>   (or simpler: (g1 type Set1), (g2 type Set1), etc. )
>>>>> - using hasValue with expressed and exp0
>>>>>
>>>>> (right?)
>>>>>
>>>>> Now, I am trying to build an application which is like a semantic
>>>>> wiki.
>>>>>
>>>>> Hence users have quite direct contact with the underlying
>>>>> ontology, and they can write, with a simplified syntax, statements
>>>>> about a subject they are describing (subject-centric approach).
>>>>>
>>>>> Committing to the very formal formalism of OWL looks a bit too
>>>>> much... formal... ;-) and hard to handle in a semantic wiki-like
>>>>> application.
>>>>>
>>>>> Another problem is that the set could have properties of its own,
>>>>> for instance:
>>>>>
>>>>> Set1 hasAuthor John
>>>>>
>>>>> meaning that John is defining it. But hasAuthor is typically used
>>>>> for individuals, and I wouldn't like to fall into OWL-Full by
>>>>> making an OWL reasoner interpret Set1 both as an individual and
>>>>> a class.
>>>>>
>>>>> Aren't there more informal (although less precise) methods to
>>>>> model sets, or lists of individuals?
>>>>>
>>>>> An approach could be modeling some sort of set theory over
>>>>> individuals:
>>>>>
>>>>> set1 isA GeneSet
>>>>> set1 hasMember g1, g2, g3
>>>>> ...
>>>>>
>>>>> set1 derivesFromUnionOf set2, set3
>>>>>
>>>>> ...
>>>>>
>>>>> But I am not sure it would be a good approach, or if someone else
>>>>> has already tried that.
>>>>>
>>>>> Any suggestion?
>>>>>
>>>>>
>>>>> Thanks in advance for a reply.
>>>>>
>>>>> Cheers.
>>>>>
>>>>> --
>>>>> ======================================================================
>>>>> Marco Brandizi <brandizi@ebi.ac.uk>
>>>>> http://gca.btbs.unimib.it/brandizi
>
> Bill Bug
> Senior Research Analyst/Ontological Engineer
>
> Laboratory for Bioimaging & Anatomical Informatics
> www.neuroterrain.org
> Department of Neurobiology & Anatomy
> Drexel University College of Medicine
> 2900 Queen Lane
> Philadelphia, PA 19129
> 215 991 8430 (ph)
> 610 457 0443 (mobile)
> 215 843 9367 (fax)
>
>
> Please Note: I now have a new email - William.Bug@DrexelMed.edu
>
>This email and any accompanying attachments are confidential.
>This information is intended solely for the use of the individual
>to whom it is addressed. Any review, disclosure, copying,
>distribution, or use of this email communication by others is strictly
>prohibited. If you are not the intended recipient please notify us
>immediately by returning this message to the sender and delete
>all copies. Thank you for your cooperation.
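For readers who want to see concretely what the union pattern in Alan's quoted message amounts to, here is a minimal Python sketch. It is not OWL reasoning and makes no claim about how Pellet works: it treats the quoted statements as plain (subject, property, value) triples under a closed-world reading and computes the same follow-up set by ordinary set arithmetic. The property and individual names come from the thread; the helper names (invert, has_value, follow_up_genes) are hypothetical and exist only for this illustration.

```python
# Triples taken from the quoted example: (subject, property, value).
TRIPLES = [
    ("c1", "geneComputedAsExpressed", "g1"),
    ("c2", "geneComputedAsExpressed", "g2"),
    ("c2", "geneComputedAsExpressed", "g3"),
]


def invert(triples, prop, inverse_prop):
    """Build the inverse statements (value, inverse_prop, subject) for prop."""
    return [(o, inverse_prop, s) for (s, p, o) in triples if p == prop]


def has_value(triples, prop, value):
    """Subjects whose prop has the given value (a hasValue-style filter)."""
    return {s for (s, p, o) in triples if p == prop and o == value}


def follow_up_genes(triples, computations):
    """Union of the genes expressed according to any listed computation."""
    inverse = invert(
        triples, "geneComputedAsExpressed", "GeneExpressedAccordingTo"
    )
    genes = set()
    for computation in computations:
        genes |= has_value(inverse, "GeneExpressedAccordingTo", computation)
    return genes


if __name__ == "__main__":
    # Genes to follow up: those expressed according to c1 or c2.
    print(sorted(follow_up_genes(TRIPLES, ["c1", "c2"])))
```

Unlike an OWL reasoner, this sketch only sees the triples it is given, so it cannot account for statements that are true but unstated; that difference is exactly the open-world point made in the thread.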
Received on Sunday, 10 September 2006 02:26:28 UTC