- From: kc28 <kei.cheung@yale.edu>
- Date: Sat, 09 Sep 2006 22:22:30 -0400
- To: "Miller, Michael D (Rosetta)" <Michael_Miller@Rosettabio.com>
- Cc: William Bug <William.Bug@DrexelMed.edu>, Alan Ruttenberg <alanruttenberg@gmail.com>, Marco Brandizi <brandizi@ebi.ac.uk>, semantic-web@w3.org, public-semweb-lifesci@w3.org
Hi Michael et al,
The following tools, for example, are available for microarray gene
annotation.
SOURCE -- http://nar.oxfordjournals.org/cgi/content/full/31/1/219
KARMA -- http://nar.oxfordjournals.org/cgi/content/full/32/suppl_2/W441
RESOURCERER -- http://pga.tigr.org/tigr-scripts/magic/r1.pl
DRAGON -- http://pevsnerlab.kennedykrieger.org/dragon.htm
These tools take a gene list of interest and return annotation collected
from multiple sources (e.g., the Gene Ontology, UniProt, and KEGG). It
might be useful if these tools could be made semantic-web-aware.
Cheers,
-Kei
Miller, Michael D (Rosetta) wrote:
> Hi Bill and Alan,
>
> You misunderstand my use case.
>
> My researcher doesn't much care that the world knows about his/her
> microarray experiment yet--in fact he/she may very well be searching
> for interesting information about the gene set to see whether it is
> worth going further or whether the experiment was just retreading old
> ground or whatever. There's this new tool, the semantic web, so the
> researcher is going to submit this set of genes and hopefully get
> useful information on them as a set.
>
> Now this researcher probably makes the assumption that as long as the
> naming source of the genes is indicated, no further work is required.
> This naming source may very well be GenBank, which, of course, isn't
> likely to be set up for easy access by pure semantic web tools for
> many years, if ever, much as many people on this list would like. But
> it had better be supported by the semantic web, because for all its
> faults, and all the faults of the current sequence databases, if the
> semantic web can't garner information from them I don't see much hope
> for adoption by the common researcher.
>
> So perhaps the researcher gets back that the genes are part of a
> particular pathway, that a few papers in PubMed mention them, and
> that some microarray experiments in public repositories had them
> significantly up- or down-regulated or affected by some drug. The
> conclusions for this experiment appear to be worthy, so hopefully the
> experiment will get annotated (with these semantic web results as
> well), be made part of a submittal, and the experiment itself
> deposited in a public database, now accessible to others searching
> the semantic web.
>
> "In translating the instance data into OWL, it should then be possible
> to perform the sort of higher level sorting and re-analysis Alan
> describes."
>
> Although this wasn't the use case I was talking about in this thread,
> it is obviously a very interesting use case also. I believe in an
> earlier e-mail I talked about something similar. One will be
> unlikely to find out much, in general, about the individuals in the
> experiment (outside of genes), because they will be truly unique
> instances of things like samples, hybridizations, feature extractions,
> and data. But if the researcher annotates these individuals from rich
> ontologies, or from not-so-great sources that tools are developed to
> compensate for, then I agree entirely with you and Alan that much
> useful reasoning can be done on the semantic web.
>
> "The tendency when presenting these results in research articles - and
> often when sharing the data - is to provide the analyzed/reduced view
> of the data"
>
> Actually, I just heard Leroy Hood (of ISB and other fame) and Eric
> Schadt of Rosetta Inpharmatics (our parent company) give excellent
> talks at the MGED9 meeting where their research is opening up and
> bringing in information and data from a vast number of resources and
> tying it together into big pictures, all without the semantic web.
> I'm sure they would love to have the kind of power envisioned by the
> W3C for the semantic web but they won't touch it until it is
> easy--they are busy doing their core jobs.
>
> So I really think that we need to:
>
> 1) make sure the semantic web allows people to poke at it, i.e., ask
> whether there is anything interesting about a particular object,
> without having to say why they are interested
>
> 2) provide tools so that they can annotate their objects well so that
> when they are submitted they can be incorporated into the web (moving
> forward, this is one aim of the MAGEstk for gene expression experiments)
>
> 3) ensure that existing imperfect resources have semantic web tools
> that can overcome those imperfections and deliver the usefulness
> people are currently getting from them
>
> 4) most importantly, get a useful semantic web out there now (there's
> plenty of information available), then make it better as time goes along
>
> The resources that are already set up for easy integration into the
> semantic web will come along for free.
>
> cheers,
> Michael
>
>
> -----Original Message-----
> *From:* William Bug [mailto:William.Bug@DrexelMed.edu]
> *Sent:* Friday, September 08, 2006 8:39 PM
> *To:* Alan Ruttenberg
> *Cc:* Miller, Michael D (Rosetta); Marco Brandizi;
> semantic-web@w3.org; public-semweb-lifesci@w3.org
> *Subject:* Re: Playing with sets in OWL...
>
> I think Alan is making a very important general point here.
>
> MAGE-ML/MAGE-OM is perfectly tuned to the needs of:
> a) transferring entire microarray data sets across systems
> b) persisting microarray data sets (at least in certain scenarios)
> c) providing a systematic, normative interface for writing code to
> access specific elements and data collections one typically finds
> in the description of a microarray data set
> This is the sort of functionality data models are particularly
> well suited to supporting.
>
> MAGE-OM/MAGE-ML is also the result of a huge amount of
> deliberation from dozens of experts in the informatics fields
> involved in generating, storing, and manipulating microarray data.
>
> When it comes to manipulating the information associated with a
> microarray experiment - or collection of experiments - in a
> semantically explicit manner, however, RDF is really the
> preferred formalism: it provides the required explicit semantics
> while still providing the expressiveness needed to characterize
> the inherent variety, complexity, and granularity in this
> information. When it comes to filling out the assertions to the
> point of being able to reason on them - even simple reasoning such
> as consistency checks - some dialect of OWL will be the formalism
> of choice, I believe.
>
> I think Alan gives a very clear example of how to use OWL in this
> particular situation described by Marco.
>
> I have just a few questions in followup:
> 1) The MAGE-ML XML Schema provides for a great deal of flexibility
> via the use of optional fields. Still, any given use in a
> specific lab for a specific collection of microarray experiments
> is likely to develop its own conventions for which fields to use
> and which not to use - and how to populate the more "open"
> elements. With this in mind, it seems it should be possible under
> those circumstances to create an XSLT to translate the individuals
> contained in a MAGE-ML instance according to the elemental OWL
> classes Alan described -
> Expression_technology, Expression_technology_map, Spot_mapping,
> Expression_profile_experiment, Spot_intensity,
> Gene_expression_computation. The latter can probably be
> reconstituted from the MAGE-ML elements BioAssay, BioAssayData,
> HigherLevelAnalysis, Measurement, and QuantitationType. In
> translating the instance data into OWL, it should then be possible
> to perform the sort of higher level sorting and re-analysis Alan
> describes. The translation should probably take the "open world"
> assumption into account, so the resulting OWL statements will
> provide the intended semantic completeness, even if that isn't
> represented in the MAGE-ML instances themselves.
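The element-to-class translation described in point 1 can be illustrated in a few lines of Python (standing in for the XSLT). The MAGE-ML fragment and the mapping below are illustrative stand-ins, not taken from the real MAGE-ML schema:

```python
# Sketch of the MAGE-ML -> OWL-individual translation discussed above,
# in Python rather than XSLT for brevity. Element names, identifiers,
# and the element-to-class mapping are invented for illustration.
import xml.etree.ElementTree as ET

MAGE_ML = """
<MAGE-ML>
  <BioAssay identifier="BA:exp0"/>
  <BioAssayData identifier="BAD:levels1"/>
</MAGE-ML>
"""

# Hypothetical mapping from MAGE-ML elements to Alan's elemental classes.
ELEMENT_TO_CLASS = {
    "BioAssay": "Expression_profile_experiment",
    "BioAssayData": "Spot_intensity",
}

def mage_to_individuals(xml_text):
    """Return (identifier, owl_class) pairs for each recognized element."""
    root = ET.fromstring(xml_text)
    individuals = []
    for elem in root:
        owl_class = ELEMENT_TO_CLASS.get(elem.tag)
        if owl_class is not None:
            individuals.append((elem.get("identifier"), owl_class))
    return individuals

print(mage_to_individuals(MAGE_ML))
```

A real translation would of course also carry the properties of each element across, and, as noted above, would need to respect the open-world assumption rather than treat absent MAGE-ML fields as negative facts.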
>
> 2) I think the use of OWL Alan describes here is going to be
> critical to performing broad field, large scale re-analysis of
> complex data sets such as microarray experiments and various types
> of neuro-images containing segmented geometric objects (in many
> ways equivalent to the segmentation performed on microarray images
> to determine the location and intensity of spots). The tendency
> when presenting these results in research articles - and often
> when sharing the data - is to provide the analyzed/reduced view of
> the data. In the context of these complex experiments, many forms
> of re-analysis will not be possible without access to the
> originally collected data. Think of how critical BLAST-based
> meta-analysis was for GenBank through the 1990s (and still is).
> There are several underlying assertions making it possible to
> perform such analysis. Primary among them is the acceptance that
> each form of sequencing technology provides a reliable way of
> determining the probability of finding a particular nucleotide at
> a particular location. Many sequences are submitted with the
> simple assertion that at position N in sequence X there is a 100%
> probability (or 95% confidence, to be more specific) of finding
> nucleotide A|T|G|C. To some extent, the statistical analysis
> performed by BLAST (and other position-sensitive,
> cross-correlative statistical algorithms) relied on these "ground
> facts". For the most part, it was safe to assume this level of
> reduced data could be safely pooled with other such sequence
> determinations regardless of the specific sequencing device,
> underlying biochemical protocols, and specific lots of reagents
> used. These same assumptions cannot generally be made safely
> for microarray experiments, segmented MRI images - and many other
> types of images such as IHC or in situ based images. As an
> example, just look to the debates in the last year or two
> regarding the sometimes problematic nature of replicating "gene
> expression" level results with different arrays covering the
> "same" genes. If we are to support the same sort of meta-analysis
> as was common with BLAST across GenBank sequences, then we will
> often have to supply access to the low-level data elements. This
> in fact was a major impetus behind providing the MAGE-OM (and
> FuGE-OM). As I state at the top of this email with points 'a',
> 'b', & 'c', MAGE-OM/MAGE-ML is extremely useful for several
> critical tasks related to the handling of this detailed data.
> When it comes to supporting the semantically-grounded analytical
> requirements of such complex, broad field, meta-analysis, however,
> I think OWL (and sometimes RDF alone) is going to prove a critical
> enabling technology.
>
> 3) Re: anonymous classes/individuals of the type Alan describes:
> These are essentially "blank nodes" in the RDF sense - "unnamed"
> nodes based on a collection of necessary restrictions, if I
> understand things correctly. Please pardon the naive question,
> but aren't there some caveats in terms of processing very large
> RDF and/or OWL graphs containing "blank" or "anonymous" nodes?
> For many OWL ontologies, this might not be a concern, but if one
> were to be tempted to express a large variety of such sets based
> on different groupings of the sequence probes on a collection of
> arrays - groupings relevant to specific types of analysis - I
> could see how these anonymous entities - especially the anonymous
> sets of individuals - could really proliferate.
>
> Many thanks for providing this very helpful exemplar, Alan.
>
> Cheers,
> Bill
>
>
> On Sep 8, 2006, at 9:50 PM, Alan Ruttenberg wrote:
>
>>
>> Yes. However I don't think I would change anything I wrote.
>> Because OWL works in the open world, we can say that all these
>> things exist, but only supply the details that we need. But
>> having the framework which explains the meaning of what is
>> supplied is one of the points of using ontologies. In this case,
>> if all we know is that there was some computation that led to
>> this gene set, we could use some arbitrary name for it
>> (remembering that if we decide to represent it later, or merge it
>> with the experimental run, we can use owl:sameAs to merge our name
>> with the actual name).
>>
>> So, with reference to this ontology (generated by Marco, or
>> imported from some standard) he could simply state:
>>
>> Individual(c1 type(Computation)
>> value(geneComputedAsExpressed g1)
>> value(geneComputedAsExpressed g2)
>> value(geneComputedAsExpressed g3)
>> )
>>
>> If he wanted to state that the source was an array experiment
>> (but he didn't know the details), he could add to c1
>>
>> value(fromExperiment Individual(
>> type(ExpressionProfileExperiment)))
>>
>> which uses an anonymous individual (blank node) of the
>> appropriate type. Now you know that the data originally came from
>> an expression profile experiment, though you haven't needed to
>> add any other information other than that.
>>
>> The pattern that Marco mentions that is closest to this is
>>
>>>>> set1 isA GeneSet
>>>>> set1 hasMember g1, g2, g3
>>>>
>>
>> in that we are using the property values on an instance to
>> represent the set. But the point I wanted to make was that a gene
>> set isn't some arbitrary set. It is a choice, chosen for a
>> reason/purpose, and the ontology should explicitly represent
>> those reasons/purposes.
>>
>> If there are defined kinds of follow-up, then he could define
>> an instance to represent that process too.
>>
>> Finally, I wanted to make the technical point that he
>> doesn't need to use constructs of the form:
>>
>>>>> set1 derivesFromUnionOf set2, set3
>>>>
>>
>> OWL provides the ability to say these things, even when the "set"
>> is the property values of an instance, for example, given
>>
>> Individual(c1 type(Computation)
>> value(geneComputedAsExpressed g1)
>> )
>>
>> Individual(c2 type(Computation)
>> value(geneComputedAsExpressed g2)
>> value(geneComputedAsExpressed g3)
>> )
>>
>> supposing that he wanted to represent a followup list to be
>> verified by RT-PCR, represented by the class RTPCRFollowup.
>> Let's say that he wanted to call the property geneToFollowUp, with
>> inverse geneFollowedUpIn:
>>
>> Individual(RTPCRFollowup1 type(RTPCRFollowup))
>>
>> EquivalentClasses(
>> unionOf(
>> restriction(GeneExpressedAccordingTo hasValue(c1))
>> restriction(GeneExpressedAccordingTo hasValue(c2)))
>> restriction(geneFollowedUpIn hasValue(RTPCRFollowup1)))
>>
>> Now, e.g., Pellet will conclude that the values of the property
>> geneToFollowUp of instance RTPCRFollowup1 are exactly g1, g2, g3.
>>
>> Of course that's not the only way to do it, but it does show that
>> OWL reasoning can make it economical to represent and work with
>> sets without having to go off and recapitulate set theory.
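The conclusion a reasoner draws in this example is, at bottom, a simple set union. A toy paraphrase in plain Python (no reasoner involved; the names mirror the example above):

```python
# Plain-Python paraphrase of the inference above: the EquivalentClasses
# axiom says the genes followed up in RTPCRFollowup1 are exactly the
# union of the genes computed as expressed by c1 and by c2.
gene_computed_as_expressed = {
    "c1": {"g1"},
    "c2": {"g2", "g3"},
}

# What a reasoner such as Pellet would conclude for geneToFollowUp:
genes_to_follow_up = set().union(*gene_computed_as_expressed.values())

print(sorted(genes_to_follow_up))  # ['g1', 'g2', 'g3']
```

The difference, of course, is that the OWL version is declarative: the reasoner derives this set from the axioms, rather than from a hand-written union over an in-memory data structure.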
>>
>> -Alan
>>
>> On Sep 8, 2006, at 7:41 PM, Miller, Michael D (Rosetta) wrote:
>>
>>>
>>> Hi Alan,
>>>
>>> What you are describing is covered by MAGE-OM/MAGE-ML, a UML model
>>> designed to capture the real-world aspects of running a microarray
>>> experiment.
>>>
>>> Typically at the end of this process a set of genes is identified as
>>> being interesting for some reason and one wants to know more
>>> about this
>>> set of genes beyond the microarray experiment that has been
>>> performed.
>>>
>>> I might be wrong but I think that is where Marco is starting, at
>>> the end
>>> of the experiment for follow-up.
>>>
>>> cheers,
>>> Michael
>>>
>>>> -----Original Message-----
>>>> From: public-semweb-lifesci-request@w3.org
>>>> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of
>>>> Alan Ruttenberg
>>>> Sent: Friday, September 08, 2006 3:07 PM
>>>> To: Marco Brandizi
>>>> Cc: semantic-web@w3.org; public-semweb-lifesci@w3.org
>>>> Subject: Re: Playing with sets in OWL...
>>>>
>>>>
>>>>
>>>> Hi Marco,
>>>>
>>>> There are a number of ways to work with sets, but I don't think I'd
>>>> approach this problem from that point of view.
>>>> Rather, I would start by thinking about what my domain instances
>>>> are, what their properties are, and what kinds of questions I
>>>> want to
>>>> be able to ask based on the representation. I'll sketch this out a
>>>> bit, though the fact that I name an object or property doesn't mean
>>>> that you have to supply it (remember OWL is open-world) - still,
>>>> listing these in the ontology makes your intentions clearer and
>>>> the ontology easier for others to work with.
>>>>
>>>> The heading in each of these is a class, of which you would
>>>> make one
>>>> or more instances to represent your results.
>>>> The indented names are properties on instances of that class.
>>>>
>>>> An expression technology:
>>>> Vendor:
>>>> Product: e.g. array name
>>>> Name of spots on the array
>>>> Mappings: (maps of spot to gene - you might use e.g.
>>>> Affymetrix,
>>>> or you might compute your own)
>>>>
>>>> ExpressionTechnologyMap
>>>> SpotMapping: (each value a spot mapping)
>>>>
>>>> Spot mapping:
>>>> SpotID:
>>>> GeneID:
>>>>
>>>> An expression profile experiment (call yours exp0)
>>>> When done:
>>>> Who did it:
>>>> What technology was used: (an expression technology)
>>>> Sample: (a sample)
>>>> Treatment: ...
>>>> Levels: A bunch of pairs of spot name, intensity
>>>>
>>>> Spot intensity
>>>> SpotID:
>>>> Intensity:
>>>>
>>>> A computation of which spots/genes are "expressed" (call yours c1)
>>>> Name of the method : e.g. mas5 above threshold
>>>> Parameter of the method: e.g. the threshold
>>>> Experiment: exp0
>>>> Spot Expressed: spots that were over threshold
>>>> Gene Computed As Expressed: genes that were over threshold
>>>>
>>>> And maybe:
>>>>
>>>> Conclusion
>>>> What was concluded:
>>>> By who:
>>>> Based on: c1
>>>>
>>>> Everything you enter for your experiment is an instance (so
>>>> there are
>>>> no issues of OWL Full)
>>>>
>>>> Now, the gene set you wanted can be expressed as a class:
>>>>
>>>> Let's define an inverse property of
>>>> "GeneComputedAsExpressed", call
>>>> it "GeneExpressedAccordingTo"
>>>>
>>>> Class(Set1 partial restriction(GeneExpressedAccordingTo
>>>> hasValue(c1)))
>>>>
>>>> Instances of Set1 will be those genes. You may or may not want to
>>>> actually define this class. However I don't think that you need
>>>> to add any properties to it. Everything you would want to say
>>>> probably wants to be said on one of the instances - the experiment,
>>>> the computation, the conclusion, etc.
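For intuition, the hasValue restriction defining Set1 behaves like a membership test over individuals. A toy Python rendering (the gene and property data here are invented for illustration):

```python
# Toy rendering of the Set1 restriction above: an individual satisfies
# restriction(GeneExpressedAccordingTo hasValue(c1)) whenever its
# GeneExpressedAccordingTo values include c1. All data is made up.
genes = {
    "g1": {"GeneExpressedAccordingTo": {"c1"}},
    "g2": {"GeneExpressedAccordingTo": {"c1", "c9"}},
    "g4": {"GeneExpressedAccordingTo": {"c9"}},  # some other computation
}

def members_of_hasvalue(individuals, prop, value):
    """Names of individuals whose values for prop include value."""
    return {name for name, props in individuals.items()
            if value in props.get(prop, set())}

print(sorted(members_of_hasvalue(genes, "GeneExpressedAccordingTo", "c1")))
```

One caveat: an OWL reasoner works open-world, so unlike this closed filter it classifies into Set1 any individual it can *entail* to have the value c1, not just those where the assertion is stated explicitly.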
>>>>
>>>> Let me know if this helps/hurts - glad to discuss this some more
>>>>
>>>> -Alan
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sep 8, 2006, at 11:58 AM, Marco Brandizi wrote:
>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> sorry for the possible triviality of my questions, or the
>>>>> messed-up mind I am possibly showing...
>>>>>
>>>>> I am trying to model the grouping of individuals into sets. In my
>>>>> application domain, gene expression, people put together, let's
>>>>> say, genes, associating a meaning to the sets.
>>>>>
>>>>> For instance:
>>>>>
>>>>> Set1 := { gene1, gene2, gene3 }
>>>>>
>>>>> is the set of genes that are expressed in experiment0
>>>>>
>>>>> (genei and exp0 are OWL individuals)
>>>>>
>>>>>
>>>>> I understand that this may be formalized in OWL by:
>>>>>
>>>>> - declaring Set1 as owl:subClassOf Gene
>>>>> - using oneOf to declare the membership of g1,2,3
>>>>> (or simpler: (g1 type Set1), (g2 type Set1), etc. )
>>>>> - using hasValue with expressed and exp0
>>>>>
>>>>> (right?)
>>>>>
>>>>> Now, I am trying to build an application which is like a semantic
>>>>> wiki.
>>>>>
>>>>> Hence users have quite direct contact with the underlying
>>>>> ontology, and
>>>>> they can write, with a simplified syntax, statements about a
>>>>> subject
>>>>> they are describing (subject-centric approach).
>>>>>
>>>>> Committing to the very formal formalism of OWL looks a bit too
>>>>> much... formal... ;-) and hard to handle with a semantic wiki-like
>>>>> application.
>>>>>
>>>>> Another problem is that the set could have properties of its own,
>>>>> for instance:
>>>>>
>>>>> Set1 hasAuthor John
>>>>>
>>>>> meaning that John is defining it. But hasAuthor is typically used
>>>>> for individuals, and I wouldn't like to fall into OWL Full by
>>>>> making an OWL reasoner interpret Set1 both as an individual and a
>>>>> class.
>>>>>
>>>>> Aren't there more informal (although less precise) methods to
>>>>> model
>>>>> sets, or lists of individuals?
>>>>>
>>>>> An approach could be modeling some sort of set-theory over
>>>>> individuals:
>>>>>
>>>>> set1 isA GeneSet
>>>>> set1 hasMember g1, g2, g3
>>>>> ...
>>>>>
>>>>> set1 derivesFromUnionOf set2, set3
>>>>>
>>>>> ...
>>>>>
>>>>> But I am not sure it would be a good approach, or if someone else
>>>>> already tried that.
>>>>>
>>>>> Any suggestion?
>>>>>
>>>>>
>>>>> Thanks in advance for a reply.
>>>>>
>>>>> Cheers.
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> =======================================================================
>>>>> Marco Brandizi <brandizi@ebi.ac.uk>
>>>>> http://gca.btbs.unimib.it/brandizi
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
> Bill Bug
> Senior Research Analyst/Ontological Engineer
>
> Laboratory for Bioimaging & Anatomical Informatics
> www.neuroterrain.org
> Department of Neurobiology & Anatomy
> Drexel University College of Medicine
> 2900 Queen Lane
> Philadelphia, PA 19129
> 215 991 8430 (ph)
> 610 457 0443 (mobile)
> 215 843 9367 (fax)
>
>
> Please Note: I now have a new email - William.Bug@DrexelMed.edu
>
>
>
>
>
Received on Sunday, 10 September 2006 02:26:28 UTC