Re: ontology specs for self-publishing experiment from William Bug on 2006-07-06 (public-semweb-lifesci@w3.org from July 2006)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Thu, 6 Jul 2006 13:22:30 -0400
To: "Miller, Michael D (Rosetta)" <Michael_Miller@Rosettabio.com>
Cc: Tim Clark <twclark@nmr.mgh.harvard.edu>, w3c semweb hcls <public-semweb-lifesci@w3.org>, SWAN Team <swan-team@mind-informatics.org>, Trish Whetzel <whetzel@pcbi.upenn.edu>, chris mungall <cjm@fruitfly.org>
Message-Id: <D421F3A8-7261-4ED1-B494-F827B4D2D8F9@DrexelMed.edu>
I must be really missing something here:

Two quick questions:

	1) If two labs are doing microarray experiments and each seeks to  
represent the data all the way back to the digital image acquired (so  
as to enable others to reanalyze the data, and modify the pooling and/ 
or statistics applied in this new, shared context), if both are using  
the exact same assay, instrument, and reagents but decide to specify  
the experimental observation provenance via two separate ontologies,  
how can an algorithm unambiguously determine even approximate  
semantic equivalence of things such as fluorescent indicator,  
sequence probe, optical elements, image acquisition elements?

	2) Doesn't this lead down a road similar to that of MIAME, only now  
you've shifted the border of incommensurateness beyond the level for  
data format and into the semantic domain?  What I mean is, won't  
there still be difficulty determining even approximate semantic  
equivalency for all of the details of data provenance - many of which  
absolutely must be resolved in order to perform large-scale re- 
pooling of related observations made in the context of different  
studies - even if nearly identical assays/instruments/reagents are used?
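
To make question 1 concrete, here is a minimal Python sketch of the comparison an algorithm would face. Everything in it is assumed for illustration: the term URIs, the mapping table, and the function names are hypothetical and do not come from any real ontology or from either lab's data.

```python
# Hypothetical annotations from two labs describing the same reagent.
# The URIs are invented placeholders, not real ontology terms.
LAB_A = {"fluorescent_indicator": "http://example.org/ontoA#Cy3"}
LAB_B = {"fluorescent_indicator": "http://example.org/ontoB#CyanineDye3"}

# A curated cross-ontology mapping. Without any agreement on ontology use,
# someone has to build and maintain a table like this one.
EQUIVALENT = {
    frozenset({"http://example.org/ontoA#Cy3",
               "http://example.org/ontoB#CyanineDye3"}),
}


def same_term(uri_a, uri_b, mappings=EQUIVALENT):
    """True only if the URIs are identical or an explicit mapping links them."""
    if uri_a == uri_b:
        return True
    return frozenset({uri_a, uri_b}) in mappings


def comparable_fields(rec_a, rec_b, mappings=EQUIVALENT):
    """For each field present in both records, report whether the terms match."""
    shared = rec_a.keys() & rec_b.keys()
    return {key: same_term(rec_a[key], rec_b[key], mappings) for key in shared}
```

With the mapping table, the two records can be pooled; with an empty table the same call reports no match, even though the reagent is physically identical.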

Don't get me wrong.  I'm not assuming ALL semantic ambiguity will be  
deterministically resolvable down to the finest level of  
granularity.   What I am concerned about is that by completely decoupling  
any "contract" regarding ontology use - not ontology curation/ 
development - but ontology use - from the formalism used to create  
fine-grained representation of experimental observations, it seems we  
throw the baby out with the bath water and put a great burden on the  
Semantic Web developers to fill in (and maintain) the missing logic.

Please explain to me what I'm missing.  Just like Xiaoshu, I've been  
working on this issue in one form or another for nearly 10 years now,  
and I am fairly convinced fine-grained, semantic disambiguation (not  
lexical - but semantic disambiguation - fine-grained to whatever  
level is practical) is one of the keys to pulling off large-scale,  
field-wide semantically-driven meta-analysis.

Cheers,
Bill

On Jul 6, 2006, at 12:38 PM, Miller, Michael D (Rosetta) wrote:

> Hi Tim,
>
> Essentially the idea of FuGE-OM is that it will be complete in  
> itself as a Platform Independent Model (in OMG/MDA terms) and will  
> have a FuGE-ML XML schema (a Platform Specific Model--PSM)  
> generated by AndroMDA.  MGED will (most likely) provide support in  
> the form of a FuGEstk with most likely Java and Perl PSM support.
>
> It is possible that some group may want an RDF PSM version of FuGE!
>
> It will soon be vetted through a process by PSI, MGED and any  
> interested parties and be available for extending into whatever  
> life sciences domains.  PSI has extended it for GEL-OM  
> (http://psidev.sourceforge.net/gps/index.html), as a great example, and  
> work has started to extend it as MAGEv2.
>
> FuGE provides the underpinnings for describing the flow of material  
> and data as protocols are applied, including annotation.  One thing  
> to remember is that the ontology support in FuGE is entirely  
> neutral as to what ontologies these ontology individuals are  
> referencing--no more information about particular ontologies or how  
> ontology classes are related belong in a FuGE derived document  
> except the URI to get to the referenced class if it is in an  
> existing ontology.  It is expected that applications importing FuGE  
> documents will either have or look up the information on these  
> referenced ontologies after import if the application wishes to  
> support knowledge based tools.  Use of FuGE does not mandate that  
> an application be ontology aware, FuGE is a data and annotation  
> exchange specification.
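
The ontology-neutral referencing described in the paragraph above can be sketched roughly as follows. This is not the FuGE model or any FuGE toolkit API; the class name, fields, URIs, and the lookup table are all hypothetical, chosen only to show a document that carries a bare URI while the importing application decides whether to look it up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyIndividualRef:
    """Hypothetical stand-in for an ontology reference in an exchanged document."""
    label: str                 # term as written by the annotator
    class_uri: Optional[str]   # URI of the referenced class, if one exists


def resolve(ref, lookup):
    """Ontology-aware importers call this after import; others never need to."""
    if ref.class_uri is None:
        return None
    return lookup(ref.class_uri)


# Invented catalog standing in for whatever the importing application knows.
catalog = {
    "http://example.org/onto#Cy3": {"parent": "http://example.org/onto#FluorescentDye"},
}
ref = OntologyIndividualRef(label="Cy3", class_uri="http://example.org/onto#Cy3")
info = resolve(ref, catalog.get)
```

The document side stays the same whichever ontology the URI points into; all knowledge about the ontology lives in the `lookup` callable supplied by the application.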
>
> It is hoped that in the different domains of life sciences that  
> have a need to describe experiments/studies/investigations, that  
> FuGE provides a good core model to extend into the domain-specific  
> data/material/protocols.  It is actually a mistake to mention FuGE  
> development and ontology development as needing to go together.   
> The only real need the FuGE model needs as feedback is how well the  
> Ontology Individual support is modeled in UML.
>
> Then, I do believe, for best use of the FuGE model and its  
> extensions, great ontologies are needed and tools to take these  
> references in a FuGE document to go out to the semantic web and  
> make connections and to allow researchers to have ontologies to  
> annotate their experiments to be exported.  But FuGE development  
> itself doesn't need awareness of this ontology development effort.
>
> I am always reminded of two observations, if one has a hammer,  
> everything looks like a nail and anything can be programmed in  
> COBOL.  Not everything, I believe is best modeled as an ontology,  
> in particular, as I have said, the real life flow of a life science  
> experiment/investigation.  Yes, it can be done but it is an awkward  
> stretch.
>
> cheers,
> Michael
>
> Michael Miller
> Lead Software Developer
> Rosetta Biosoftware Business Unit
> www.rosettabio.com
> -----Original Message-----
> From: William Bug [mailto:William.Bug@DrexelMed.edu]
> Sent: Thursday, July 06, 2006 8:20 AM
> To: Tim Clark
> Cc: Miller, Michael D (Rosetta); Eric Neumann; AJ Chen; w3c semweb  
> hcls; SWAN Team
> Subject: Re: ontology specs for self-publishing experiment
>
> Dear Tim,
>
> I think this is an excellent idea - and comes at a very propitious  
> time.
>
> I would suggest including participants on the FuGO, PaTO, and EXPO  
> projects as well.
>
> Cheers,
> Bill
>
> On Jul 6, 2006, at 9:23 AM, Tim Clark wrote:
>
>> Michael
>>
>> The FuGE project may have some interesting overlaps with SWAN.   
>> Current phase of SWAN is focused on construction of annotation and  
>> publishing tools for semantically characterized hypotheses,  
>> claims, findings, counterclaims, etc on digital resources in  
>> neuromedicine, at the community level.  This is planned to be  
>> followed by a complementary phase involving management and  
>> characterization of laboratory results using an extension of the  
>> same ontology.
>>
>> I propose we arrange mutual presentations and discussions to see  
>> if any synergies exist such that we might take advantage of each  
>> other's work.
>>
>> Best
>>
>> Tim
>>
>> ------------------------------------------------------------------------------
>> Tim Clark 617-947-7098 (mobile)
>>
>> Director of Research Programs
>> Harvard University Initiative in Innovative Computing
>> 60 Oxford Street, Cambridge, MA 02138
>> http://iic.harvard.edu
>>
>> Director of Informatics
>> MassGeneral Institute for Neurodegenerative Disease
>> 114 16th Street, Charlestown, MA 02129
>> http://www.mindinformatics.org
>> ------------------------------------------------------------------------------
>>
>>
>>
>> On Jul 5, 2006, at 7:38 PM, Miller, Michael D (Rosetta) wrote:
>>
>>> Hi Eric,
>>>
>>> Just wanted to point out how this overlaps with the current FuGE  
>>> (http://fuge.sourceforge.net/) and FUGO (http://fugo.sourceforge.net/)  
>>> efforts.  These are focused on systems  
>>> biology and are intended to provide the underpinnings of  
>>> reporting gene expression, gel, mass spec, and -omics experiments/ 
>>> investigations.
>>>
>>> The goal of FuGE (Functional Genomic Experiments) is for the most  
>>> part to provide:
>>>
>>> "a. Publishing Protocols
>>> b. Publishing Reagents and Products
>>> c. Stating the Hypothesis (and model using RDF) that is being  
>>> tested by the experiment; this includes which citations are  
>>> supportive or alternative to one's hypothesis
>>> d. Publishing Experimental Data (possibly as RDF-OWL aggregates  
>>> and tables)
>>> e. Articulating the Results and Conclusions; specifically,  
>>> whether the experiment refutes or supports the central Hypothesis  
>>> (most of us agree we cannot 'prove' a hypothesis, only disprove it)"
>>>
>>> But it is a UML based model that will then have an equivalent XML  
>>> Schema generated.  The advantage, I think, this approach has over  
>>> a pure ontology representation is that it better captures the  
>>> actual work-flow of these experiments for the interchange of data  
>>> and annotation.  That being said, the UML model incorporates a  
>>> way to annotate the class objects with ontology Individuals with  
>>> a reference to the Individual's RDF class and its ontology.  The  
>>> UML model adds the additional semantics of identifiers (typically  
>>> expressed as LSIDs) that allows tying reference elements  
>>> generated in the XML Schema to the full definition of an object.   
>>> So a biological sample can be fully described in one document  
>>> then referenced by a treatment that incorporates it into a prep.
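
The define-once, reference-elsewhere pattern in the paragraph above might look like this in a minimal Python sketch. The LSID-like identifier, the record fields, and the `dereference` helper are invented for illustration and are not taken from the FuGE model or its XML Schema.

```python
# Full definitions, keyed by identifier. The identifier is an invented
# placeholder in an LSID-like form.
definitions = {
    "urn:lsid:example.org:sample:42": {
        "type": "BiologicalSample",
        "organism": "Mus musculus",
    },
}

# A treatment that only references the sample instead of repeating it.
treatment = {
    "type": "Treatment",
    "input_ref": "urn:lsid:example.org:sample:42",
}


def dereference(doc, registry, key="input_ref"):
    """Return a copy of doc with the reference under key replaced by its definition."""
    identifier = doc[key]
    if identifier not in registry:
        raise KeyError(f"unresolved reference: {identifier}")
    resolved = dict(doc)          # shallow copy; the original stays untouched
    resolved[key] = registry[identifier]
    return resolved
```

An unresolved identifier raises rather than passing silently, since a dangling reference would otherwise look like a complete description.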
>>>
>>> So, for instance, typically a hypothesis is specific to the  
>>> particular experiment/investigation.  In FuGE, it is simply a  
>>> Description class with a text attribute associated by a  
>>> Hypothesis association to the Investigation class.  But in the  
>>> XML document, this specific Description can be annotated by  
>>> references to ontologies that allow hypothesis to be translated  
>>> to RDF upon import.  We used the OMG Ontology Definition  
>>> Metamodel specification mapping of Individuals from OWL/RDF to  
>>> UML so that these could then be mapped back to an OWL/RDF  
>>> representation for reasoning  
>>> (http://www.omg.org/ontology/ontology_info.htm#RFIs,RFPs).
>>>
>>> FUGO is intended to become part of the OBO ontologies and FUGO's  
>>> goal is to provide general annotation terms for these types of  
>>> experiments.
>>>
>>> cheers,
>>> Michael
>>> Michael Miller
>>> Lead Software Developer
>>> Rosetta Biosoftware Business Unit
>>> www.rosettabio.com
>>>
>>> -----Original Message-----
>>> From: public-semweb-lifesci-request@w3.org  
>>> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Eric Neumann
>>> Sent: Monday, July 03, 2006 6:57 AM
>>> To: AJ Chen
>>> Cc: w3c semweb hcls
>>> Subject: Re: ontology specs for self-publishing experiment
>>>
>>>
>>> AJ,
>>>
>>> This is a great start, and thanks for taking this on! I would  
>>> like to see this task force propose a conceptual framework within  
>>> the two months. It does not have to be final, but I think we need  
>>> to have others on the list review the ontologies  
>>> (http://esw.w3.org/topic/HCLS/ScientificPublishingTaskForce?action=AttachFile&do=get&target=SPE_Specs.html)  
>>> and requirements (http://esw.w3.org/topic/HCLS/SciPubSPERequirements)  
>>> you have proposed, ask questions about them, and adjust/expand as needed.
>>>
>>> I think there have been good discussions on this topic in the  
>>> past, and I would also refer folks to the SWAN paper by Gao et  
>>> al.  http://www.websemanticsjournal.org/ps/pub/2006-17 . This  
>>> work is in line with what Tim Clark has been proposing to the  
>>> group, and I think it is a useful model to consider. Perhaps we  
>>> can combine these efforts and propose a workable (demo anyone?)  
>>> by the end of summer...
>>>
>>> In terms of gathering more Scientific Publishing of Experiments  
>>> (SPE) requirements, I wanted to list some items that appear to be  
>>> inter-related and relevant:
>>>
>>> 1. By Publishing experiments, one must also consider (i.e.,  
>>> include in the ontology):
>>> a. Publishing Protocols
>>> b. Publishing Reagents and Products
>>> c. Stating the Hypothesis (and model using RDF) that is being  
>>> tested by the experiment; this includes which citations are  
>>> supportive or alternative to one's hypothesis
>>> d. Publishing Experimental Data (possibly as RDF-OWL aggregates  
>>> and tables)
>>> e. Articulating the Results and Conclusions; specifically,  
>>> whether the experiment refutes or supports the central Hypothesis  
>>> (most of us agree we cannot 'prove' a hypothesis, only disprove it)
>>>
>>> 2. Hypotheses should be defined in terms of authorship (ala DC),  
>>> what the proposed new concept is, and what (experimental) fact  
>>> (or claim) is required to support it. It should also refer to  
>>> earlier hypotheses either by:
>>> a. extension of an earlier tested and supported hypothesis:  
>>> refinement
>>> b. similarity or congruence with another untested hypothesis:  
>>> supportive
>>> c. being an alternative to another hypothesis, that will qualify  
>>> itself through the refutation of the earlier one: refutation
>>> This would allow one to define rules and queries that can  
>>> traverse the lineage of hypotheses (forwards and backwards,  
>>> similar to citations), and how one paper's work can be related to  
>>> ongoing work on different fronts that have branched.
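
The lineage traversal described in item 2 can be sketched as a breadth-first walk over typed links between hypotheses. This is only an illustration under assumptions: the hypothesis identifiers, the tuple layout, and the `lineage` function are hypothetical, and a real implementation would more likely query RDF than a Python list.

```python
from collections import deque

# (later_hypothesis, relation, earlier_hypothesis); identifiers are invented.
# The relation names follow the three cases listed in item 2.
LINKS = [
    ("H2", "refinement", "H1"),
    ("H3", "refutation", "H1"),
    ("H4", "supportive", "H2"),
]


def lineage(start, links, direction="backward"):
    """Breadth-first walk over hypothesis links.

    backward follows links toward earlier hypotheses, forward toward later
    ones. Returns (hypothesis, relation) pairs in the order first reached.
    """
    if direction not in ("backward", "forward"):
        raise ValueError("direction must be 'backward' or 'forward'")
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        current = queue.popleft()
        for later, relation, earlier in links:
            if direction == "backward":
                src, dst = later, earlier
            else:
                src, dst = earlier, later
            if src == current and dst not in seen:
                seen.add(dst)
                found.append((dst, relation))
                queue.append(dst)
    return found
```

Calling `lineage("H4", LINKS)` walks back through H2 to H1, while `lineage("H1", LINKS, "forward")` lists every hypothesis that branched from H1.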
>>>
>>> 3. "Publication" should be a specific concept in SPE, that would  
>>> serve to be the hub of DC metadata as well as the above  
>>> experimental data and hypotheses. Different non-disjoint  
>>> Publication "Roles" could be defined, such as  Peer-Reviewed,  
>>> Electronically-Published, Topic Review, and Follow-up Data. I  
>>> would also invite the folks interested in Clinical Publications  
>>> to specify what requirements they feel should be included, (e.g.  
>>> regulatory applications, Common Technical Document).
>>>
>>> I also think it would be useful if we could add a Concept Map  
>>> graphic for the proposed SPE ontology (class relations mainly).  
>>> Sometimes ideas can get expressed faster to the larger community  
>>> using images.
>>>
>>> cheers,
>>> Eric
>>>
>>>
>>>
>>>
>>>> From: AJ Chen <canovaj@gmail.com>
>>>> Date: Sun, 25 Jun 2006 16:00:23 -0700
>>>> Message-ID:  
>>>> <70055a110606251600m469b7d63t405579e7a61e7ef8@mail.gmail.com>
>>>> To: public-semweb-lifesci@w3.org
>>>> I added the first draft of specs for the ontology being  
>>>> developed for
>>>> self-publishing experiment. see the link on the task wiki page -
>>>> http://esw.w3.org/topic/HCLS/ScientificPublishingTaskForce
>>>>
>>>> This specs document and the requirements document are meant to  
>>>> be only the
>>>> starting point for discussion.  I truly hope more people in this  
>>>> group will
>>>> participate in this open development process, making comments or  
>>>> providing
>>>> changes to the documents.
>>>>
>>>> While the ontology is being developed by this community, I am  
>>>> going to
>>>> develop a self-publishing tool that implements the ontology,  
>>>> which allows
>>>> you to try this new way of sharing research information. With  
>>>> easy-to-use
>>>> tools to demonstrate the benefits of sharing and searching  
>>>> experiment
>>>> information in semantic data format, it will help attract more  
>>>> people to
>>>> contribute to the development of the ontology as well as the tools.
>>>>
>>>> Best,
>>>> AJ
>>>
>>>
>>>
>>> Eric Neumann, PhD
>>> co-chair, W3C Healthcare and Life Sciences,
>>> and Senior Director Product Strategy
>>> Teranode Corporation
>>> 83 South King Street, Suite 800
>>> Seattle, WA 98104
>>> +1 (781)856-9132
>>> www.teranode.com
>>>
>>
>

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu

This email and any accompanying attachments are confidential. 
This information is intended solely for the use of the individual 
to whom it is addressed. Any review, disclosure, copying, 
distribution, or use of this email communication by others is strictly 
prohibited. If you are not the intended recipient please notify us 
immediately by returning this message to the sender and delete 
all copies. Thank you for your cooperation.
Received on Thursday, 6 July 2006 17:23:00 UTC