Re: ontology specs for self-publishing experiment from William Bug on 2006-07-05 (public-semweb-lifesci@w3.org from July 2006)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Wed, 5 Jul 2006 16:43:55 -0400
To: "Miller, Michael D (Rosetta)" <Michael_Miller@Rosettabio.com>
Cc: "Eric Neumann" <eneumann@teranode.com>, "AJ Chen" <canovaj@gmail.com>, "w3c semweb hcls" <public-semweb-lifesci@w3.org>
Message-Id: <EA2C579A-2ED4-4971-9E3D-42FB7B580BE8@DrexelMed.edu>

Hi Michael,

I completely agree that this statement of goals for the
"self-publishing" of experiments is very much in synchrony with what
you describe for FuGE, as well as with the intended goal for use of
PaTO and the goal behind all of what we are doing on the BIRN
Ontology Task Force (BIRN-OTF) and with the BIRN Mediator.

The one point I have a comment on is:

On Jul 5, 2006, at 1:38 PM, Miller, Michael D (Rosetta) wrote:
> The advantage, I think, this approach has over a pure ontology  
> representation is that it better captures the actual work-flow of  
> these experiments for the interchange of data and annotation.  That  
> being said, the UML model incorporates a way to annotate the class  
> objects with ontology Individuals with a reference to the  
> Individual's RDF class and its ontology.  The UML model adds the  
> additional semantics of identifiers (typically expressed as LSIDs)  
> that allows tying reference elements generated in the XML Schema to  
> the full definition of an object.  So a biological sample can be  
> fully described in one document then referenced by a treatment that  
> incorporates it into a prep.

I believe it is not the intention of any of the various projects I  
cite above - or any of the ontology projects associated with the OBO  
Foundry - to represent instance data in its entirety in the context  
of an OWL or Protege-Frames document.  I believe the same can be said  
for the arguments presented in Gao's article on SWAN and what I  
assume Eric refers to as "what Tim Clark has been proposing to the  
group."  The goal for all of the ontology projects - as I understand  
them (and I certainly may be wrong) - is to provide a SHARED,  
STRUCTURED, SEMANTIC FRAMEWORK for describing observations.  This  
framework must cover all the deep, dark corners of every required  
knowledge domain down to the finest level of granularity practical, but  
this is in no way the same as mapping instance data into this  
ontological framework.  This may actually include a formalism for  
representing instance data (see PaTO publications) - which I  
sincerely hope at some level will be RDF based - for a whole host of  
reasons.

The ontologies have begun to address some of the more complex - and  
somewhat abstract - entities such as 'project', 'study',  
'experiment', 'hypothesis', 'result', 'conclusion', etc.  You can  
see this in FuGO, and in other projects associated with the OBO  
Foundry effort.  As Eric outlines, mapping instances of these  
entities and the complex relations between them (both within a given  
'study' and across disparate but related studies) is critical to  
being able to assemble a fluid system for extracting, representing,  
and mining the semantic data space.  It's in this mapping where I
strongly believe RDF provides a very significant benefit beyond that
of XML Schemas, XSLT translations, XPath and other means of
interlinking XML documents, XQuery, etc.
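
As a rough illustration of the kind of mapping described above, here is a minimal sketch in plain Python (no RDF library) that holds subject/predicate/object statements as tuples and answers one cross-study question. Every identifier in it (the ex: and onto: names, the partOf and tests relations, the helper functions) is a hypothetical placeholder, not a term taken from FuGO, PaTO, or any OBO Foundry ontology.

```python
# Hypothetical statements: two studies whose experiments test the same hypothesis.
# All ex:/onto: identifiers are made-up placeholders for illustration only.
TRIPLES = {
    ("ex:study1", "rdf:type", "onto:Study"),
    ("ex:study2", "rdf:type", "onto:Study"),
    ("ex:exp1", "rdf:type", "onto:Experiment"),
    ("ex:exp2", "rdf:type", "onto:Experiment"),
    ("ex:hyp1", "rdf:type", "onto:Hypothesis"),
    ("ex:exp1", "onto:partOf", "ex:study1"),
    ("ex:exp2", "onto:partOf", "ex:study2"),
    ("ex:exp1", "onto:tests", "ex:hyp1"),
    ("ex:exp2", "onto:tests", "ex:hyp1"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def studies_testing(triples, hypothesis):
    """Find every study containing an experiment that tests `hypothesis`."""
    experiments = [t[0] for t in match(triples, p="onto:tests", o=hypothesis)]
    studies = set()
    for experiment in experiments:
        for _, _, study in match(triples, s=experiment, p="onto:partOf"):
            studies.add(study)
    return sorted(studies)


if __name__ == "__main__":
    print(studies_testing(TRIPLES, "ex:hyp1"))
```

Calling studies_testing(TRIPLES, "ex:hyp1") walks from the hypothesis to the experiments that test it and on to their parent studies, without the two studies needing to share a document or a schema.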

By the way, the "mapping" I refer to above, linking instance data
wherever it may reside (primary data repositories, pooled/analyzed/
interpreted data, the scientific literature) to entities in the
ontologies, requires reference to the lexicon - the TERMS used by the
scientists to describe the ontological fundamentals they are
reporting.  This is true whether an algorithm or a human is trying to
understand and interpret a collection of instance data in the context  
of the relevant knowledge framework, even if that framework resides  
in the head of the human researcher.

I like to think of this distinction as being very coarsely analogous  
to the distinction between the physical data model in an RDBMS and  
the many tools used to make that more abstracted, normalized  
collection of related entities directly useful for specific  
applications - e.g., SQL SELECT statements, VIEWs, and/or  
Materialized VIEWS.  Maintaining these as distinct elements goes a  
long way toward ensuring the abstraction is re-usable for a large set  
of applications, while simultaneously being able to support each  
application's detailed requirements through custom de-normalization.
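
To make the analogy concrete, here is a minimal sketch using Python's standard sqlite3 module. The normalized tables stand in for the shared abstraction, and the VIEW is one application's de-normalized reading of it. The table names, column names, and sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with normalized tables plus one VIEW."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Normalized model: samples and treatments are stored separately.
        CREATE TABLE sample (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE treatment (
            id        INTEGER PRIMARY KEY,
            sample_id INTEGER NOT NULL REFERENCES sample(id),
            protocol  TEXT NOT NULL
        );
        INSERT INTO sample VALUES (1, 'liver-01'), (2, 'brain-07');
        INSERT INTO treatment VALUES
            (10, 1, 'fixation'), (11, 1, 'staining'), (12, 2, 'fixation');

        -- Application-specific, de-normalized reading of the same data.
        CREATE VIEW sample_treatments AS
            SELECT s.name AS sample, t.protocol AS protocol
            FROM sample AS s
            JOIN treatment AS t ON t.sample_id = s.id;
        """
    )
    return conn


def protocols_for(conn, sample_name):
    """List the protocols applied to one sample, read through the VIEW."""
    rows = conn.execute(
        "SELECT protocol FROM sample_treatments "
        "WHERE sample = ? ORDER BY protocol",
        (sample_name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(protocols_for(build_demo_db(), "liver-01"))
```

A second application could define its own VIEW over the same two tables without touching the underlying model, which is the re-use property the analogy is pointing at.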

This is why I like to keep the lexicon distinct from the ontology.   
They are intimately linked.  No ontology is free of lexical artifacts
(I'm not certain it can or should be), any more than a lexical graph
can be assembled without representing semantic relations.  Analysis
of the lexicon can inform how to adapt the semantic graph in the
ontology, making it more commensurate with the current state of
knowledge as expressed by domain experts.  Review of term use in
the context of the ontology can, in turn, be a great help in creating
effective, structured, controlled terminological resources.  However,
the two types of knowledge resource are constructed via different
processes, support different Use Cases, and rely on different
fundamental relations at their core, however intimately they may be
linked.
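
One way to picture the separation, offered only as a sketch: keep the lexicon (terms as scientists write them) in one structure and the ontology (concepts and their is_a relations) in another, joined solely by concept identifiers. The concept IDs, terms, and function names below are hypothetical and are not drawn from any real terminology or ontology.

```python
# Ontology side: concept identifier -> parent concept (a single is_a link).
# Identifiers are made-up placeholders.
ONTOLOGY_IS_A = {
    "C:cell": None,
    "C:neuron": "C:cell",
    "C:purkinje_cell": "C:neuron",
}

# Lexicon side: terms as written by researchers -> concept identifier.
# Several terms may point at the same concept (synonymy lives here).
LEXICON = {
    "neuron": "C:neuron",
    "nerve cell": "C:neuron",
    "purkinje cell": "C:purkinje_cell",
    "purkinje neuron": "C:purkinje_cell",
}


def resolve(term):
    """Map a free-text term to a concept identifier, or None if unknown."""
    return LEXICON.get(term.strip().lower())


def ancestors(concept):
    """Follow is_a links upward from a concept, nearest ancestor first."""
    chain = []
    parent = ONTOLOGY_IS_A.get(concept)
    while parent is not None:
        chain.append(parent)
        parent = ONTOLOGY_IS_A.get(parent)
    return chain


if __name__ == "__main__":
    concept = resolve("Purkinje Neuron")
    print(concept, ancestors(concept))
```

Adding a new synonym touches only LEXICON, and revising the is_a hierarchy touches only ONTOLOGY_IS_A, so each resource can be maintained by its own process while staying linked through the shared identifiers.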

What I think I'm saying here is - to coin a phrase often used by  
Jeff Grethe from the BIRN Coordinating Center (BIRN-CC) - I think on  
many of the issues you and I have been debating over the last several  
weeks, we are in "violent agreement."  ;-)

I completely agree with Eric's suggestion that we come up with a very
clear, concise statement of how the W3C SW HCLSIG intends to provide
clear working examples of this approach, and propose a plan for
providing such examples as expeditiously as is practical (Eric
suggests "this summer").

Cheers,
Bill


On Jul 5, 2006, at 1:38 PM, Miller, Michael D (Rosetta) wrote:

> Hi Eric,
>
> Just wanted to point out how this overlaps with the current FuGE
> (http://fuge.sourceforge.net/) and FUGO
> (http://fugo.sourceforge.net/) efforts.  These are focused on systems
> biology and are intended to provide the underpinnings of reporting
> gene expression, gel, mass spec, and -omics
> experiments/investigations.
>
> The goal of FuGE (Functional Genomic Experiments) is for the most  
> part to provide:
>
> "a. Publishing Protocols
> b. Publishing Reagents and Products
> c. Stating the Hypothesis (and model using RDF) that is being  
> tested by the experiment; this includes which citations are  
> supportive or alternative to one's hypothesis
> d. Publishing Experimental Data (possibly as RDF-OWL aggregates and  
> tables)
> e. Articulating the Results and Conclusions; specifically, whether  
> the experiment refutes or supports the central Hypothesis (most of  
> us agree we cannot 'prove' a hypothesis, only disprove it)"
>
> But it is a UML based model that will then have an equivalent XML  
> Schema generated.  The advantage, I think, this approach has over a  
> pure ontology representation is that it better captures the actual  
> work-flow of these experiments for the interchange of data and  
> annotation.  That being said, the UML model incorporates a way to  
> annotate the class objects with ontology Individuals with a  
> reference to the Individual's RDF class and its ontology.  The UML  
> model adds the additional semantics of identifiers (typically  
> expressed as LSIDs) that allows tying reference elements generated  
> in the XML Schema to the full definition of an object.  So a  
> biological sample can be fully described in one document then  
> referenced by a treatment that incorporates it into a prep.
>
> So, for instance, typically a hypothesis is specific to the  
> particular experiment/investigation.  In FuGE, it is simply a  
> Description class with a text attribute associated by a Hypothesis  
> association to the Investigation class.  But in the XML document,  
> this specific Description can be annotated by references to  
> ontologies that allow the hypothesis to be translated to RDF upon  
> import.  We used the OMG Ontology Definition Metamodel  
> specification mapping of Individuals from OWL/RDF to UML so that  
> these could then be mapped back to an OWL/RDF representation for  
> reasoning (http://www.omg.org/ontology/ontology_info.htm#RFIs,RFPs).
>
> FUGO is intended to become part of the OBO ontologies and FUGO's  
> goal is to provide general annotation terms for these types of  
> experiments.
>
> cheers,
> Michael
> Michael Miller
> Lead Software Developer
> Rosetta Biosoftware Business Unit
> www.rosettabio.com
>
> -----Original Message-----
> From: public-semweb-lifesci-request@w3.org
> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Eric Neumann
> Sent: Monday, July 03, 2006 6:57 AM
> To: AJ Chen
> Cc: w3c semweb hcls
> Subject: Re: ontology specs for self-publishing experiment
>
>
> AJ,
>
> This is a great start, and thanks for taking this on! I would like
> to see this task force propose a conceptual framework within the
> next two months. It does not have to be final, but I think we need to
> have others on the list review the ontologies
> (http://esw.w3.org/topic/HCLS/ScientificPublishingTaskForce?action=AttachFile&do=get&target=SPE_Specs.html)
> and requirements
> (http://esw.w3.org/topic/HCLS/SciPubSPERequirements) you have
> proposed, ask questions about them, and adjust/expand as needed.
>
> I think there have been good discussions on this topic in the past,
> and I would also refer folks to the SWAN paper by Gao et al.
> http://www.websemanticsjournal.org/ps/pub/2006-17 . This work is
> in line with what Tim Clark has been proposing to the group,
> and I think it is a useful model to consider. Perhaps we can
> combine these efforts and propose a workable (demo anyone?) by the  
> end of summer...
>
> In terms of gathering more Scientific Publishing of Experiments  
> (SPE) requirements, I wanted to list some items that appear to be  
> inter-related and relevant:
>
> 1. By Publishing experiments, one must also consider (i.e., include  
> in the ontology):
> a. Publishing Protocols
> b. Publishing Reagents and Products
> c. Stating the Hypothesis (and model using RDF) that is being  
> tested by the experiment; this includes which citations are  
> supportive or alternative to one's hypothesis
> d. Publishing Experimental Data (possibly as RDF-OWL aggregates and  
> tables)
> e. Articulating the Results and Conclusions; specifically, whether  
> the experiment refutes or supports the central Hypothesis (most of  
> us agree we cannot 'prove' a hypothesis, only disprove it)
>
> 2. Hypotheses should be defined in terms of authorship (ala DC),  
> what the proposed new concept is, and what (experimental) fact (or  
> claim) is required to support it. It should also refer to earlier  
> hypotheses either by:
> a. extension of an earlier tested and supported hypothesis: refinement
> b. similarity or congruence with another untested hypothesis:  
> supportive
> c. being an alternative to another hypothesis, that will qualify  
> itself through the refutation of the earlier one: refutation
> This would allow one to define rules and queries that can traverse  
> the lineage of hypotheses (forwards and backwards, similar to  
> citations), and how one paper's work can be related to ongoing work  
> on different fronts that have branched.
>
> 3. "Publication" should be a specific concept in SPE, that would  
> serve to be the hub of DC metadata as well as the above  
> experimental data and hypotheses. Different non-disjoint
> Publication "Roles" could be defined, such as Peer-Reviewed,
> Electronically-Published, Topic Review, and Follow-up Data. I would  
> also invite the folks interested in Clinical Publications to  
> specify what requirements they feel should be included, (e.g.  
> regulatory applications, Common Technical Document).
>
> I also think it would be useful if we could add a Concept Map  
> graphic for the proposed SPE ontology (class relations mainly).  
> Sometimes ideas can get expressed faster to the larger community  
> using images.
>
> cheers,
> Eric
>
>
>
>
>> From: AJ Chen <canovaj@gmail.com>
>> Date: Sun, 25 Jun 2006 16:00:23 -0700
>> Message-ID:  
>> <70055a110606251600m469b7d63t405579e7a61e7ef8@mail.gmail.com>
>> To: public-semweb-lifesci@w3.org
>>
>> I added the first draft of specs for the ontology being developed for
>> self-publishing experiment. See the link on the task wiki page -
>> http://esw.w3.org/topic/HCLS/ScientificPublishingTaskForce
>>
>> This specs document and the requirements document are meant to be only
>> the starting point for discussion.  I truly hope more people in this
>> group will participate in this open development process, making
>> comments or providing changes to the documents.
>>
>> While the ontology is being developed by this community, I am going to
>> develop a self-publishing tool that implements the ontology, which
>> allows you to try this new way of sharing research information. With
>> easy-to-use tools to demonstrate the benefits of sharing and searching
>> experiment information in semantic data format, it will help attract
>> more people to contribute to the development of the ontology as well as
>> the tools.
>>
>> Best,
>> AJ
>
>
>
> Eric Neumann, PhD
> co-chair, W3C Healthcare and Life Sciences,
> and Senior Director Product Strategy
> Teranode Corporation
> 83 South King Street, Suite 800
> Seattle, WA 98104
> +1 (781)856-9132
> www.teranode.com
>

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu







Received on Wednesday, 5 July 2006 20:45:30 UTC