- From: William Bug <William.Bug@DrexelMed.edu>
- Date: Wed, 5 Jul 2006 16:43:55 -0400
- To: "Miller, Michael D (Rosetta)" <Michael_Miller@Rosettabio.com>
- Cc: "Eric Neumann" <eneumann@teranode.com>, "AJ Chen" <canovaj@gmail.com>, "w3c semweb hcls" <public-semweb-lifesci@w3.org>
- Message-Id: <EA2C579A-2ED4-4971-9E3D-42FB7B580BE8@DrexelMed.edu>
Hi Michael,

I completely agree that this statement of goals for the "self-publishing" of experiments is very much in synchrony with what you describe for FuGE, as well as with the intended goal for use of PaTO, and with the goal behind all of what we are doing on the BIRN Ontology Task Force (BIRN-OTF) and with the BIRN Mediator.

The one point I have a comment on is:

On Jul 5, 2006, at 1:38 PM, Miller, Michael D (Rosetta) wrote:

> The advantage, I think, this approach has over a pure ontology
> representation is that it better captures the actual work-flow of
> these experiments for the interchange of data and annotation. That
> being said, the UML model incorporates a way to annotate the class
> objects with ontology Individuals with a reference to the
> Individual's RDF class and its ontology. The UML model adds the
> additional semantics of identifiers (typically expressed as LSIDs)
> that allows tying reference elements generated in the XML Schema to
> the full definition of an object. So a biological sample can be
> fully described in one document then referenced by a treatment that
> incorporates it into a prep.

I believe it is not the intention of any of the various projects I cite above - or any of the ontology projects associated with the OBO Foundry - to represent instance data in its entirety in the context of an OWL or Protege-Frames document. I believe the same can be said for the arguments presented in Gao's article on SWAN and what I assume Eric refers to as "what Tim Clark has been proposing to the group."

The goal for all of the ontology projects - as I understand them (and I certainly may be wrong) - is to provide a SHARED, STRUCTURED, SEMANTIC FRAMEWORK for describing observations. This framework must cover all the deep, dark corners of every required knowledge domain down to the finest level of granularity practical, but this is in no way the same as mapping instance data into this ontological framework. The framework may actually include a formalism for representing instance data (see PaTO publications), which I sincerely hope will at some level be RDF based, for a whole host of reasons.

The ontologies have begun to address some of the more complex - and somewhat abstract - entities such as 'project', 'study', 'experiment', 'hypothesis', 'result', 'conclusion', etc. You can see this in FuGO, and in other projects associated with the OBO Foundry effort. As Eric outlines, mapping instances of these entities and the complex relations between them (both within a given 'study' and across disparate but related studies) is critical to being able to assemble a fluid system for extracting, representing, and mining the semantic data space. It's in this mapping where I strongly believe RDF provides a very significant benefit beyond XML Schemas, XSLT translations, XPath and other means of interlinking XML documents, XQuery, etc.

By the way, the "mapping" I refer to above, linking instance data wherever it may reside (primary data repositories, pooled/analyzed/interpreted data, the scientific literature) to entities in the ontologies, requires reference to the lexicon - the TERMS used to describe the ontological fundamentals by the scientists reporting them. This is true whether an algorithm or a human is trying to understand and interpret a collection of instance data in the context of the relevant knowledge framework, even if that framework resides in the head of the human researcher.

I like to think of this distinction as being very coarsely analogous to the distinction between the physical data model in an RDBMS and the many tools used to make that more abstracted, normalized collection of related entities directly useful for specific applications - e.g., SQL SELECT statements, VIEWs, and/or Materialized VIEWs. Maintaining these as distinct elements goes a long way toward ensuring the abstraction is re-usable for a large set of applications, while simultaneously being able to support each application's detailed requirements through custom de-normalization.

This is why I like to keep the lexicon distinct from the ontology. They are intimately linked. No ontology is free of lexical artifacts (I'm not certain it can or should be), any more than a lexical graph can be assembled without representing semantic relations. Analysis of the lexicon can inform how to adapt the semantic graph in the ontology - make it more commensurate with the current state of knowledge as expressed by domain experts - and review of term use in the context of the ontology can be a great help in creating effective, structured, controlled terminological resources. However, the two types of knowledge resource are constructed via different processes, support different Use Cases, and rely on different fundamental relations at their core, however intimately they may be linked.

What I think I'm saying here is - to borrow a phrase often used by Jeff Grethe from the BIRN Coordinating Center (BIRN-CC) - I think on many of the issues you and I have been debating over the last several weeks, we are in "violent agreement." ;-)

I completely agree with Eric's suggestion that we come up with a very clear, concise statement of how the W3C SW HCLSIG intends to provide clear working examples of this approach, and propose a plan for providing such examples as expeditiously as is practical (Eric suggests "this summer").

Cheers,
Bill

On Jul 5, 2006, at 1:38 PM, Miller, Michael D (Rosetta) wrote:

> Hi Eric,
>
> Just wanted to point out how this overlaps with the current FuGE
> (http://fuge.sourceforge.net/) and FUGO
> (http://fugo.sourceforge.net/) efforts. These are focused on systems
> biology and are intended to provide the underpinnings of reporting
> gene expression, gel, mass spec, and -omics
> experiments/investigations.
>
> The goal of FuGE (Functional Genomic Experiments) is for the most
> part to provide:
>
> "a. Publishing Protocols
> b. Publishing Reagents and Products
> c. Stating the Hypothesis (and model using RDF) that is being
> tested by the experiment; this includes which citations are
> supportive or alternative to one's hypothesis
> d. Publishing Experimental Data (possibly as RDF-OWL aggregates and
> tables)
> e. Articulating the Results and Conclusions; specifically, whether
> the experiment refutes or supports the central Hypothesis (most of
> us agree we cannot 'prove' a hypothesis, only disprove it)"
>
> But it is a UML based model that will then have an equivalent XML
> Schema generated. The advantage, I think, this approach has over a
> pure ontology representation is that it better captures the actual
> work-flow of these experiments for the interchange of data and
> annotation. That being said, the UML model incorporates a way to
> annotate the class objects with ontology Individuals with a
> reference to the Individual's RDF class and its ontology. The UML
> model adds the additional semantics of identifiers (typically
> expressed as LSIDs) that allows tying reference elements generated
> in the XML Schema to the full definition of an object. So a
> biological sample can be fully described in one document then
> referenced by a treatment that incorporates it into a prep.
>
> So, for instance, typically a hypothesis is specific to the
> particular experiment/investigation. In FuGE, it is simply a
> Description class with a text attribute associated by a Hypothesis
> association to the Investigation class. But in the XML document,
> this specific Description can be annotated by references to
> ontologies that allow the hypothesis to be translated to RDF upon
> import. We used the OMG Ontology Definition Metamodel
> specification mapping of Individuals from OWL/RDF to UML so that
> these could then be mapped back to an OWL/RDF representation for
> reasoning (http://www.omg.org/ontology/ontology_info.htm#RFIs,RFPs).
>
> FUGO is intended to become part of the OBO ontologies and FUGO's
> goal is to provide general annotation terms for these types of
> experiments.
>
> cheers,
> Michael
>
> Michael Miller
> Lead Software Developer
> Rosetta Biosoftware Business Unit
> www.rosettabio.com
>
> -----Original Message-----
> From: public-semweb-lifesci-request@w3.org
> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Eric Neumann
> Sent: Monday, July 03, 2006 6:57 AM
> To: AJ Chen
> Cc: w3c semweb hcls
> Subject: Re: ontology specs for self-publishing experiment
>
>
> AJ,
>
> This is a great start, and thanks for taking this on! I would like
> to see this task force propose a conceptual framework within the
> next two months. It does not have to be final, but I think we need to
> have others on the list review the ontologies
> (http://esw.w3.org/topic/HCLS/ScientificPublishingTaskForce?action=AttachFile&do=get&target=SPE_Specs.html)
> and requirements
> (http://esw.w3.org/topic/HCLS/SciPubSPERequirements) you have
> proposed, ask questions about them, and adjust/expand as needed.
>
> I think there have been good discussions on this topic in the past,
> and I would also refer folks to the SWAN paper by Gao et al.
> http://www.websemanticsjournal.org/ps/pub/2006-17 . This work is
> in line with what Tim Clark has been proposing to the group,
> and I think it is a useful model to consider. Perhaps we can
> combine these efforts and propose a workable (demo anyone?) by the
> end of summer...
>
> In terms of gathering more Scientific Publishing of Experiments
> (SPE) requirements, I wanted to list some items that appear to be
> inter-related and relevant:
>
> 1. By Publishing experiments, one must also consider (i.e., include
> in the ontology):
> a. Publishing Protocols
> b. Publishing Reagents and Products
> c. Stating the Hypothesis (and model using RDF) that is being
> tested by the experiment; this includes which citations are
> supportive or alternative to one's hypothesis
> d. Publishing Experimental Data (possibly as RDF-OWL aggregates and
> tables)
> e. Articulating the Results and Conclusions; specifically, whether
> the experiment refutes or supports the central Hypothesis (most of
> us agree we cannot 'prove' a hypothesis, only disprove it)
>
> 2. Hypotheses should be defined in terms of authorship (ala DC),
> what the proposed new concept is, and what (experimental) fact (or
> claim) is required to support it. It should also refer to earlier
> hypotheses either by:
> a. extension of an earlier tested and supported hypothesis: refinement
> b. similarity or congruence with another untested hypothesis:
> supportive
> c. being an alternative to another hypothesis, that will qualify
> itself through the refutation of the earlier one: refutation
> This would allow one to define rules and queries that can traverse
> the lineage of hypotheses (forwards and backwards, similar to
> citations), and how one paper's work can be related to ongoing work
> on different fronts that have branched.
>
> 3. "Publication" should be a specific concept in SPE, that would
> serve to be the hub of DC metadata as well as the above
> experimental data and hypotheses. Different non-disjoint
> Publication "Roles" could be defined, such as Peer-Reviewed,
> Electronically-Published, Topic Review, and Follow-up Data. I would
> also invite the folks interested in Clinical Publications to
> specify what requirements they feel should be included (e.g.
> regulatory applications, Common Technical Document).
>
> I also think it would be useful if we could add a Concept Map
> graphic for the proposed SPE ontology (class relations mainly).
> Sometimes ideas can get expressed faster to the larger community
> using images.
>
> cheers,
> Eric
>
>
>> From: AJ Chen <canovaj@gmail.com>
>> Date: Sun, 25 Jun 2006 16:00:23 -0700
>> Message-ID:
>> <70055a110606251600m469b7d63t405579e7a61e7ef8@mail.gmail.com>
>> To: public-semweb-lifesci@w3.org
>>
>> I added the first draft of specs for the ontology being developed for
>> self-publishing experiment. see the link on the task wiki page -
>> http://esw.w3.org/topic/HCLS/ScientificPublishingTaskForce
>>
>> This specs document and the requirements document are meant to be
>> only the starting point for discussion. I truly hope more people in
>> this group will participate in this open development process, making
>> comments or providing changes to the documents.
>>
>> While the ontology is being developed by this community, I am going
>> to develop a self-publishing tool that implements the ontology, which
>> allows you to try this new way of sharing research information. With
>> easy-to-use tools to demonstrate the benefits of sharing and
>> searching experiment information in semantic data format, it will
>> help attract more people to contribute to the development of the
>> ontology as well as the tools.
>>
>> Best,
>> AJ
>
>
> Eric Neumann, PhD
> co-chair, W3C Healthcare and Life Sciences,
> and Senior Director Product Strategy
> Teranode Corporation
> 83 South King Street, Suite 800
> Seattle, WA 98104
> +1 (781)856-9132
> www.teranode.com
>

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA 19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)

Please Note: I now have a new email - William.Bug@DrexelMed.edu

This email and any accompanying attachments are confidential. This information is intended solely for the use of the individual to whom it is addressed. Any review, disclosure, copying, distribution, or use of this email communication by others is strictly prohibited. If you are not the intended recipient please notify us immediately by returning this message to the sender and delete all copies. Thank you for your cooperation.
Received on Wednesday, 5 July 2006 20:45:30 UTC