RE: ontology specs for self-publishing experiment

Hi Bill,
 
You do miss my point.
 
"1) If two labs are doing microarray experiments and each seeks to
represent the data all the way back to the digital image acquired (so as
to enable others to reanalyze the data, and modify the pooling and/or
statistics applied in this new, shared context), if both are using the
exact same assay, instrument, and reagents but decide to specify the
experimental observation provenance via two separate ontologies, how can
an algorithm unambiguously determine even approximate semantic
equivalence of things such as fluorescent indicator, sequence probe,
optical elements, image acquisition elements?"
 
It is outside the purview of FuGE to state what ontologies should be
used--that is a social issue that we are not interested in solving in
FuGE; that is up to the wider community.  It seems that this is of great
interest to the HCLS community, which is why we leave it to this
community and others rather than addressing it in FuGE.  We have enough
trouble defining how the flow of protocols should be modeled.
 
To tie FuGE to particular ontologies would be impossible; it would mean
no future ontology could be included without an update to the base
standard.  Also, in the wider community those two researchers could view
each other's ontologies as inferior and be carrying on a separate
argument as to which one is using the correct ontology.
 
There are also many different kinds of experiments, so FuGE must
allow the appropriate choice of ontology.  Even the exact same
experiment might be annotated differently if the researchers were
interested in different aspects of the experiment.
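
To make the question concrete, here is a minimal sketch of how an
importing application might compare two annotations once some community
has published an explicit equivalence mapping. The function name, the
example URIs, and the mapping format are all assumptions for
illustration; nothing here is part of FuGE.

```python
def same_concept(uri_a, uri_b, equivalences):
    """Return True if two class URIs are identical or linked, possibly
    through intermediate terms, by declared equivalence pairs.

    `equivalences` is a hypothetical, community-maintained list of
    (uri, uri) pairs; it is not something FuGE itself supplies.
    """
    if uri_a == uri_b:
        return True
    seen = {uri_a}
    frontier = [uri_a]
    while frontier:
        current = frontier.pop()
        for left, right in equivalences:
            if current == left:
                other = right
            elif current == right:
                other = left
            else:
                continue
            if other == uri_b:
                return True
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return False


# Made-up URIs standing in for the same dye described in two ontologies.
lab_a = "http://example.org/ontoA#Cy3"
lab_b = "http://example.org/ontoB#cyanine3"
print(same_concept(lab_a, lab_b, []))                # no mapping published
print(same_concept(lab_a, lab_b, [(lab_a, lab_b)]))  # mapping published
```

Without a published mapping the two URIs are simply different strings,
which is the social problem described above: someone outside the
exchange format has to agree on the pairs.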
 
cheers,
Michael

	-----Original Message-----
	From: William Bug [mailto:William.Bug@DrexelMed.edu] 
	Sent: Thursday, July 06, 2006 10:23 AM
	To: Miller, Michael D (Rosetta)
	Cc: Tim Clark; w3c semweb hcls; SWAN Team; Trish Whetzel; chris
mungall
	Subject: Re: ontology specs for self-publishing experiment
	
	
	I must be really missing something here: 

	Two quick questions: 
	
	
	1) If two labs are doing microarray experiments and each seeks
to represent the data all the way back to the digital image acquired (so
as to enable others to reanalyze the data, and modify the pooling and/or
statistics applied in this new, shared context), if both are using the
exact same assay, instrument, and reagents but decide to specify the
experimental observation provenance via two separate ontologies, how can
an algorithm unambiguously determine even approximate semantic
equivalence of things such as fluorescent indicator, sequence probe,
optical elements, image acquisition elements?

	2) Doesn't this lead down a road similar to that of MIAME, only
now you've shifted the border of incommensurateness beyond the level for
data format and into the semantic domain?  What I mean is, won't there
still be difficulty determining even approximate semantic equivalency
for all of the details of data provenance - many of which absolutely
must be resolved in order to perform large-scale re-pooling of related
observations made in the context of different studies - even if nearly
identical assays/instruments/reagents are used?

	Don't get me wrong.  I'm not assuming ALL semantic ambiguity
will be deterministically resolvable down to the finest level of
granularity.   What I am concerned about is that by completely decoupling any
"contract" regarding ontology use - not ontology curation/development -
but ontology use - from the formalism used to create fine-grained
representation of experimental observations, it seems we throw the baby
out with the bath water and put a great burden on the Semantic Web
developers to fill in (and maintain) the missing logic.

	Please explain to me what I'm missing.  Just like Xiaoshu, I've
been working on this issue in one form or another for nearly 10 years
now, and I am fairly convinced fine-grained, semantic disambiguation (not
lexical - but semantic disambiguation - fine-grained to whatever level
is practical) is one of the keys to pulling off large-scale, field-wide
semantically-driven meta-analysis.

	Cheers,
	Bill

	On Jul 6, 2006, at 12:38 PM, Miller, Michael D (Rosetta) wrote:


		Hi Tim,
		 
		Essentially the idea of FuGE-OM is that it will be
complete in itself as a Platform Independent Model (in OMG/MDA terms)
and will have a FuGE-ML XML schema (a Platform Specific Model--PSM)
generated by AndroMDA.  MGED will most likely provide support in the
form of a FuGEstk, probably with Java and Perl PSM support.
		 
		It is possible that some group may want an RDF PSM
version of FuGE!
		 
		It will soon be vetted through a process by PSI, MGED
and any interested parties and be available for extending into whatever
life sciences domains.  PSI has extended it for GEL-OM
(http://psidev.sourceforge.net/gps/index.html), as a great example, and
work has started to extend it as MAGEv2.
		 
		FuGE provides the underpinnings for describing the flow
of material and data as protocols are applied, including annotation.
One thing to remember is that the ontology support in FuGE is entirely
neutral as to what ontologies these ontology individuals are
referencing: no more information about particular ontologies or how
ontology classes are related belongs in a FuGE-derived document than
the URI to get to the referenced class, if it is in an existing ontology.
It is expected that applications importing FuGE documents will either
have or look up the information on these referenced ontologies after
import if the application wishes to support knowledge-based tools.  Use
of FuGE does not mandate that an application be ontology aware; FuGE is
a data and annotation exchange specification.
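
As a rough illustration of this neutrality, the sketch below models an
annotation that carries only a term and an optional class URI, and an
importer that may or may not know anything about the referenced
ontology. The class name `OntologyIndividual` echoes the wording above,
but its fields and the `describe` helper are assumptions, not the actual
FuGE-OM definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyIndividual:
    """Hypothetical stand-in for an ontology annotation in an exchange document."""
    term: str                        # label as written by the exporter
    class_uri: Optional[str] = None  # URI of the referenced class, if one exists


def describe(annotation, known_classes):
    """Return a display label for an annotation.

    An ontology-unaware application can pass an empty dict and still use
    the term; an ontology-aware one passes whatever it looked up after
    import, keyed by class URI.
    """
    if annotation.class_uri is None:
        return annotation.term
    definition = known_classes.get(annotation.class_uri)
    if definition is None:
        return f"{annotation.term} <{annotation.class_uri}>"
    return f"{annotation.term}: {definition}"


# The URI and definition below are made up for the example.
dye = OntologyIndividual("Cy3", "http://example.org/onto#Cy3")
print(describe(dye, {}))
print(describe(dye, {"http://example.org/onto#Cy3": "cyanine dye"}))
```

The document itself stays the same in both calls; only the importing
application's own knowledge differs.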
		 
		It is hoped that in the different domains of life
sciences that have a need to describe
experiments/studies/investigations, that FuGE provides a good core model
to extend into the domain-specific data/material/protocols.  It is
actually a mistake to mention FuGE development and ontology development
as needing to go together.  The only feedback the FuGE model really
needs is how well the Ontology Individual support is modeled in UML.

		 
		Then, I do believe, for best use of the FuGE model and
its extensions, great ontologies are needed and tools to take these
references in a FuGE document to go out to the semantic web and make
connections and to allow researchers to have ontologies to annotate
their experiments to be exported.  But FuGE development itself doesn't
need awareness of this ontology development effort.
		 
		I am always reminded of two observations: if one has a
hammer, everything looks like a nail; and anything can be programmed in
COBOL.  Not everything, I believe, is best modeled as an ontology, in
particular, as I have said, the real-life flow of a life science
experiment/investigation.  Yes, it can be done, but it is an awkward
stretch.
		 
		cheers,
		Michael
		 
		Michael Miller 
		Lead Software Developer 
		Rosetta Biosoftware Business Unit 
		www.rosettabio.com 

			-----Original Message-----
			From: William Bug
[mailto:William.Bug@DrexelMed.edu] 
			Sent: Thursday, July 06, 2006 8:20 AM
			To: Tim Clark
			Cc: Miller, Michael D (Rosetta); Eric Neumann;
AJ Chen; w3c semweb hcls; SWAN Team
			Subject: Re: ontology specs for self-publishing
experiment
			
			
			Dear Tim,

			I think this is an excellent idea - and comes at
a very propitious time.

			I would suggest including participants on the
FuGO, PaTO, and EXPO projects as well. 

			Cheers,
			Bill

			On Jul 6, 2006, at 9:23 AM, Tim Clark wrote:


				Michael 
				
				
				The FuGE project may have some
interesting overlaps with SWAN.  The current phase of SWAN is focused on
construction of annotation and publishing tools for semantically
characterized hypotheses, claims, findings, counterclaims, etc on
digital resources in neuromedicine, at the community level.  This is
planned to be followed by a complementary phase involving management and
characterization of laboratory results using an extension of the same
ontology.  

				I propose we arrange mutual
presentations and discussions to see if any synergies exist such that we
might take advantage of each other's work.

				Best

				Tim

				
	
------------------------------------------------------------------------------
				Tim Clark 617-947-7098 (mobile)

				Director of Research Programs
				Harvard University Initiative in
Innovative Computing
				60 Oxford Street, Cambridge, MA 02138
				http://iic.harvard.edu

				Director of Informatics
				MassGeneral Institute for
Neurodegenerative Disease
				114 16th Street, Charlestown, MA 02129
				http://www.mindinformatics.org
	
------------------------------------------------------------------------------



				On Jul 5, 2006, at 7:38 PM, Miller,
Michael D (Rosetta) wrote:


				Hi Eric,
				 
				Just wanted to point out how this
overlaps with the current FuGE (http://fuge.sourceforge.net/) and FUGO
(http://fugo.sourceforge.net/) efforts.  These are focused on systems
biology and are intended to provide the underpinnings of reporting gene
expression, gel, mass spec, and -omics experiments/investigations.
				 
				The goal of FuGE (Functional Genomic
Experiments) is for the most part to provide:
				 
				"a. Publishing Protocols 
				b. Publishing Reagents and Products
				c. Stating the Hypothesis (and model
using RDF) that is being tested by the experiment; this includes which
citations are supportive or alternative to one's hypothesis
				d. Publishing Experimental Data
(possibly as RDF-OWL aggregates and tables)
				e. Articulating the Results and
Conclusions; specifically, whether the experiment refutes or supports
the central Hypothesis (most of us agree we cannot 'prove' a hypothesis,
only disprove it)"
				 
				But it is a UML based model that will
then have an equivalent XML Schema generated.  The advantage, I think,
this approach has over a pure ontology representation is that it better
captures the actual work-flow of these experiments for the interchange
of data and annotation.  That being said, the UML model incorporates a
way to annotate the class objects with ontology Individuals with a
reference to the Individual's RDF class and its ontology.  The UML model
adds the additional semantics of identifiers (typically expressed as
LSIDs) that allows tying reference elements generated in the XML Schema
to the full definition of an object.  So a biological sample can be
fully described in one document then referenced by a treatment that
incorporates it into a prep.
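
A minimal sketch of this reference-by-identifier idea follows. The
`Material` and `Treatment` classes, the registry dict, and the LSID-style
strings are hypothetical; they only illustrate describing a sample fully
in one place and pointing at it from another.

```python
from dataclasses import dataclass, field


@dataclass
class Material:
    identifier: str  # LSID-style string; the values used below are made up
    name: str


@dataclass
class Treatment:
    identifier: str
    input_refs: list = field(default_factory=list)  # identifiers only, not full objects


def resolve_inputs(treatment, registry):
    """Look up each referenced identifier in the registry.

    Raises KeyError if a reference points at an identifier that has not
    been defined in any document loaded so far.
    """
    return [registry[ref] for ref in treatment.input_refs]


# The sample is fully described once...
sample = Material("urn:lsid:example.org:material:1", "liver sample")
registry = {sample.identifier: sample}

# ...and the treatment that makes the prep only carries the identifier.
prep = Treatment("urn:lsid:example.org:treatment:7", [sample.identifier])
print([m.name for m in resolve_inputs(prep, registry)])
```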
				 
				So, for instance, typically a hypothesis
is specific to the particular experiment/investigation.  In FuGE, it is
simply a Description class with a text attribute associated by a
Hypothesis association to the Investigation class.  But in the XML
document, this specific Description can be annotated by references to
ontologies that allow the hypothesis to be translated to RDF upon import.
We used the OMG Ontology Definition Metamodel specification mapping of
Individuals from OWL/RDF to UML so that these could then be mapped back
to an OWL/RDF representation for reasoning
(http://www.omg.org/ontology/ontology_info.htm#RFIs,RFPs).
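
The following sketch shows one way an importer might turn such an
annotated hypothesis into subject/predicate/object triples. It uses
plain tuples rather than a real RDF library, and the `Description`
fields, the `http://example.org/terms#` predicates, and the identifiers
are all assumptions for illustration; the actual mapping is whatever the
ODM specification and the FuGE model define.

```python
from dataclasses import dataclass, field

EX = "http://example.org/terms#"  # hypothetical vocabulary, not a real namespace


@dataclass
class Description:
    text: str                                            # free-text hypothesis
    annotation_uris: list = field(default_factory=list)  # referenced ontology classes


def to_triples(investigation_id, hypothesis):
    """Build (subject, predicate, object) tuples for one hypothesis."""
    node = investigation_id + "#hypothesis"
    triples = [
        (investigation_id, EX + "hasHypothesis", node),
        (node, EX + "text", hypothesis.text),
    ]
    # One triple per ontology class the exporter attached to the description.
    triples += [(node, EX + "annotatedWith", uri) for uri in hypothesis.annotation_uris]
    return triples


hyp = Description(
    "Gene X is upregulated under stress",
    ["http://example.org/onto#Upregulation"],
)
for triple in to_triples("urn:lsid:example.org:investigation:42", hyp):
    print(triple)
```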
				 
				
				FUGO is intended to become part of the
OBO ontologies and FUGO's goal is to provide general annotation terms
for these types of experiments.
				 
				cheers,
				Michael
				Michael Miller 
				Lead Software Developer 
				Rosetta Biosoftware Business Unit 
				www.rosettabio.com 

				-----Original Message-----
				From:
public-semweb-lifesci-request@w3.org
[mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Eric Neumann
				Sent: Monday, July 03, 2006 6:57 AM
				To: AJ Chen
				Cc: w3c semweb hcls
				Subject: Re: ontology specs for
self-publishing experiment
				
				

				
				
				AJ,
				
				
				This is a great start, and thanks for
taking this on! I would like to see this task force propose a conceptual
framework within two months. It does not have to be final, but I
think we need to have others on the list review the ontologies
(http://esw.w3.org/topic/HCLS/ScientificPublishingTaskForce?action=AttachFile&do=get&target=SPE_Specs.html)
and requirements (http://esw.w3.org/topic/HCLS/SciPubSPERequirements)
you have proposed, ask questions about them, and adjust/expand as needed.
				
				
				I think there have been good discussions
on this topic in the past, and I would also refer folks to the SWAN
paper by Gao et al.  http://www.websemanticsjournal.org/ps/pub/2006-17 .
This work is in line with what Tim Clark has been proposing to the
group, and I think it is a useful model to consider. Perhaps we can
combine these efforts and propose a workable (demo anyone?) by the end
of summer...
				
				
				In terms of gathering more Scientific
Publishing of Experiments (SPE) requirements, I wanted to list some
items that appear to be inter-related and relevant:
				
				
				1. By Publishing experiments, one must
also consider (i.e., include in the ontology):
				a. Publishing Protocols
				b. Publishing Reagents and Products
				c. Stating the Hypothesis (and model
using RDF) that is being tested by the experiment; this includes which
citations are supportive or alternative to one's hypothesis
				d. Publishing Experimental Data
(possibly as RDF-OWL aggregates and tables)
				e. Articulating the Results and
Conclusions; specifically, whether the experiment refutes or supports
the central Hypothesis (most of us agree we cannot 'prove' a hypothesis,
only disprove it)
				
				
				2. Hypotheses should be defined in terms
of authorship (a la DC), what the proposed new concept is, and what
(experimental) fact (or claim) is required to support it. It should also
refer to earlier hypotheses either by:
				a. extension of an earlier tested and
supported hypothesis: refinement
				b. similarity or congruence with another
untested hypothesis: supportive
				c. being an alternative to another
hypothesis, that will qualify itself through the refutation of the
earlier one: refutation
				This would allow one to define rules and
queries that can traverse the lineage of hypotheses (forwards and
backwards, similar to citations), and how one paper's work can be related
to ongoing work on different fronts that have branched.
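
A lineage query of this kind could be sketched as a breadth-first walk
over typed links between hypotheses. The three relation names come from
the list above; the function names, the tuple layout, and the H1..H4
labels are assumptions made for the example, not part of the proposed
SPE ontology.

```python
from collections import deque

RELATIONS = {"refinement", "supportive", "refutation"}


def add_link(links, later, relation, earlier):
    """Record that hypothesis `later` relates to hypothesis `earlier`."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    links.append((later, relation, earlier))


def lineage(links, start, direction="backward"):
    """Walk the hypothesis graph breadth-first from `start`.

    "backward" follows links toward earlier hypotheses; "forward" follows
    them toward later ones. Returns (hypothesis, relation) pairs in the
    order they are reached.
    """
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        current = queue.popleft()
        for later, relation, earlier in links:
            src, dst = (later, earlier) if direction == "backward" else (earlier, later)
            if src == current and dst not in seen:
                seen.add(dst)
                found.append((dst, relation))
                queue.append(dst)
    return found


links = []
add_link(links, "H2", "refinement", "H1")
add_link(links, "H3", "refutation", "H2")
add_link(links, "H4", "supportive", "H2")
print(lineage(links, "H3"))             # earlier work H3 builds on
print(lineage(links, "H1", "forward"))  # later work that branched from H1
```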
				
				
				3. "Publication" should be a specific
concept in SPE that would serve as the hub of DC metadata as well as
the above experimental data and hypotheses. Different non-disjoint
Publication "Roles" could be defined, such as  Peer-Reviewed,
Electronically-Published, Topic Review, and Follow-up Data. I would also
invite the folks interested in Clinical Publications to specify what
requirements they feel should be included, (e.g. regulatory
applications, Common Technical Document).
				
				
				I also think it would be useful if we
could add a Concept Map graphic for the proposed SPE ontology (class
relations mainly). Sometimes ideas can get expressed faster to the larger
community using images.
				
				
				cheers,
				Eric





				From: AJ Chen <canovaj@gmail.com>
				Date: Sun, 25 Jun 2006 16:00:23 -0700
				Message-ID: <70055a110606251600m469b7d63t405579e7a61e7ef8@mail.gmail.com>
				To: public-semweb-lifesci@w3.org
				I added the first draft of specs for the
ontology being developed for
				self-publishing experiment. See the link
on the task wiki page -
http://esw.w3.org/topic/HCLS/ScientificPublishingTaskForce

				This specs document and the requirements
document are meant to be only the
				starting point for discussion.  I truly
hope more people in this group will
				participate in this open development
process, making comments or providing
				changes to the documents.

				While the ontology is being developed by
this community, I am going to
				develop a self-publishing tool that
implements the ontology, which allows
				you to try this new way of sharing
research information. With easy-to-use
				tools to demonstrate the benefits of
sharing and searching experiment
				information in semantic data format, it
will help attract more people to
				contribute to the development of the
ontology as well as the tools.

				Best,
				AJ




				Eric Neumann, PhD
				co-chair, W3C Healthcare and Life
Sciences,
				and Senior Director Product Strategy
				Teranode Corporation
				83 South King Street, Suite 800
				Seattle, WA 98104
				+1 (781)856-9132
				www.teranode.com 




			
			Bill Bug
			Senior Analyst/Ontological Engineer

			Laboratory for Bioimaging  & Anatomical
Informatics
			www.neuroterrain.org
			Department of Neurobiology & Anatomy
			Drexel University College of Medicine
			2900 Queen Lane
			Philadelphia, PA    19129
			215 991 8430 (ph)
			610 457 0443 (mobile)
			215 843 9367 (fax)


			Please Note: I now have a new email -
William.Bug@DrexelMed.edu




			This email and any accompanying attachments are
confidential. 
			This information is intended solely for the use
of the individual 
			to whom it is addressed. Any review, disclosure,
copying, 
			distribution, or use of this email communication
by others is strictly 
			prohibited. If you are not the intended
recipient please notify us 
			immediately by returning this message to the
sender and delete 
			all copies. Thank you for your cooperation.


	

Received on Thursday, 6 July 2006 18:18:34 UTC