Re: how to deal with different requirements for experiment self-publishing from William Bug on 2006-07-08 (public-semweb-lifesci@w3.org from July 2006)

From: William Bug <William.Bug@DrexelMed.edu>
Date: Fri, 7 Jul 2006 20:20:40 -0400
To: AJ Chen <canovaj@gmail.com>
Cc: public-semweb-lifesci@w3.org
Message-Id: <F3790982-95CE-4631-AC31-B2DFBEC2B684@DrexelMed.edu>
Thanks again AJ for the work you are doing.

I think you are correct here - it can be useful to provide a  
classification scheme defining the variety of approaches we can  
envision researchers will use to formally specify experiment-related  
semantic information.

Just to be clear - the terms I presented represented an embellishment  
of the entities BIRN has been compiling - built as sub-classes of  
FuGO entities or the foundational ontology on which the current FuGO  
is built (BFO/OBR).  Again - that "messy" graph with mixed relation  
types was really meant to just provide a "snapshot" of the sort of  
granularity that is currently being specified by community ontology  
efforts targeting experiment reporting semantic specification. As I  
mentioned, they were intended just as a "straw man" to spur discussion.

I think your breakdown of three general categories can be amended a  
bit.  I would add the approach Michael has been describing, where  
someone would use a formal DOM specification such as FuGE to specify  
the details of an experiment - and may or may not refer to  
well-founded entities from an ontology in populating XML instances of  
that model.

So in the context of the effort you have been working on here, I see  
two major categories of formal experiment/investigation-related  
specifications, the latter having a collection of sub-types:

1) Structured format with no guaranteed ontological commitment (e.g.,  
FuGE)
	* can be represented in RDF or according to an XML XSD
	* accepts the implied semantic relations of the document object model
		** because of this, it should be possible to use XSLT to  
deterministically translate from one XSD to another, including  
translation to RDF (a special case of Philip Lord's caveat "it  
depends", where XML --> RDF translation should be possible via XSLT  
across the two XSDs; as both Philip & Chris point out, this is not  
an invertible function - you cannot expect to go from RDF --> the  
original XML without a lot of manual QC)
	* may or may not use well-founded, ontological entities in  
specifying experiment details

2) Structured format with required ontological commitment
	* represented using RDF-only (meaning all resources will be URI-based)
	* requires use of SOME well-founded ontology for SOME aspect of  
experiment-related reporting
	
	a) Coarse-level, comprehensive semantic specification
		* uses SOME ontology(s) to specify coarse-granularity for ALL  
entities related to experiment reporting, such as the one you've been  
assembling
			
	b) Fine-level, partial semantic specification
		* uses SOME ontology(s) to specify SOME entities related to  
experiment reporting covered to fine-level granularity.

	c) Coarse-level, comprehensive semantic specification - some domains  
covered to fine-level granularity
		* fulfills requirements of both 'a' & 'b' above

	d) Fine-level, comprehensive semantic specification
		* uses SOME ontology(s) to specify ALL entities related to  
experiment reporting with ALL being covered to fine-level granularity.
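
Going back to the XML --> RDF point under category 1, here is a minimal sketch of why the translation is deterministic in one direction but not invertible. It uses Python's standard xml.etree.ElementTree rather than XSLT, purely to keep the example self-contained. The element names, the "id" attribute and the example.org namespace are placeholders for illustration, not FuGE constructs.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

Triple = Tuple[str, str, str]
BASE = "http://example.org/"  # placeholder namespace, not a real vocabulary


def xml_to_triples(xml_text: str) -> Set[Triple]:
    """Turn one flat XML record into a set of (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    subject = BASE + root.attrib["id"]
    triples = {(subject, BASE + "type", BASE + root.tag)}
    for child in root:
        # Each child element becomes one property of the subject.
        triples.add((subject, BASE + child.tag, (child.text or "").strip()))
    return triples


# Two documents that differ only in element order (hypothetical element names).
doc_a = ('<experiment id="e1"><protocol>p1</protocol>'
         '<investigator>x</investigator></experiment>')
doc_b = ('<experiment id="e1"><investigator>x</investigator>'
         '<protocol>p1</protocol></experiment>')

# Both map to the same triple set, so the original XML cannot be recovered
# from the triples alone.
print(xml_to_triples(doc_a) == xml_to_triples(doc_b))
```

A set of triples carries no element order, so anything the XSD expressed through document structure has to be restored by hand on the way back.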


I expect it will be at least 10 years from now before it will be  
practical for any project to get to '2.d' - or something very close  
to it.  Reasonable folks may disagree on whether it will ever be possible.

I agree there is value to derive from pursuing '2.a' - much of which  
is touched on in the SWAN article by Gao Yong, et al. that Eric N.  
cited. The HCLSIG working groups can provide real value to the  
community by providing a demo of this using SemWeb technology.

Just to be clear, BIRN is currently working toward '2.b'.  The  
centralized infrastructural development and support provided as a  
part of the BIRN Project really sets us apart as a special case,  
where it is at least tractable for us to pursue this path.  The  
ontologies we currently intend to use are the OBO Foundry ontologies,  
including FuGO, PaTO and others.  When a domain we require is not  
covered, we are seeking to work with the appropriate community  
efforts to develop the required ontology.  When granularity of  
coverage for existing ontologies doesn't meet our needs, we also  
expect to fill in subclasses. We are starting by focusing on 'is_a'  
relations, though fully intend to use the variety of relations  
defined in the OBO Relation Ontology to fill out the complexity of  
the BIRN formal semantic framework.  As I understand it, the OBO  
Relation Ontology will itself continue to evolve to expand the  
specificity and range of relations it can cover.  In all cases, we  
currently plan to follow the OBO Foundry principles as we proceed  
along this path.
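
As a rough illustration of "filling in subclasses" with 'is_a' only, here is a minimal sketch of an in-memory term hierarchy. All term names are placeholders (they are not real FuGO, PaTO or BIRN identifiers), and the helper functions are hypothetical, not part of any OBO tooling.

```python
from typing import Dict, List

# child term -> parent term, i.e. one 'is_a' edge per entry.
# Placeholder names only; not real ontology identifiers.
is_a: Dict[str, str] = {
    "example:Assay": "example:PlannedProcess",
    "example:ImagingAssay": "example:Assay",
}


def ancestors(term: str) -> List[str]:
    """Follow is_a edges upward; is_a is treated as transitive."""
    chain = []
    while term in is_a:
        term = is_a[term]
        chain.append(term)
    return chain


def add_subclass(child: str, parent: str) -> None:
    """Add a finer-grained term beneath an existing one."""
    known = set(is_a) | set(is_a.values())
    if parent not in known:
        raise KeyError(f"unknown parent term: {parent}")
    if child == parent or child in ancestors(parent):
        raise ValueError("adding this edge would create a cycle")
    is_a[child] = parent


add_subclass("example:MRIAssay", "example:ImagingAssay")
print(ancestors("example:MRIAssay"))
# ['example:ImagingAssay', 'example:Assay', 'example:PlannedProcess']
```

Other relation types from the OBO Relation Ontology would need their own edge maps and their own inference rules; this sketch covers 'is_a' only.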

Depending on what we learn from producing a demo of '2.a' here  
amongst HCLSIG participants, I think BIRN would be very receptive to  
the idea of proceeding to '2.c'.
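
To keep the labels used in this message straight (1 and 2.a through 2.d), here is a minimal sketch of the classification as a decision procedure. The SpecProfile fields are one possible reading of the criteria, introduced only for illustration. In particular, treating category 1 as "not RDF-only, or no ontology use" is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Coverage(Enum):
    NONE = "none"  # no entities tied to an ontology
    SOME = "some"  # only part of the experiment description is covered
    ALL = "all"    # every experiment-related entity is covered


@dataclass(frozen=True)
class SpecProfile:
    """Hypothetical summary of how a specification uses ontologies."""
    rdf_only: bool    # every resource is URI-based RDF
    coarse: Coverage  # share of entities typed at coarse granularity
    fine: Coverage    # share of entities typed at fine granularity


def classify(p: SpecProfile) -> str:
    """Map a profile onto category 1 or one of 2.a-2.d."""
    uses_ontology = p.coarse != Coverage.NONE or p.fine != Coverage.NONE
    if not (p.rdf_only and uses_ontology):
        return "1"
    if p.fine == Coverage.ALL:
        return "2.d"
    if p.coarse == Coverage.ALL:
        return "2.c" if p.fine == Coverage.SOME else "2.a"
    if p.fine == Coverage.SOME:
        return "2.b"
    return "2"  # committed to an ontology, but matches no listed sub-type


# The path described for BIRN: fine-level, partial coverage.
print(classify(SpecProfile(rdf_only=True, coarse=Coverage.NONE, fine=Coverage.SOME)))
# 2.b
```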

Cheers,
Bill

On Jul 7, 2006, at 3:42 AM, AJ Chen wrote:

> All,
> From the discussions so far, I see a whole spectrum of needs for  
> publishing experiment information.  On one end, some researchers  
> want a quick and easy way to share an experiment, e.g. simply  
> decompose an experiment to hypothesis, data, results, procedure,  
> protocols used, who did it, what project it belongs to, etc. On the  
> other end of the spectrum, some researchers want to describe it  
> with domain-specific terms as detailed as possible, e.g. using FuGO  
> or BioPAX terms.  In the middle of the spectrum, one may want to  
> describe an experiment in general terms but in great detail,  
> e.g. using the terms Bill Bug provided from BIRN.
>
> Because of this diversity of requirements, I think it is not  
> realistic to expect one huge ontology will fit all. I would suggest  
> we think of this task in terms of multiple phases so that  
> incremental progress can be made within a short time frame. In the  
> first phase (current phase), we focus on a small ontology that can  
> be used to develop quick and easy tools for self-publishing.  In  
> the next phase, we can add more granularity to it. In the third  
> phase, we may figure out how to bridge this general-purpose  
> ontology to domain-specific ontologies that are developed by other  
> groups.  An alternative approach is to have separate tasks to meet  
> different requirements at the same time.
>
> What do you think? If we take the multi-phase approach, I would  
> suggest further discussions be focused on the objective of the  
> current phase, i.e. a small and simple ontology.  If anyone likes  
> the multi-task approach, please consider proposing a new task.
>
>
> AJ
>
>

Bill Bug
Senior Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu

Received on Saturday, 8 July 2006 00:20:53 UTC