Re: ontology specs for self-publishing experiment from Alan Rector on 2006-07-08 (public-semweb-lifesci@w3.org from July 2006)

From: Alan Rector <rector@cs.man.ac.uk>
Date: Sat, 8 Jul 2006 20:57:15 +0200
To: William Bug <William.Bug@DrexelMed.edu>
Cc: "Miller, Michael D (Rosetta)" <Michael_Miller@Rosettabio.com>, Tim Clark <twclark@nmr.mgh.harvard.edu>, w3c semweb hcls <public-semweb-lifesci@w3.org>, SWAN Team <swan-team@mind-informatics.org>, Trish Whetzel <whetzel@pcbi.upenn.edu>, chris mungall <cjm@fruitfly.org>
Message-Id: <CBD398A0-9177-4FE0-BD73-CDF5E43282AD@cs.man.ac.uk>

On 6 Jul 2006, at 19:22, William Bug wrote:

>
> 	2) Doesn't this lead down a road similar to that of MIAME, only  
> now you've shifted the border of incommensurateness beyond the  
> level for data format and into the semantic domain?

Yes, but put another way, you have refactored the problem of  
"incommensurateness" into two more tractable pieces - one about the  
data structures to convey meaning, the other about the meanings  
conveyed.  You have also removed the risk of conflating the two  
problems, which would make both harder.  The UML/XML models are about  
conveying meanings; the ontologies are about the meanings conveyed.   
The constraints in the UML/XML models ensure that software can  
process the data structures correctly.  Violating such a constraint  
means that the structure is invalid.  The constraints in the ontology  
are about what we understand about the biology.  Violating a  
constraint in the ontology means that the meaning is incorrect or  
even inconsistent.  Getting that relationship between the data  
structures and meanings clearly defined is a key issue for many  
standardisation efforts.
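
A minimal sketch of this split may help. The field names and the toy "ontology" below are hypothetical, invented for illustration; they are not taken from MIAME or any real UML/XML model or ontology. One check asks whether the record can be processed at all; the other asks whether what it says is consistent with what we believe about the biology.

```python
# Hypothetical illustration: structural vs. semantic constraints.
# Field names and the toy "ontology" are invented for this sketch.

REQUIRED_FIELDS = {"sample_id": str, "tissue": str, "organism": str}

# Toy ontology: which tissues are known to occur in which organism.
TISSUES_BY_ORGANISM = {
    "Homo sapiens": {"liver", "hippocampus"},
    "Drosophila melanogaster": {"wing disc", "mushroom body"},
}


def structural_errors(record):
    """Data-structure constraints: can software process the record at all?"""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors


def semantic_errors(record):
    """Ontology constraints: is the stated meaning consistent with the toy ontology?"""
    organism = record["organism"]
    tissue = record["tissue"]
    if organism not in TISSUES_BY_ORGANISM:
        return [f"unknown organism: {organism}"]
    if tissue not in TISSUES_BY_ORGANISM[organism]:
        return [f"{tissue} is not a known tissue of {organism}"]
    return []


# Structurally valid, but the meaning is wrong.
well_formed_but_wrong = {
    "sample_id": "S1",
    "tissue": "wing disc",
    "organism": "Homo sapiens",
}

# Structurally invalid: software cannot even get as far as the meaning.
malformed = {"sample_id": "S2", "tissue": "liver"}
```

The first record passes the structural check and fails the semantic one; the second fails the structural check before its meaning can be examined at all.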

In practice, the ontologies/terminologies/vocabularies are often  
maintained by different groups than the data structures/exchange  
formats and there are often requirements to use the same exchange  
format with different ontologies/terminologies and vice versa.  
(Analogous problems are common in the medical community.)

However, factoring the problem in this way does mean that you don't  
get full interoperability unless you agree on _both_ the data  
structures/exchange formats and the ontologies/terminologies (or  
define mappings and equivalences between them).
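
As a rough sketch of the mapping option, assume two groups share an exchange format but use different vocabularies. All term identifiers below are hypothetical. Records whose terms have an agreed equivalent can be pooled; the rest remain to be resolved.

```python
# Hypothetical illustration: same exchange format, different vocabularies.
# All term identifiers are invented for this sketch.

# Mapping from group A's terms to group B's terms, agreed between the groups.
A_TO_B = {
    "A:liver": "B:0001",
    "A:hippocampus": "B:0002",
}


def translate(records, mapping):
    """Split records into those expressible in the target vocabulary and those not."""
    mapped, unmapped = [], []
    for record in records:
        term = record["tissue"]
        if term in mapping:
            # Copy the record, replacing only the vocabulary term.
            mapped.append({**record, "tissue": mapping[term]})
        else:
            unmapped.append(record)
    return mapped, unmapped


records_from_a = [
    {"sample_id": "S1", "tissue": "A:liver"},
    {"sample_id": "S2", "tissue": "A:amygdala"},
]
mapped, unmapped = translate(records_from_a, A_TO_B)
```

Here `unmapped` is the part that is not interoperable until someone defines the missing equivalence.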

> What I mean is, won't there still be difficulty determining even  
> approximate semantic equivalency for all of the details of data  
> provenance - many of which absolutely must be resolved in order to  
> perform large-scale re-pooling of related observations made in the  
> context of different studies - even if nearly identical assays/ 
> instruments/reagents are used?

Yes.  There will always be a trade-off between the grounding cost of  
agreeing up front to use the same standards and the clean-up cost of  
resolving the differences later.  You can choose whether to pay for  
your lunch in advance or afterwards, but there is no free lunch.

How wide a consensus to aim for on the various issues is a choice for  
the community - or communities.

Regards

Alan

-----------------------
Alan Rector
Professor of Medical Informatics
School of Computer Science
University of Manchester
Manchester M13 9PL, UK
TEL +44 (0) 161 275 6149/6188
FAX +44 (0) 161 275 6204
www.cs.man.ac.uk/mig
www.clinical-esciences.org
www.co-ode.org

Received on Sunday, 9 July 2006 11:10:14 UTC