RE: ontology specs for self-publishing experiment from Miller, Michael D (Rosetta) on 2006-07-10 (public-semweb-lifesci@w3.org from July 2006)

From: Miller, Michael D (Rosetta) <Michael_Miller@Rosettabio.com>
Date: Mon, 10 Jul 2006 09:00:12 -0700
To: "Alan Rector" <rector@cs.man.ac.uk>, "William Bug" <William.Bug@DrexelMed.edu>
cc: "w3c semweb hcls" <public-semweb-lifesci@w3.org>, "SWAN Team" <swan-team@mind-informatics.org>, "Marco Brandizi" <brandizi@ebi.ac.uk>
Message-ID: <E1FzyBM-0006G0-DV@lisa.w3.org>
Hi All,

> Yes, but put another way, you have refactored the problem of  
> "incommensurateness" into two more tractable pieces - one about the  
> data structures to convey meaning, the other about the meanings  
> conveyed.  You have also removed the risk of conflating the two ... 

Thanks, Alan, this conveys much better what I was trying to say.
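The split Alan describes, between constraints on the data structures and constraints on the meanings they carry, can be sketched in Python. The schema, the field names and the single disjointness axiom below are hypothetical. They are chosen only to show that a structure check asks whether software can process a record, while an ontology check asks whether what the record asserts is consistent.

```python
# Sketch only: the schema and the "ontology" axiom are hypothetical toys.

# Data-structure layer: which fields must exist, and with which types.
REQUIRED_FIELDS = {"sample_id": str, "organism_parts": list}

# Meaning layer: a hypothetical axiom saying that no single tissue
# sample can be annotated as both of these organism parts.
DISJOINT_TERMS = {frozenset({"liver", "kidney"})}


def structural_errors(record):
    """Structure constraints: can software process this record at all?"""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected_type):
            errors.append(f"invalid structure: {field!r} must be {expected_type.__name__}")
    return errors


def semantic_errors(record):
    """Ontology constraints: is what the record asserts consistent?"""
    terms = set(record["organism_parts"])
    return [
        f"inconsistent meaning: {' and '.join(sorted(pair))} are disjoint"
        for pair in DISJOINT_TERMS
        if pair <= terms
    ]


def check(record):
    # A record must be structurally valid before its meaning can be judged.
    errors = structural_errors(record)
    return errors if errors else semantic_errors(record)


print(check({"sample_id": "S1"}))
print(check({"sample_id": "S2", "organism_parts": ["liver", "kidney"]}))
print(check({"sample_id": "S3", "organism_parts": ["liver"]}))
```

The first record fails on structure, the second is well formed but asserts something inconsistent, and the third passes both checks.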

What I would add is that "the data structures to convey meaning" are
mostly those objects that are unique to an investigation.  These
instances would likely not, in themselves, be worth much when
translated to RDF and analyzed with semantic web tools.

But their annotations, on the other hand, would.  If I understand what
Marco is up to at the EBI (and I'm likely getting the terminology
wrong), he takes a particular gene expression experiment deposited in
ArrayExpress, forms triples linking each MAGE class instance to its
ontology annotations, and goes from there.
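A minimal sketch of the idea, assuming the MAGE objects have already been loaded into plain records. The record layout, the base URI and the "mage:"/"ont:" prefixes are hypothetical placeholders, not the actual EBI approach or any published vocabulary. Each instance yields a type triple plus one triple per ontology annotation, which mirrors the point above: an instance without annotations contributes almost nothing.

```python
# Hypothetical in-memory view of MAGE objects: a MAGE class name, an
# identifier, and (category, term) ontology annotations.
instances = [
    {"id": "BioSource-1", "mage_class": "BioSource",
     "annotations": [("OrganismPart", "liver"), ("Organism", "Mus musculus")]},
    {"id": "Hybridization-7", "mage_class": "Hybridization", "annotations": []},
]


def to_triples(instance, base="http://example.org/experiment/"):
    """Turn one MAGE instance into (subject, predicate, object) triples.

    The base URI and the 'mage:'/'ont:' prefixes are placeholders.
    """
    subject = base + instance["id"]
    triples = [(subject, "rdf:type", "mage:" + instance["mage_class"])]
    for category, term in instance["annotations"]:
        triples.append((subject, "ont:" + category, term))
    return triples


for inst in instances:
    for s, p, o in to_triples(inst):
        print(s, p, o)
```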

On another related thread from Phil,

"My point is that XSLT is not
good for operating on RDF because there are many syntactic ways of
representing the same thing. In general, I wouldn't use XSLT at all as
I hate it, but that's a different issue."

In general, we've found importing the contents of MAGEv1 documents
into our application, and then operating on those contents within the
application, much preferable to dealing with the XML directly.
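Why importing beats operating on the raw text can be sketched with Python's standard `xml.etree.ElementTree`. The two fragments below are hypothetical and much simpler than real MAGE-ML. They differ only in syntax (attribute order, element order, whitespace), and the sketch assumes element order carries no meaning here. Compared as text they differ; once loaded into an application-level structure they are equal, which is the same issue Phil raises for RDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragments (not real MAGE-ML) that say the same
# thing with different syntax.
DOC_A = """<Experiment id="E1">
  <BioSource id="B1" organismPart="liver"/>
  <BioSource id="B2" organismPart="kidney"/>
</Experiment>"""

DOC_B = ('<Experiment id="E1"><BioSource organismPart="kidney" id="B2"/>'
         '<BioSource organismPart="liver" id="B1"/></Experiment>')


def load(xml_text):
    """Import the document into a plain application-level structure."""
    root = ET.fromstring(xml_text)
    return {
        "experiment": root.get("id"),
        # Keyed by id, so the order of BioSource elements does not matter.
        "biosources": {bs.get("id"): bs.get("organismPart")
                       for bs in root.iter("BioSource")},
    }


print(DOC_A == DOC_B)              # False: the text differs
print(load(DOC_A) == load(DOC_B))  # True: the content is the same
```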

The good news for the semantic web effort is that many applications
already make active use of the Gene Ontology (GO), currently in an ad
hoc manner for the most part.  When importing a MAGE document, one can
find the genes of interest and, based on their identifiers, retrieve
their associated GO terms (the terms do not have to be in the MAGE
document itself, only the GenBank or similar identifier).  One can then
take a GO-annotated version of BioPathways, map these genes of interest
onto the pathways via the GO terms, and so on.
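The chain just described (genes of interest, then GO terms via their identifiers, then pathways via shared GO terms) can be sketched as a pair of lookups. Every identifier, GO term and pathway name below is a hypothetical placeholder, and the in-memory dictionaries stand in for whatever annotation source and GO-annotated pathway collection one actually uses.

```python
# All identifiers below are hypothetical placeholders, not real accessions.

# Step 1: genes of interest found in the imported MAGE document.
genes_of_interest = {"ACC_1", "ACC_2", "ACC_3"}

# Step 2: an external annotation source, sequence identifier -> GO terms.
# These terms need not appear in the MAGE document itself.
go_annotations = {
    "ACC_1": {"GO_A", "GO_B"},
    "ACC_2": {"GO_B"},
    "ACC_3": {"GO_C"},
    "ACC_4": {"GO_A"},
}

# Step 3: a GO-annotated pathway collection, pathway -> GO terms.
pathway_go_terms = {
    "pathway_1": {"GO_A"},
    "pathway_2": {"GO_B", "GO_D"},
    "pathway_3": {"GO_E"},
}


def map_genes_to_pathways(genes, annotations, pathways):
    """Return pathway -> sorted genes sharing at least one GO term with it."""
    result = {}
    for pathway, pathway_terms in pathways.items():
        hits = sorted(g for g in genes
                      if annotations.get(g, set()) & pathway_terms)
        if hits:
            result[pathway] = hits
    return result


print(map_genes_to_pathways(genes_of_interest, go_annotations, pathway_go_terms))
```

Matching on any shared GO term is the simplest possible rule; a real mapping would likely also use the GO hierarchy rather than exact term overlap.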

And if one looks at the experiments in ArrayExpress, there is a lot of
annotation for the BioMaterials and for the Experiments that isn't being
exploited yet but could easily be used today to look for interesting
matches between ArrayExpress, GEO, NCI and other repositories.
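A rough sketch of such cross-repository matching: pair up experiments from different repositories and report those sharing enough annotation terms. The repository names come from the text above; the experiment identifiers, the annotation values and the `min_shared` threshold are made up for illustration. The sketch also assumes the annotations already use a common vocabulary, which is exactly the agreement (or mapping) Alan notes is needed for full interoperability.

```python
from itertools import combinations

# Hypothetical experiment records; only the repository names come from the
# text, the identifiers and annotation terms are invented.
experiments = [
    {"repo": "ArrayExpress", "id": "AE-1", "terms": {"liver", "Mus musculus", "fasting"}},
    {"repo": "GEO", "id": "GEO-1", "terms": {"liver", "Mus musculus", "high-fat diet"}},
    {"repo": "NCI", "id": "NCI-1", "terms": {"breast", "Homo sapiens"}},
    {"repo": "GEO", "id": "GEO-2", "terms": {"liver", "Mus musculus"}},
]


def cross_repository_matches(records, min_shared=2):
    """Pairs of experiments from different repositories sharing >= min_shared terms."""
    matches = []
    for a, b in combinations(records, 2):
        if a["repo"] == b["repo"]:
            continue  # only look for matches across repositories
        shared = a["terms"] & b["terms"]
        if len(shared) >= min_shared:
            matches.append((a["id"], b["id"], sorted(shared)))
    return matches


for match in cross_repository_matches(experiments):
    print(match)
```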

cheers,
Michael

> -----Original Message-----
> From: Alan Rector [mailto:rector@cs.man.ac.uk] 
> Sent: Saturday, July 08, 2006 11:57 AM
> To: William Bug
> Cc: Miller, Michael D (Rosetta); Tim Clark; w3c semweb hcls; 
> SWAN Team; Trish Whetzel; chris mungall
> Subject: Re: ontology specs for self-publishing experiment
> 
> 
> 
> On 6 Jul 2006, at 19:22, William Bug wrote:
> 
> >
> > 	2) Doesn't this lead down a road similar to that of MIAME, only
> > now you've shifted the border of incommensurateness beyond the  
> > level for data format and into the semantic domain?
> 
> Yes, but put another way, you have refactored the problem of  
> "incommensurateness" into two more tractable pieces - one about the  
> data structures to convey meaning, the other about the meanings  
> conveyed.  You have also removed the risk of conflating the two  
> problems thereby making both harder.  The UML/XML models are about  
> conveying meanings; the ontologies are about the meanings conveyed.   
> The constraints in the UML/XML models ensure that software can  
> process the data structures correctly.  Violating such a constraint  
> means that the structure is invalid.  The constraints in the ontology
> are about what we understand about the biology.  Violating a  
> constraint in the ontology means that the meaning is incorrect or  
> even inconsistent.  Getting that relationship between the data  
> structures and meanings clearly defined is a key issue for many  
> standardisation efforts.
> 
> In practice, the ontologies/terminologies/vocabularies are often  
> maintained by different groups than the data structures/exchange  
> formats and there are often requirements to use the same exchange  
> format with different ontologies/terminologies and vice versa.  
> (Analogous problems are common in the medical community.)
> 
> However, factoring the problem in this way does mean that you don't  
> get full interoperability unless you agree on _both_ the data  
> structures/exchange formats and the ontologies/terminologies.  (Or
> define mappings and equivalences between them.)
> 
> > What I mean is, won't there still be difficulty determining even  
> > approximate semantic equivalency for all of the details of data  
> > provenance - many of which absolutely must be resolved in order to  
> > perform large-scale re-pooling of related observations made in the  
> > context of different studies - even if nearly identical
> > assays/instruments/reagents are used?
> 
> Yes.  There will always be a trade-off between the grounding cost of  
> agreeing up front to use the same standards and the clean-up cost of  
> resolving the differences later.  You can choose whether to pay for  
> your lunch in advance or afterwards, but there is no free lunch.
> 
> It is a choice for the community - or communities - on how wide a  
> consensus on the various issues they can achieve.
> 
> Regards
> 
> Alan
> 
> 
> -----------------------
> Alan Rector
> Professor of Medical Informatics
> School of Computer Science
> University of Manchester
> Manchester M13 9PL, UK
> TEL +44 (0) 161 275 6149/6188
> FAX +44 (0) 161 275 6204
> www.cs.man.ac.uk/mig
> www.clinical-esciences.org
> www.co-ode.org
> 
Received on Monday, 10 July 2006 16:00:41 UTC