FW: [BioPAX-discuss] RE: xml schema for BioPAX

Cross-posting this message...

-----Original Message-----
From: biopax-discuss-bounces@biopax.org
[mailto:biopax-discuss-bounces@biopax.org]On Behalf Of
Eric.Neumann@aventis.com
Sent: Monday, August 09, 2004 11:03 AM
To: pm286@cam.ac.uk; biopax-discuss@biopax.org
Subject: RE: [BioPAX-discuss] RE: xml schema for BioPAX



This is a great discussion, and it may have an impact on other areas of life science data exchange. I think the relation between OWL/RDF and XML Schema is a critical one for all of us to understand better.

The key question, I believe, is about the need for expressivity: how many descriptors and relations do I need to describe just one context-specific pathway in mouse, say WNT4? How do I make sure my information (and points of view) gets across to another researcher?

If I can regularly pack the data/info into something similar to a microarray data set, then why not define an XML Schema and use vanilla XML? If I need to add layers of relations to pathway elements, regulators, modulators and conditions, then I'd rather use RDF for the instance data. It parses just fine into triples, without much nesting depth.
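To make the point about triples concrete, here is a minimal sketch in plain Java (deliberately not Jena or CWM) of pathway statements held as flat subject-predicate-object triples and queried by pattern. All the `ex:` identifiers are hypothetical placeholders rather than BioPAX terms, and a real application would use an RDF library instead of this toy store.

```java
import java.util.ArrayList;
import java.util.List;

// Toy triple store: each statement is a flat (subject, predicate, object) triple.
// The "ex:" identifiers used in main() are hypothetical, not BioPAX vocabulary.
class PathwayTriples {

    record Triple(String subject, String predicate, String object) {}

    private final List<Triple> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new Triple(subject, predicate, object));
    }

    // Returns all triples matching the pattern; a null argument acts as a wildcard.
    List<Triple> match(String subject, String predicate, String object) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            boolean ok = (subject == null || subject.equals(t.subject()))
                    && (predicate == null || predicate.equals(t.predicate()))
                    && (object == null || object.equals(t.object()));
            if (ok) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PathwayTriples g = new PathwayTriples();
        g.add("ex:wnt4Pathway", "ex:organism", "ex:mouse");
        g.add("ex:wnt4Pathway", "ex:hasStep", "ex:step1");
        g.add("ex:step1", "ex:regulatedBy", "ex:regulatorA");
        // A new kind of relation is just another triple; no schema change is needed.
        g.add("ex:step1", "ex:observedUnder", "ex:conditionX");

        // Everything known about ex:step1, regardless of which relation was used.
        for (Triple t : g.match("ex:step1", null, null)) {
            System.out.println(t.subject() + " " + t.predicate() + " " + t.object());
        }
    }
}
```

The design point is that adding a relation such as `ex:observedUnder` costs one more triple, whereas in a fixed XML Schema tree it would typically mean a new element or attribute.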

FWIW, I have found using RDF (a graph) for data instances far easier than working within the syntax restrictions of an XML Schema (a tree). For those not comfortable with "processing" RDF (don't base your opinion on trying to read RDF by eye), I suggest trying out Jena or CWM to see what is possible in this space. Quoting a friend from the Whitehead: "Once you've experienced XML hell, you'll understand."

Eric

-----Original Message-----
From: biopax-discuss-bounces@biopax.org
[mailto:biopax-discuss-bounces@biopax.org]On Behalf Of Peter Murray-Rust
Sent: Monday, August 09, 2004 7:37 AM
To: biopax-discuss@biopax.org
Subject: RE: [BioPAX-discuss] RE: xml schema for BioPAX


At 17:06 06/08/2004 -0400, Gary Bader wrote:
>Hi Chris,
>         That is correct.  There is no XML Schema for BioPAX, only an OWL
>definition.  Both OWL and XML Schema are XML standards for representing
>information recommended by the W3C.  The main difference between XML Schema
>and OWL is that OWL allows definition of a class hierarchy, where XML Schema
>does not.  OWL has some other unique features as well compared to XML Schema
>(e.g. ability to say that one class is disjoint from another), but BioPAX
>does not make use of those.  This means that XML Schema tools, like Castor
>and JAXB, will not work with OWL, but the Jena library replicates much of
>this functionality, just in a different manner.
>         The choice of using OWL was decided by a vote in the core group
>early on in BioPAX discussions.
>
>Best,
>Gary
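Gary's point about class hierarchies can be illustrated with a small sketch, again in plain Java rather than through Jena's API: subclass links are followed transitively, and a disjointness declared between two classes also applies to their subclasses. The class names are illustrative only, loosely modelled on BioPAX; as Gary notes, BioPAX itself does not use disjointness, so that part is shown purely to illustrate the OWL feature.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of an OWL-style subclass hierarchy with a disjointness check.
// Class names in main() are illustrative, not the actual BioPAX ontology.
class ClassHierarchy {

    private final Map<String, Set<String>> superClasses = new HashMap<>();
    private final Set<List<String>> disjointPairs = new HashSet<>();

    void subClassOf(String sub, String sup) {
        superClasses.computeIfAbsent(sub, k -> new HashSet<>()).add(sup);
    }

    // Disjointness is symmetric, so both orderings are stored.
    void disjoint(String a, String b) {
        disjointPairs.add(List.of(a, b));
        disjointPairs.add(List.of(b, a));
    }

    // The class itself plus all of its ancestors (transitive closure of subClassOf).
    Set<String> ancestors(String cls) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(cls);
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (seen.add(c)) {
                todo.addAll(superClasses.getOrDefault(c, Set.of()));
            }
        }
        return seen;
    }

    boolean isA(String cls, String sup) {
        return ancestors(cls).contains(sup);
    }

    // False if any ancestor of a is declared disjoint with any ancestor of b,
    // i.e. no individual could be an instance of both classes.
    boolean canShareInstance(String a, String b) {
        for (String x : ancestors(a)) {
            for (String y : ancestors(b)) {
                if (disjointPairs.contains(List.of(x, y))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ClassHierarchy h = new ClassHierarchy();
        h.subClassOf("protein", "physicalEntity");
        h.subClassOf("smallMolecule", "physicalEntity");
        h.subClassOf("physicalEntity", "entity");
        h.disjoint("protein", "smallMolecule");

        System.out.println(h.isA("protein", "entity"));                    // true
        System.out.println(h.canShareInstance("protein", "smallMolecule")); // false
    }
}
```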

CML (Chemical Markup Language) is part of the BioPAX system and is firmly 
based on XML Schema (XSD). I don't see XSD and OWL as being mutually 
exclusive and hope that they will interoperate. Indeed, I am keen to see how 
RDF/OWL might be "layered" on CML - there is a lot of validation that cannot 
be provided by schemas.

CML represents a set of (hopefully) well-understood information objects for 
which much of the semantics depends on algorithms. For example, to calculate 
the frequencies of a transition state a matrix needs to be inverted, and it 
is more practical to map this onto Java classes. We have developed about 100 
schema elements (not all are required by BioPAX) and these are transformed 
algorithmically into Java (we actually wrote our own generator rather than 
using JAXB, Castor, etc., as we also have to generate FORTRAN, Python and 
C++). The code generated from a schema mainly provides get and set methods, 
so we have also handcrafted a set of Tools which wrap the schema objects and 
provide a large set of chemical functions. An example (paraphrased) might be:

MoleculeTool mt = new MoleculeTool(molecule);
AtomSetTool[] rings = mt.getRingNuclei();

(These tools are available as Open Source - http://wwmm.ch.cam.ac.uk/moin)
Note: CML now includes CMLReact, which has been extensively tested on 
enzyme reactions (by Gemma Holliday) and which may be of interest if BioPAX 
wants to hold details of reactants, mechanisms, transition states, etc.
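The schema-to-Java step described above can be sketched as a tiny generator that turns an element description into a class with get and set methods. This is not the CML group's generator; the class name `Atom` and its attributes are hypothetical examples, and only the idea of emitting accessor code from a language-neutral description is taken from the message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: emit Java source with fields, getters and setters from a schema-like
// description (attribute name -> Java type). Names in main() are hypothetical.
class BeanGenerator {

    // "elementType" -> "ElementType", used to build accessor names.
    static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    static String generate(String className, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(className).append(" {\n");
        // One private field per attribute.
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append("    private ").append(e.getValue()).append(' ')
              .append(e.getKey()).append(";\n");
        }
        // One getter and one setter per attribute.
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            String name = e.getKey();
            String type = e.getValue();
            String cap = capitalize(name);
            sb.append("    public ").append(type).append(" get").append(cap)
              .append("() { return ").append(name).append("; }\n");
            sb.append("    public void set").append(cap).append('(').append(type)
              .append(' ').append(name).append(") { this.").append(name)
              .append(" = ").append(name).append("; }\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("elementType", "String");
        attrs.put("formalCharge", "int");
        System.out.print(generate("Atom", attrs));
    }
}
```

Because the input is only a name-to-type map, the same description could in principle drive emitters for other languages, which is the reason given above for writing a custom generator instead of relying on JAXB or Castor.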

However, there are many cases where it would be useful to reason over the 
data. Examples include:

"the formula deduced from the connection table should be consistent with 
that reported by the depositor"
"The mass and charge difference in a reaction should be zero"
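The two example rules can be written as ordinary Java checks, which is roughly the kind of primitive a reasoner would need to call out to. This sketch assumes element counts and charges are already available (parsing CML is not shown) and that all hydrogens in the connection table are explicit; the `Term` record is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the two consistency rules as plain Java checks.
// Element counts and charges are supplied directly; reading them from CML is out of scope.
class ConsistencyChecks {

    // One reactant or product: stoichiometric coefficient, element counts, formal charge.
    record Term(int coefficient, Map<String, Integer> elements, int charge) {}

    // Rule 1: the formula derived from the atoms of a connection table
    // (assuming explicit hydrogens) should equal the reported formula.
    static boolean formulaConsistent(List<String> atomElements, Map<String, Integer> reported) {
        Map<String, Integer> derived = new HashMap<>();
        for (String el : atomElements) {
            derived.merge(el, 1, Integer::sum);
        }
        return derived.equals(reported);
    }

    // Sums coefficient * count for every element on one side of a reaction.
    static Map<String, Integer> elementTotals(List<Term> side) {
        Map<String, Integer> totals = new HashMap<>();
        for (Term t : side) {
            t.elements().forEach((el, n) -> totals.merge(el, t.coefficient() * n, Integer::sum));
        }
        return totals;
    }

    static int chargeTotal(List<Term> side) {
        int total = 0;
        for (Term t : side) {
            total += t.coefficient() * t.charge();
        }
        return total;
    }

    // Rule 2: equal element totals mean no mass difference; the charge difference must also be zero.
    static boolean balanced(List<Term> reactants, List<Term> products) {
        return elementTotals(reactants).equals(elementTotals(products))
                && chargeTotal(reactants) == chargeTotal(products);
    }

    public static void main(String[] args) {
        // Water: atoms O, H, H in the connection table versus a reported formula H2O.
        System.out.println(formulaConsistent(List.of("O", "H", "H"), Map.of("H", 2, "O", 1))); // true

        // 2 H2 + O2 -> 2 H2O
        List<Term> reactants = List.of(
                new Term(2, Map.of("H", 2), 0),
                new Term(1, Map.of("O", 2), 0));
        List<Term> products = List.of(new Term(2, Map.of("H", 2, "O", 1), 0));
        System.out.println(balanced(reactants, products)); // true
    }
}
```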

It looks attractive to model these in OWL, but this may require primitives 
that call CML's algorithmic functionality. Does this look like a useful and 
practical approach? Perhaps RDF can be used to locate resources which apply 
these functions.
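One way to read this proposal is as a registry that binds declared rule identifiers to algorithmic implementations, so that metadata only has to name a rule and the code supplies the check. The sketch assumes hypothetical rule IRIs under `http://example.org/rules#` and a hypothetical `Molecule` record; the plain `List` of declared rules stands in for what would, in the proposed approach, be found by querying RDF.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch: rule identifiers (as they might appear in RDF/OWL metadata) bound to Java checks.
// The rule IRIs and the Molecule record are hypothetical.
class RuleRegistry {

    record Molecule(String id, int computedCharge, int reportedCharge) {}

    private final Map<String, Predicate<Molecule>> implementations = new HashMap<>();

    void register(String ruleIri, Predicate<Molecule> check) {
        implementations.put(ruleIri, check);
    }

    // Runs every declared rule and returns the IRIs of those that fail.
    // A rule with no registered implementation is reported as failed, not skipped.
    List<String> violations(Molecule m, List<String> declaredRules) {
        List<String> failed = new ArrayList<>();
        for (String iri : declaredRules) {
            Predicate<Molecule> check = implementations.get(iri);
            if (check == null || !check.test(m)) {
                failed.add(iri);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        RuleRegistry registry = new RuleRegistry();
        registry.register("http://example.org/rules#chargeConsistent",
                m -> m.computedCharge() == m.reportedCharge());

        // Stand-in for rules that would be located via RDF statements about the molecule's class.
        List<String> declared = List.of(
                "http://example.org/rules#chargeConsistent",
                "http://example.org/rules#notYetImplemented");

        Molecule m = new Molecule("mol1", 0, 0);
        System.out.println(registry.violations(m, declared));
        // prints [http://example.org/rules#notYetImplemented]
    }
}
```

Reporting an unimplemented rule as a failure is a choice made here so that gaps between declared rules and available code stay visible.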

P.

Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069

_______________________________________________
BioPAX-discuss mailing list
BioPAX-discuss@biopax.org
http://www.biopax.org/mailman/listinfo/biopax-discuss

Received on Monday, 9 August 2004 15:08:32 UTC