Re: Again about BioZen from samwald@gmx.at on 2006-10-16 (public-semweb-lifesci@w3.org from October 2006)

From: <samwald@gmx.at>
Date: Mon, 16 Oct 2006 13:38:22 +0200
To: biopax-discuss@cbio.mskcc.org, public-semweb-lifesci@w3.org
Message-ID: <20061016113822.254560@gmx.net>
I replied to this in a private mail to Andrea, but since Eric forwarded Andrea's mail to the public-semweb-lifesci list, I will also reply to it in public.


Andrea wrote:
> *) From your work on BioZen, how problematic do you see the mapping
> of BioPAX on an upper ontology like DOLCE ? At least for some core
> parts, this what BioZen has done (I mean, among other things). Did
> you find problems in this ?

Yes, the development of the ontology began with a mapping of BioPAX to DOLCE, which was not as easy as it might seem.
DOLCE is focused on describing entities in the real world, located in space and time. In the ontology, such entities are called spatio-temporal-particulars. BioPAX does not make such a commitment: depending on the user's interpretation, it could also be seen as describing data in databases, classes of entities, or conceptualizations (or a mixture of all of these). Furthermore, it was not clear (at least to me) whether it describes events and single molecules or processes and populations of molecules. At first, I tried to represent the information as single events involving single molecules, but it soon turned out that this was not practicable and did not represent what really happens in the real world (stoichiometric processes / emergent properties of larger populations of molecules).
Another problem with BioPAX is the distinction between 'physical entity' and 'physical entity participant'. Instances of the first class act as a kind of blueprint that can be 'instantiated' with entities of the second class. This has led to many misunderstandings in the BioPAX community and also proved to be problematic for the mapping to DOLCE.



> *) In populations of molecules, dol:part seems to refer both to part-
>  of populations (meaning: subsets of individuals of the
> populations) and to part-of individuals population-wide (part of
> each molecule across the whole population). To me, these two cases
> seem to be associated with two different semantics, especially if I
> think how qualities of a population affect its parts. But I'm
> sure this is already covered in the DOLCE framework. Can you
> elaborate more on this ?

Yes, these are two kinds of 'part of', but in my opinion the generic dol:part relation is sufficient (dol:part can be seen as a superproperty of the two 'part of' relations you are referring to). All of the necessary information is already covered by the types of the subject and the object. Defining additional properties would just cause redundancy, increase the complexity of the ontology, and make it more difficult to write queries.
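
To illustrate why the types of subject and object can be enough, here is a minimal sketch in Python that uses plain tuples instead of an RDF library. Every identifier in it is hypothetical: the class names "molecule-population" and "molecule-part", the instance names, and the assumed direction "<whole> dol:part <part>" are illustrations only and are not taken from bio-zen or DOLCE.

```python
# Triples as (subject, predicate, object); all names are illustrative only.
TRIPLES = {
    ("population-1", "rdf:type", "molecule-population"),
    ("subpopulation-1a", "rdf:type", "molecule-population"),
    ("binding-site-1", "rdf:type", "molecule-part"),
    # Assumed direction for this sketch: <whole> dol:part <part>.
    ("population-1", "dol:part", "subpopulation-1a"),
    ("population-1", "dol:part", "binding-site-1"),
}


def types_of(entity):
    """Return the set of classes asserted for `entity`."""
    return {o for s, p, o in TRIPLES if s == entity and p == "rdf:type"}


def parts_of_type(whole, wanted_type):
    """Return the parts of `whole` that are asserted to be of `wanted_type`."""
    return sorted(
        o
        for s, p, o in TRIPLES
        if s == whole and p == "dol:part" and wanted_type in types_of(o)
    )


# Sub-populations and molecule-level parts share the single dol:part property;
# the type of the object tells the two readings apart.
print(parts_of_type("population-1", "molecule-population"))
print(parts_of_type("population-1", "molecule-part"))
```

A query only needs one property plus a type filter, which is the reason for not introducing two specialised part-of properties.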



> *) If I understand it well (but maybe I don't), a molecular
> population is characterized by its location. What if I want to
> refer to a molecule population, independently of its location ?

Then you simply don't make a statement about the location. Under the open world assumption, a population without a stated location can be located anywhere. However, to be classified as a molecular population, an entity must have at least some spatially defined boundary (e.g. being located in the cytosol of one cell). You are not required to state what that location is, though.
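
A small Python sketch of what the open world assumption means for a lookup, again with plain tuples. The property name "located-in" and all instance names are hypothetical and only serve the illustration.

```python
# Triples as (subject, predicate, object); all names are illustrative only.
TRIPLES = {
    ("population-1", "rdf:type", "molecule-population"),
    ("population-1", "located-in", "cytosol-of-cell-7"),
    # population-2 has no location statement at all.
    ("population-2", "rdf:type", "molecule-population"),
}


def is_located_in(entity, place):
    """Return True if the location is stated, otherwise None (unknown).

    Under the open world assumption a missing statement never yields False:
    the entity may still be located there, we just have not said so.
    """
    if (entity, "located-in", place) in TRIPLES:
        return True
    return None


print(is_located_in("population-1", "cytosol-of-cell-7"))
print(is_located_in("population-2", "cytosol-of-cell-7"))
```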



> *) <described-by> concept, isn't this too loose semantically ? Ok,
> I understand the reasons to keep it simple. But I can say
> insulin123 described-by insulin and something like diabetes
> described-by insulin (ok, this is a little bit stretched, but...
> it's anyway described-by if this is not more specified). Do you think
> described-by may be further specified (for example: "characterized
> by", "annotated by"...) ?

This property is kept loose deliberately, in order to 'humanize' descriptions in a clearly delimited area of the model. While the rest of the ontology is optimized for ontological consistency, the constructs around concept annotations are kept very flexible and simple.

"insulin123 described-by insulin" is an example of good use of concept annotations, while "diabetes described-by insulin" is a bad example. However, both statements are possible. To optimize the use of concept annotations in bio-zen, there is a simple rule of thumb. From the bio-zen manual: Express as much as possible in the world of spatio-temporal-particulars and not in the world of abstract concepts.

Example:
You are describing a protein with enzymatic function that is important in the process of glial cell differentiation. You want to annotate its function with a concept of GO (e.g. the molecular-process-concept "GO_0010001: glial cell differentiation"). The simplest way to do this would be to just say

<protein-population-123> <described-by> <GO_0010001> 

This is very brief, but not very elegant, as we have described a population of molecules with a concept that actually refers to a process, not a molecule. Of course, most people will understand what is implied by this statement, but it is still preferable to be a little more precise.
To add some precision, we could state explicitly that protein-population-123 participates in a process, and then annotate this process with the concept for "glial cell differentiation". The new version of our statement would therefore be:

<process-123> <participant> <protein-population-123> 
<process-123> <described-by> <GO_0010001> 

Such precise descriptions should be preferred wherever possible. In the case of your example, we could describe diabetes as a process and insulin as a participant in this process.
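
As a minimal sketch of how the two-statement version can be queried, the following Python snippet stores the triples from the example as plain tuples. Only the identifiers that appear in the example are taken from it; the helper functions `direct_concepts` and `process_concepts` are hypothetical names introduced for this illustration.

```python
# The two statements from the example, as (subject, predicate, object) tuples.
TRIPLES = {
    ("process-123", "participant", "protein-population-123"),
    ("process-123", "described-by", "GO_0010001"),
}


def direct_concepts(entity):
    """Concepts attached directly to `entity` via described-by."""
    return sorted(
        o for s, p, o in TRIPLES if s == entity and p == "described-by"
    )


def process_concepts(population):
    """Concepts of all processes in which `population` is a participant."""
    processes = {
        s for s, p, o in TRIPLES if p == "participant" and o == population
    }
    return sorted(
        o for s, p, o in TRIPLES if p == "described-by" and s in processes
    )


# The population itself carries no process concept ...
print(direct_concepts("protein-population-123"))
# ... but the concept is still reachable through the process it takes part in.
print(process_concepts("protein-population-123"))
```

The process concept stays attached to a process, and the link to the protein population remains recoverable with one extra hop.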



> *) Where do you need Correlates-A,B... ? And by the way, I guess
> this is to state some correspondence with a semantics implicitly
> encoded in the uri-string. If so, isn't this a little dirty ?
> Anyway, can you provide an example of a description of correlation
> that uses these properties ?

They are used for mathematical descriptions. A, B etc. can be used as variables in a MathML description to express the correlation as a mathematical equation. The use of these constructs is exemplified on page 14 of the current bio-zen manual [1]. The '-A', '-B' suffix is used solely to distinguish different groups of correlated qualities and does not carry any additional meaning. I think there is no other simple way to implement this in RDF.



> *) Quick dumb question: does causation imply a relation in time ?

No, temporal relations have to be stated separately.



> *) on fuzziness.  What do you mean by realness ?
> Like in: John is a thief with belief 0.7 (He is or he is not a
> thief, holds) Or
> John is old with belief 0.7 (I know exactly the age, it's the
> concept of old that's vague).

The examples you give here are triples. In bio-zen, triples cannot be 'fuzzified' (this would require RDF reification). Instead, the EXISTENCE of entities is fuzzified. E.g. the organism "Albert Einstein" would have a realness of 1, because we are really sure that he existed, while the organism "The last Unicorn" would have a realness of 0, because we are pretty sure that unicorns never existed and the description of a unicorn is not helpful for our understanding of the real world. Ok, that was a daft example.
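
The difference between fuzzifying a triple and fuzzifying an entity can be sketched in a few lines of Python. This is only an illustration under assumptions: the property name "realness", the instance names, and the helper `entities_at_least` are hypothetical and are not claimed to match the actual bio-zen vocabulary.

```python
# Realness is an ordinary statement ABOUT an entity, not a weight ON a triple,
# so no reification is needed. All names are illustrative only.
TRIPLES = {
    ("albert-einstein", "rdf:type", "organism"),
    ("albert-einstein", "realness", 1.0),
    ("the-last-unicorn", "rdf:type", "organism"),
    ("the-last-unicorn", "realness", 0.0),
}


def entities_at_least(threshold):
    """Return entities whose stated realness is >= `threshold`."""
    return sorted(
        s for s, p, o in TRIPLES if p == "realness" and o >= threshold
    )


print(entities_at_least(0.5))
```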



> *) As for fuzziness, as well as evolution of description models,
> this should be in some underlying level, with provenance, trust,
> versioning, dependencies and so on... or not ?

What do you mean by "underlying level"? I don't want to use named graphs or any other construct "outside" the world of the normal OWL graph.
As for provenance, trust, dependencies and so on: these are really complex issues that should be tackled by others (preferably a W3C working group). It would not make sense for me to develop something like this on my own.

By the way, version 1.0 of the bio-zen ontology will soon be released (this week). This will be the first release intended for practical use.

Kind regards,
Matthias Samwald


[1] http://neuroscientific.net/res/semsyn/biozen-manual-jul-06.pdf






Received on Monday, 16 October 2006 11:38:39 UTC