- From: Oliver Ruebenacker <curoli@gmail.com>
- Date: Mon, 30 Mar 2009 10:59:46 -0400
- To: Pat Hayes <phayes@ihmc.us>
- Cc: Matthias Samwald <samwald@gmx.at>, public-semweb-lifesci <public-semweb-lifesci@w3.org>
Hello Pat, All,

On Sun, Mar 29, 2009 at 11:35 PM, Pat Hayes <phayes@ihmc.us> wrote:
> On Mar 29, 2009, at 11:15 AM, Oliver Ruebenacker wrote:
>> I am assuming that these classes all make a commitment about what
>> their instances mean, so users could declares instances and rely on
>> that commitment to be useful, right?
>
> As I have been taken to task (offline) for agreeing with you, allow me to
> intercede. On this point, I think everyone is right. Yes, classes are things
> that have, or can have, instances, and that is all that a class is, in
> effect. (RDFS makes this quite explicit by _defining_ classes to be things
> in the range of the rdf:type property.) On the other hand, it is certainly
> correct that ontologies can say a lot about classes without ever mentioning
> instances. On the other hand, it is also the case that, were someone to (not
> unreasonably) wish to connect such classes with their instances, the
> resulting conclusions should be correct, and if they were not, then this
> would be a serious critique of the ontology.

I have no idea what the person who took you to task was thinking, but it seems related to the ongoing controversy over whether substances should be instances or classes.

In BioPAX, we have a class, physical entity, with subclasses such as protein. EGFR would be an instance of protein, and we could say in BioPAX that EGFR has a sequence. (I would argue it should rather say that EGFR matches a sequence pattern, but that is another story.)

The problem is that there are certain assumptions which BioPAX users are encouraged to follow, such as:

(1) if two physical entities refer to the same record, they are identical;
(2) if two physical entities refer to different records in the same source, they are not identical;
(3) if they are not identical, they have no overlap.
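To make the three assumptions concrete, here is a minimal Python sketch. It is not BioPAX's actual data model: `PhysicalEntity`, its `source` and `record_id` fields, and the database name `ExampleDB` are hypothetical simplifications of a physical entity and the record it refers to. It shows how the heuristics classify a generic EGFR entry and a human EGFR entry from the same source as distinct and non-overlapping, which is the kind of case where they break down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhysicalEntity:
    """Simplified stand-in for a BioPAX physical entity (hypothetical model)."""
    name: str
    source: str     # database that holds the referenced record (hypothetical field)
    record_id: str  # identifier of that record within the source (hypothetical field)


def assumed_identical(a: PhysicalEntity, b: PhysicalEntity) -> Optional[bool]:
    """Heuristics (1) and (2); returns None when neither applies."""
    if (a.source, a.record_id) == (b.source, b.record_id):
        return True   # (1) same record -> identical
    if a.source == b.source:
        return False  # (2) different records in the same source -> not identical
    return None       # different sources: the heuristics say nothing


def assumed_overlap(a: PhysicalEntity, b: PhysicalEntity) -> Optional[bool]:
    """Heuristic (3): entities that are not identical have no overlap."""
    identical = assumed_identical(a, b)
    if identical is None:
        return None
    # Identical entities trivially overlap; non-identical ones are assumed disjoint.
    return identical


# Hypothetical records: a generic EGFR entry and a human-specific one in the same source.
egfr = PhysicalEntity("EGFR", "ExampleDB", "rec-1")
human_egfr = PhysicalEntity("human EGFR", "ExampleDB", "rec-2")

print(assumed_identical(egfr, human_egfr))  # False
print(assumed_overlap(egfr, human_egfr))    # False, although human EGFR falls within EGFR
```

Nothing in the sketch can express "EGFR includes human EGFR", so any such relationship is silently lost once the heuristics are applied.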
I don't think these assumptions are even asserted in the ontology or the documentation, but some BioPAX developers actively encourage users to rely on them ("Come on, they are true at least 95 percent of the time"), and BioPAX lacks support for cases where they break down.

The fix I advocate is straightforward: let the language state explicitly whether the above assumptions are met, and add support for cases where they are not. This would include a property expressing that EGFR includes human EGFR; since EGFR is not a class, that property would not be rdfs:subClassOf.

Others advocate a different approach: make every reference to a substance a reference to a class, e.g. EGFR would be a subclass of protein. This, they say, is "more natural". The obvious benefit is that it makes clear that two distinct substances may overlap, since two distinct classes may have a non-empty intersection; e.g. human EGFR and phospho-EGFR would have the intersection human phospho-EGFR (assuming EGFR is defined to include phospho-EGFR).

The obvious drawback is that everything becomes more complicated, since instead of properties of instances we would have property restrictions over classes (e.g. instead of "EGFR matches EGFRSequence" we would say "Every element of EGFR matches EGFRSequence"). That alone is a serious issue, since typical users like things as simple as possible.

But that is not the most serious issue. The most severe problem is that the class approach seems to be incompatible with (1) observables being about statistical ensembles and (2) populations being defined by location, not by individual membership, at least if we want to avoid extreme complexity.

(Note: in what follows, all numbers are made up and probably not realistic.)

For example, how would we describe that "the concentration of ATP in the mitochondrion is (3.2 +/- 0.7) mol per liter"? What does the concentration inhere in? Maybe the concentration is just a proxy for the particle number?
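The difference between a population fixed by membership and one defined by location can be illustrated with a toy Python model. Everything in it is a hypothetical simplification (the `Molecule` class, the `population` function, and the counts, which like the other numbers here are made up): a membership-defined set is computed once and then frozen, while the location-defined population is recomputed from the current state each time it is asked for.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Hypothetical toy model of a single molecule in a cell."""
    mol_id: int
    species: str   # e.g. "ATP" or "ADP"
    location: str  # e.g. "mitochondrion" or "cytosol"


def population(molecules, species, location):
    """Location-defined population: re-evaluated against the current state each call."""
    return frozenset(m.mol_id for m in molecules
                     if m.species == species and m.location == location)


molecules = [Molecule(i, "ATP", "mitochondrion") for i in range(23)]

# Membership-defined: one particular set of 23 molecules, frozen at this moment.
snapshot = population(molecules, "ATP", "mitochondrion")

# A new ATP molecule is created in the mitochondrion.
molecules.append(Molecule(23, "ATP", "mitochondrion"))
current = population(molecules, "ATP", "mitochondrion")
print(len(snapshot), len(current))  # 23 24

# Every ATP molecule wanders off to the cytosol; no ADP was ever present.
for m in molecules:
    m.location = "cytosol"
no_atp = population(molecules, "ATP", "mitochondrion")
no_adp = population(molecules, "ADP", "mitochondrion")
print(len(snapshot), no_atp == no_adp)  # 23 True
# Both zero counts come out as the same empty set.
```

Keying the population on (species, location) rather than on particular molecules is roughly what a term like "ATP in the mitochondrion" would capture; this is only an illustration, not a proposed BioPAX encoding.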
Say the particle number of ATP in the mitochondrion is 24.7 +/- 1.6. What does that number inhere in?

Can we restrict ourselves to cases of definite particle numbers? In Systems Biology, we often use differential equations to model how things change over time, and that assumes they change gradually rather than in discrete steps of one molecule. Nevertheless, let us say that the number of ATP in the mitochondrion is 23. What does that number inhere in? One particular set of 23 molecules? And are we talking only about one particular cell, or are we making a more general statement that applies to many cells?

A new ATP molecule is created, increasing the number of ATP molecules to 24. The original set of 23 molecules still exists, and its number is still 23. But that is no longer the number of ATP molecules in the mitochondrion. Also, what happens when an ATP molecule is destroyed, or wanders off somewhere else? Finally, what happens when the number of ATP molecules in the mitochondrion drops to zero? What does the zero inhere in: the empty set? If the number of ADP in the mitochondrion also drops to zero, does that zero also inhere in the empty set? How many empty sets are there?

Maybe in principle it is possible to reformulate the problem and build up a description from scratch, relying on terms such as molecule, that would allow us to express the above scenarios accurately. But that approach would require a complex system of related restrictions just to make the simplest assertions used in Systems Biology.

What I think we need instead is a term that refers to "ATP in the mitochondrion", a term that refers to "(24.7 +/- 1.6)", and a simple property to connect these two in one statement.

Take care
Oliver

--
Oliver Ruebenacker, Computational Cell Biologist
BioPAX Integration at Virtual Cell (http://vcell.org/biopax)
Center for Cell Analysis and Modeling
http://www.oliver.curiousworld.org
Received on Monday, 30 March 2009 15:00:50 UTC