Re: blog: semantic dissonance in uniprot

On Mar 30, 2009, at 9:59 AM, Oliver Ruebenacker wrote:

>     Hello Pat, All,
>
> On Sun, Mar 29, 2009 at 11:35 PM, Pat Hayes <phayes@ihmc.us> wrote:
>> On Mar 29, 2009, at 11:15 AM, Oliver Ruebenacker wrote:
>>>  I am assuming that these classes all make a commitment about what
>>> their instances mean, so users could declare instances and rely on
>>> that commitment to be useful, right?
>>
>> As I have been taken to task (offline) for agreeing with you, allow me to
>> intercede. On this point, I think everyone is right. Yes, classes are things
>> that have, or can have, instances, and that is all that a class is, in
>> effect. (RDFS makes this quite explicit by _defining_ classes to be things
>> in the range of the rdf:type property.) On the other hand, it is certainly
>> correct that ontologies can say a lot about classes without ever mentioning
>> instances. On the other hand, it is also the case that, were someone to (not
>> unreasonably) wish to connect such classes with their instances, the
>> resulting conclusions should be correct, and if they were not, then this
>> would be a serious critique of the ontology.
>
>  I have no idea what the person who took you to task was thinking,
> but it seems related to the ongoing controversy over whether
> substances should be instances or classes.

Which seems like a much more interesting topic, indeed.

>  In BioPAX, we have a class physical entity with subclasses such as
> protein. EGFR would be an instance of protein, and we could say in
> BioPAX that EGFR has a sequence. (I would argue it should rather say
> that EGFR matches a sequence pattern, but that is another story.)
>
>  The problem is that there are certain assumptions which BioPAX users
> are encouraged to follow, such as (1) if two physical entities refer
> to the same record, they are identical (2) if two physical entities
> refer to different records in the same source, they are not identical
> (3) if they are not identical, they have no overlap.
>
>  I don't think these assumptions are even asserted in the ontology or
> the documentation, but some BioPAX developers actively encourage users
> to rely on them ("Come on, they are true at least 95 percent of the
> time"), and BioPAX lacks support for cases where they break down.
>
>  The fix I advocate is straightforward: let the language be explicit
> about whether the above assumptions are met or not, and add support for
> cases where they are not.

Hard to disagree with that.

> This would include a property that expresses
> that EGFR includes human EGFR, but since EGFR is not a class, it would
> not be rdfs:subClassOf.
>
>  Others advocate a different approach: make all references to
> substances references to classes

Classes of what? That is, what would be the ultimate elements of these  
classes and subclasses? I think it is vital to get this straight  
before proceeding. Possible answers include: molecules (so EGFR is the  
class of all molecules that would be classified as an EGFR molecule);  
pieces of 'stuff' in the mereological sense ("aggregates" as someone  
called them in this thread); protein-types, where a type is something  
that can always be subdivided into subtypes according to some  
criterion, possibly one yet to be discovered; kinds of substance,  
where a substance is something that can partake in mixtures or  
compounds to create other kinds of substance, and pieces of which  
occupy space. And no doubt there are others, also. The point, I should  
perhaps emphasize, is not to refer to individuals of these various  
kinds, but to pin down a particular way of thinking that can be used  
consistently to justify ontological design decisions.

> , e.g. EGFR would be a subclass of
> protein. This, they say, is "more natural".

It fits with the first model, above, in which we are always talking  
about classes of molecule. Not so well with the 'substances' view.

>
>  The obvious benefit is that it makes it clear that two distinct
> substances may have overlap, since two distinct classes may have a
> non-empty intersection, e.g. human EGFR and phospho-EGFR would have the
> intersection human phospho-EGFR (assuming EGFR to be defined to
> include phospho-EGFR).

That suggests the 'type/subtype' way of thinking.
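
To make the overlap point concrete, here is a minimal sketch in Python that
treats each class as a plain set of its members. The member identifiers
(m1 to m4) are invented for illustration and do not come from BioPAX or
UniProt; the sketch also assumes, as in the quoted text, that EGFR is
defined to include phospho-EGFR.

```python
# Hypothetical member identifiers, invented purely for illustration.
egfr = {"m1", "m2", "m3", "m4"}   # everything classified as EGFR
human_egfr = {"m1", "m2"}         # the human subtype
phospho_egfr = {"m2", "m3"}       # the phosphorylated subtype

# Both subtypes are subclasses of EGFR (assumption: EGFR includes phospho-EGFR).
assert human_egfr <= egfr
assert phospho_egfr <= egfr

# Two distinct classes can still share members: their intersection
# plays the role of "human phospho-EGFR".
human_phospho_egfr = human_egfr & phospho_egfr
print(sorted(human_phospho_egfr))
```

Nothing here depends on what the members actually are (molecules, pieces of
stuff, subtypes), which is exactly why that question still has to be
settled separately.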

>
>  The obvious drawback is that everything becomes more complicated
> since instead of properties of instances we would have property
> restrictions over classes (e.g. instead of "EGFR matches EGFRSequence"
> we would say "Every element of EGFR matches EGFRSequence"). That alone
> is a serious issue, since typical users like it as simple as possible.

I don't think this is a serious issue, in fact. It is as easy to state  
the property restriction as the property, in OWL; and in any case,  
simplicity in this sense is largely a matter of good human interface  
design, and it is very bad engineering to base ontological decisions  
on interface design.
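
As a rough illustration of why the two forms are about equally easy to
state, the sketch below writes both as a single triple-like statement in
plain Python. The names (EGFR, EGFRSequence, matches) follow the quoted
example; the molecule identifiers and the helper entailed_facts are
hypothetical, and this is a toy model rather than OWL itself.

```python
# Instance view: EGFR is an individual, and the statement is one triple.
instance_statement = ("EGFR", "matches", "EGFRSequence")

# Class view: EGFR is a class, and the statement is one restriction that
# reads "every element of EGFR matches EGFRSequence".
class_restriction = ("EGFR", "matches", "EGFRSequence")

# Hypothetical members of the class EGFR.
members = {"EGFR": {"m1", "m2"}}

def entailed_facts(restrictions, members):
    """Derive the per-member facts that follow from class restrictions."""
    return {
        (member, prop, value)
        for (cls, prop, value) in restrictions
        for member in members.get(cls, ())
    }

facts = entailed_facts([class_restriction], members)
print(sorted(facts))
```

Either way the author writes one statement; the extra work in the class
view is done by whatever draws the conclusions about the members.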

>
>  But that is not the most serious issue.

OK

>
>  The most severe problem seems to be that the class approach seems to
> be incompatible with (1) observables being about statistical ensembles
> and (2) populations being defined by location, not individual
> membership - at least, if we want to avoid extreme complexity.
>
>  (note: in what follows, all numbers are made up and probably not  
> realistic)
>
>  For example, how would we describe that "the concentration of ATP in
> the mitochondrion is (3.2 +/- 0.7) mol per liter"? What does the
> concentration inhere in?

I'm not sure what this means. Are we talking about a particular  
mitochondrion, or mitochondria in general? I guess the latter. In  
which case, the answer to the question is, it inheres in the class of  
Mitochondria, which is presumably a subclass of CellularStructures or  
some such.

>
>  Maybe the concentration is just a proxy for the particle number?
> Say, the particle number of ATP in the mitochondrion is 24.7 +/- 1.6.
> What does that number inhere in?

Same answer, I guess. Though I don't know what a particle number is,  
so this really is a guess.

>
>  Can we restrict ourselves to cases of definite particle numbers? In
> Systems Biology, we often use differential equations to model how
> things change over time, and that assumes they change gradually. But
> nevertheless, let us say that the number of ATP in the mitochondrion
> is 23. What does that number inhere in? One particular set of 23
> molecules? But are we talking only about one particular cell, or are
> we making a more general statement that applies to many cells?

I don't know, what are you wanting to say? General statements are made  
(in OWL) by relating properties to classes (everything in this class  
has this value of this property...)

>
>  A new ATP molecule is created, increasing the number of ATP
> molecules to 24. The original set of 23 molecules still exists, and
> its number is still 23. But that's not the number of ATP molecules in
> the mitochondrion any more. Also, what happens when an ATP molecule is
> destroyed, or wanders off to some place else?

Well, these are issues of describing change and time. That is a whole  
ontological area that has been fairly extensively explored. But if you  
want to be able to describe change and dynamics, you will have to  
introduce time explicitly into your ontological framework one way or  
another. There are no magic bullets for avoiding the resulting  
complications.
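
One way to picture "introducing time explicitly" is the following minimal
Python sketch. It assumes a single mitochondrion, an invented event log,
and hypothetical molecule identifiers; it is only meant to show that the
count becomes a function of time, while the original set of 23 molecules
is a separate, unchanging thing.

```python
# The 23 ATP molecules present at t = 0 (identifiers are hypothetical).
initial = frozenset(f"atp{i}" for i in range(23))

# Invented event log for one mitochondrion: (time, molecule, change).
events = [
    (1.0, "atp23", "created"),
    (2.0, "atp5", "destroyed"),
    (3.0, "atp7", "left"),
]

def contents_at(t):
    """Return the set of ATP molecules in the mitochondrion at time t."""
    present = set(initial)
    for when, molecule, change in sorted(events):
        if when > t:
            break
        if change == "created":
            present.add(molecule)
        else:  # "destroyed" or "left"
            present.discard(molecule)
    return present

def count_at(t):
    """The number of ATP molecules is a property indexed by time."""
    return len(contents_at(t))

print([count_at(t) for t in (0.0, 1.0, 2.0, 3.0)])
print(len(initial))  # the original set never changes
```

The cost is visible even in this toy: every statement about the number now
needs a time argument.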

>
>  Finally, what happens when the number of ATP molecules in the
> mitochondrion drops to zero? What does the zero inhere in - in the
> empty set?

No, in the mitochondrion (or mitochondria) which have no ATP in them.  
This is an old issue, thoroughly explored. (What kind of flock does a  
shepherd have who has sold all his sheep?)
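
A minimal sketch of that answer in Python, assuming a hypothetical
Mitochondrion record (the class and field names are invented for
illustration): the count is a property of the container, so a count of
zero needs no empty set to attach to.

```python
from dataclasses import dataclass, field

@dataclass
class Mitochondrion:
    """Hypothetical container; the counts are properties of this object."""
    name: str
    atp: set = field(default_factory=set)
    adp: set = field(default_factory=set)

    def count(self, kind):
        """Number of molecules of the given kind ('atp' or 'adp')."""
        return len(getattr(self, kind))

mito = Mitochondrion("mito-1")  # currently holds no ATP and no ADP
atp_count = mito.count("atp")
adp_count = mito.count("adp")
print(atp_count, adp_count)
```

Both zeros belong to the same mitochondrion, under two different
properties.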

> What if the number of ADP in the mitochondrion also drops
> to zero, does the zero also inhere in the empty set? How many empty
> sets are there?

There is only one empty set. But in the example under discussion, this  
would be an issue only if there were no mitochondria in the universe,  
a case I assume we can safely ignore. (And, BTW, in many ontology  
languages - though not, regrettably, OWL-DL - there can be a number of  
distinct empty _classes_.)
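
The difference between one empty set and several empty classes can be
sketched in Python as follows. The OntClass type and the two class names
are hypothetical; the only point is that a class can have an identity of
its own, separate from its extension (the set of its members).

```python
class OntClass:
    """A class with its own identity, distinct from its extension."""

    def __init__(self, name, extension=()):
        self.name = name
        self.extension = frozenset(extension)

# Two hypothetical classes that currently have no members.
atp_in_mito = OntClass("ATPInMitochondrion")
adp_in_mito = OntClass("ADPInMitochondrion")

# There is only one empty set, so the extensions are equal...
same_extension = atp_in_mito.extension == adp_in_mito.extension

# ...but the classes themselves remain two different things.
same_class = atp_in_mito is adp_in_mito

print(same_extension, same_class)
```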

>
>  Maybe in principle, it is possible to reformulate the problem and
> build up a description from scratch, relying on terms such as
> molecule, that would allow us to express the above scenarios accurately.
> But that approach would make a complex system of related restrictions
> necessary to make even the most simple assertions used in Systems
> Biology.

Indeed, I suspect that Systems Biology would be an extremely complex  
ontology, if formalized adequately. (Even supposing the state of the  
formalizing art is up to the task, which I doubt.) Note however that  
this does not mean that every assertion made using the concepts of the  
ontology need be complex, only that the defining ontology for the  
concepts will be. Fortunately, the defining ontology only has to be  
created once.

>
>  What I think we need instead is a term that refers to "ATP in the
> mitochondrion", a term that refers to "(24.7 +/- 1.6)", and a simple
> property to connect these two in one statement.

In other words, an equation without any definitions of the terms used  
in it. Sure, go ahead, but please don't call it an ontology.

Pat

>
>     Take care
>     Oliver
>
> -- 
> Oliver Ruebenacker, Computational Cell Biologist
> BioPAX Integration at Virtual Cell (http://vcell.org/biopax)
> Center for Cell Analysis and Modeling
> http://www.oliver.curiousworld.org
>
>


Received on Monday, 30 March 2009 16:36:29 UTC