Re: blog: semantic dissonance in uniprot

On Mar 31, 2009, at 5:37 AM, Matthias Samwald wrote:

> Oliver,
>
> Indeed, ontologists seem to be more at home in the world of  
> 'discrete particulars'.

Umm.... I'm not sure what the Scientist People mean by this.  
Ontologies describe a world in which there are distinguishable  
entities between which relationships hold. But these entities can be  
_anything_. They can be, for example, substances, pieces of a  
substance, amounts of a substance, locations of a piece of a  
substance, mixtures of substances, time-periods during which a  
substance is being formed in a location, reactions which form a  
substance, rates of reactions which form a substance, categories by  
which a substance is classified (in various ways), and so on. And  
that's just talking about substances. (Oh, and I forgot: also  
molecules of a substance, if you like.) So that "discrete particular"  
covers a lot of ground.

Here's what ontology languages really do have trouble with: (1)  
statements about 'typical' or 'usual' or 'normal' entities, which are  
supposed to apply in most cases yet have exceptions which are not  
themselves precisely stated; (2) statements about probabilities,  
unless these are purely arithmetic assertions about numbers (and therefore
not much use); (3) statements involving very complex, detailed
relationships between quantities, requiring more sophisticated  
mathematics than basic arithmetic to describe.

> However, I think you make ontology look more naive than it deserves.
> Of course, we cannot describe stochastic effects when our ontology  
> is only dealing with particular molecules (individuals) or molecules  
> in general (classes). These entities are parts/constituents of other  
> entities, and the stochastic properties / qualities only inhere in  
> those large entities, not in their basic building parts.
>
> To use a (still quite naive) physics example: 'Temperature' is a  
> quality of an object (say, a solution in a petri dish). This quality  
> only inheres in the solution, but not in a single molecule. If our  
> ontology would only contain the class 'molecule', but not the class  
> 'solution', then we would be unable to describe temperature in a  
> meaningful way. The same can be done for concentrations, of course.  
> Rates of change could be described as qualities of qualities

I think a quality of a process (= occurrent) would be more like it.  
This is fine in OBO.
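
To make the "quality of a process" reading a little more concrete, here
is a minimal sketch in Python. It is not OBO or OWL syntax; the class
name Process, its fields, and all the numbers are made up purely for
illustration. The point is only that the rate hangs off the process as a
whole, rather than off some other quality:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """An occurrent: something that unfolds over a time interval.

    Hypothetical illustration only; not taken from any real ontology.
    """
    label: str
    start_s: float        # start time, in seconds
    end_s: float          # end time, in seconds
    start_amount: float   # quantity at the start (arbitrary units)
    end_amount: float     # quantity at the end (same units)

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s

    @property
    def mean_rate(self) -> float:
        """Average rate of change: a quality of the process as a whole."""
        return (self.end_amount - self.start_amount) / self.duration_s


# Invented numbers: a quantity falls from 100 to 88 units over 10 seconds.
depletion = Process("ATP depletion", 0.0, 10.0, 100.0, 88.0)
print(depletion.label, depletion.mean_rate)  # mean rate in units per second
```

Nothing here needs a "quality of a quality"; the rate is simply an
attribute of the occurrent.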

> (I think the top-level ontology DOLCE allows this, but it would be  
> difficult in BFO, for example). Reaction equations describe  
> stochastic processes

Of course they do, but does _all_ reasoning about reactions require  
that we bear their stochastic nature in mind? Surely we often simply  
treat the reaction as a quantitatively restricted conversion event in  
which some substances are consumed and others are produced. That is  
perfectly describable in ontology languages. Textbook accounts of, for  
example, the Krebs cycle don't seem to refer to anything stochastic,  
or require references to probabilistic notions.
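
As a rough illustration of that non-stochastic view, the following
Python sketch treats a reaction as nothing more than a bookkeeping rule
over amounts of substance. The Reaction class and its field names are
hypothetical, the amounts are invented, and ATP hydrolysis is used only
because it is a familiar example:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Reaction:
    """A conversion event with fixed proportions (hypothetical sketch)."""
    consumed: Dict[str, float]   # substance -> amount used per unit extent
    produced: Dict[str, float]   # substance -> amount made per unit extent

    def apply(self, amounts: Dict[str, float], extent: float) -> Dict[str, float]:
        """Return the amounts left after the reaction proceeds by `extent`."""
        result = dict(amounts)
        for substance, coeff in self.consumed.items():
            remaining = result.get(substance, 0.0) - coeff * extent
            if remaining < 0:
                raise ValueError(f"not enough {substance}")
            result[substance] = remaining
        for substance, coeff in self.produced.items():
            result[substance] = result.get(substance, 0.0) + coeff * extent
        return result


# ATP + H2O -> ADP + Pi, with amounts in moles (made-up values).
hydrolysis = Reaction(
    consumed={"ATP": 1.0, "H2O": 1.0},
    produced={"ADP": 1.0, "Pi": 1.0},
)
before = {"ATP": 2.0, "H2O": 10.0}
after = hydrolysis.apply(before, extent=0.5)
print(after)
```

The amounts are non-integer and no probability appears anywhere; what is
described is just consumption and production in fixed proportions.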

> , that's why you can have non-integer molecule numbers -- this can  
> be consistently described in OWL when you are aware that you are  
> describing reactions between pools of molecules, and not singular  
> individuals.

Or between amounts of a substance, with no reference to molecules at  
all. Quite detailed quantitative chemistry was being done when the  
atomic hypothesis was still being debated by physicists (as recently,  
it is easy to forget, as the early 20th century: the doubters were  
only finally silenced by Einstein's explanation of Brownian motion).

> You can also say about a solution that a certain property of the  
> molecules is following a Gaussian distribution -- all of this would  
> be fine with most ontologists.

Well, as long as we aren't asked to delve too deeply into what exactly  
a Gaussian distribution is :-)

>
> The whole is more than the sum of its parts (as you wrote earlier),  
> because the whole can have qualities that its parts are lacking. OWL  
> can be used to describe the whole quite well.

Quite.
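
A small Python sketch of the same point, under loudly stated assumptions:
the class names Molecule and Solution, their fields, and the numbers are
invented for illustration and come from no actual ontology. Temperature
and concentration are attached to the whole, and the part simply has no
such attribute:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """A part. It carries no temperature and no concentration."""
    species: str


@dataclass(frozen=True)
class Solution:
    """The whole: a body of liquid holding an amount of some solute."""
    solute: str
    solute_amount_mol: float   # amount of solute, in moles
    volume_l: float            # volume of the solution, in liters
    temperature_k: float       # a quality of the solution as a whole

    @property
    def concentration_mol_per_l(self) -> float:
        """Concentration is defined for the whole: amount divided by volume."""
        return self.solute_amount_mol / self.volume_l


# Made-up values: 0.5 mol of solute in 0.25 liters at 310 K.
dish = Solution("ATP", 0.5, 0.25, 310.0)
print(dish.concentration_mol_per_l, dish.temperature_k)
print(hasattr(Molecule("ATP"), "temperature_k"))
```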

Pat

>
> Cheers,
> Matthias Samwald
>
> DERI Galway, Ireland
> http://deri.ie/
>
> Konrad Lorenz Institute for Evolution & Cognition Research, Austria
> http://kli.ac.at/
>
>
> --------------------------------------------------
> From: "Oliver Ruebenacker" <curoli@gmail.com>
> Sent: Monday, March 30, 2009 10:53 PM
> To: "Pat Hayes" <phayes@ihmc.us>
> Cc: "Matthias Samwald" <samwald@gmx.at>; "public-semweb-lifesci" <public-semweb-lifesci@w3.org>
> Subject: Re: blog: semantic dissonance in uniprot
>
>>    Hello Pat, All,
>>
>> Let me try to take a step back and summarize what I think I learned  
>> so far:
>>
>> The Ontologists have gained impressive mastery over what I would
>> call the World of Discrete Particulars. They know how to deal with
>> particulars, classes of particulars and cardinality restrictions, in
>> other words, integers and intervals of integers.
>>
>> They know how to say, for example, that a particular petri dish has
>> 2000-3000 cells, each of which has 150-200 mitochondria, each of which
>> has 30-40 ATP molecules.
>>
>> However, the Science community typically deals with what I would
>> call the World of Reproducible Fluxes. They think in terms of
>> reproducible scenarios, expectation values, variances, continuous
>> change, Gaussian distribution and differential equations. (Literally,
>> this is what I learned in the first meeting of the Physics 101 class
>> when I went to College.)
>>
>> They know how to say, for example, that if you dump a typical human
>> liver cell into a 0.2 percent solution of some drug, the ATP
>> concentration in the mitochondria will drop by a relative rate of 1.2
>> percent per second during the first ten minutes.
>>
>> Asking a scientist to define "expectation value" is akin to asking
>> an Ontologist to define "class".
>>
>> When the Ontologists say, they have "solved mereology long time
>> ago", they seem to mean: In the World of Discrete Particulars. Not in
>> the World of Reproducible Fluxes.
>>
>> As it happens, Scientists discover OWL to produce ontologies that
>> live in the World of Reproducible Fluxes, such as BioPAX. Even
>> Scientists agree that it has weaknesses.
>>
>> Ontologist: BioPAX is nonsense! BioPAX is ill-defined! It is not an
>> ontology. It is not OWL.
>>
>> Scientist: I know it has weaknesses, but is it really that bad?
>>
>> Ontologist: Totally! We need to rebuild it from scratch.
>>
>> Scientist: OK, if you say so.
>>
>> (Ontologist goes and much later comes back with a new BioPAX, living
>> entirely in the World of Discrete Particulars)
>>
>> Ontologist: Here! A wonderful new BioPAX. Perfect ontology, perfect
>> OWL. Everything is clear and well-defined.
>>
>> Scientist: I am sorry, I can not use that.
>>
>> Ontologist: Why not?
>>
>> Scientist: It lacks the most basic terms Science is based on, such
>> as reproducible scenarios, expectation values and rate of change.
>>
>> Ontologist: You need to translate those terms into the World of
>> Discrete Particulars, and then I will include them.
>>
>> Scientist: How do I do that?
>>
>> Ontologist: I have no clue.
>>
>> But one thing both the Ontologist and the Scientist would agree:
>> Mereology in the World of Discrete Particulars is not Rocket Science.
>>
>>    Take care
>>    Oliver
>>
>> On Mon, Mar 30, 2009 at 12:35 PM, Pat Hayes <phayes@ihmc.us> wrote:
>>>
>>> On Mar 30, 2009, at 9:59 AM, Oliver Ruebenacker wrote:
>>>
>>>> Hello Pat, All,
>>>>
>>>> On Sun, Mar 29, 2009 at 11:35 PM, Pat Hayes <phayes@ihmc.us> wrote:
>>>>>
>>>>> On Mar 29, 2009, at 11:15 AM, Oliver Ruebenacker wrote:
>>>>>>
>>>>>> I am assuming that these classes all make a commitment about what
>>>>>> their instances mean, so users could declare instances and rely on
>>>>>> that commitment to be useful, right?
>>>>>
>>>>> As I have been taken to task (offline) for agreeing with you, allow me to
>>>>> intercede. On this point, I think everyone is right. Yes, classes are things
>>>>> that have, or can have, instances, and that is all that a class is, in
>>>>> effect. (RDFS makes this quite explicit by _defining_ classes to be things
>>>>> in the range of the rdf:type property.) On the other hand, it is certainly
>>>>> correct that ontologies can say a lot about classes without ever mentioning
>>>>> instances. Then again, it is also the case that, were someone to (not
>>>>> unreasonably) wish to connect such classes with their instances, the
>>>>> resulting conclusions should be correct, and if they were not, then this
>>>>> would be a serious critique of the ontology.
>>>>
>>>> I have no idea what the person who took you to task was thinking,
>>>> but it seems related to the ongoing controversy over whether
>>>> substances should be instances or classes.
>>>
>>> Which seems like a much more interesting topic, indeed.
>>>
>>>> In BioPAX, we have a class physical entity with subclasses such as
>>>> protein. EGFR would be an instance of protein, and we could say in
>>>> BioPAX that EGFR has a sequence. (I would argue it should rather say
>>>> that EGFR matches a sequence pattern, but that is another story.)
>>>>
>>>> The problem is that there are certain assumptions which BioPAX users
>>>> are encouraged to follow, such as (1) if two physical entities refer
>>>> to the same record, they are identical; (2) if two physical entities
>>>> refer to different records in the same source, they are not identical;
>>>> (3) if they are not identical, they have no overlap.
>>>>
>>>> I don't think these assumptions are even asserted in the ontology or
>>>> the documentation, but some BioPAX developers actively encourage users
>>>> to rely on them ("Come on, they are true at least 95 percent of the
>>>> time"), and BioPAX lacks support for cases where they break down.
>>>>
>>>> The fix I advocate is straightforward: let the language be explicit
>>>> about whether the above assumptions are met or not and add support for
>>>> cases where they are not.
>>>
>>> Hard to disagree with that.
>>>
>>>> This would include a property that expresses
>>>> that EGFR includes human EGFR, but since EGFR is not a class, it would
>>>> not be owl:subClass.
>>>>
>>>> Others advocate a different approach: make all reference to
>>>> substances references to classes
>>>
>>> Classes of what? That is, what would be the ultimate elements of these
>>> classes and subclasses? I think it is vital to get this straight before
>>> proceeding. Possible answers include: molecules (so EGFR is the class of all
>>> molecules that would be classified as an EGFR molecule); pieces of 'stuff'
>>> in the mereological sense ("aggregates" as someone called them in this
>>> thread); protein-types, where a type is something that can always be
>>> subdivided into subtypes according to some criterion, possibly one yet to be
>>> discovered; kinds of substance, where a substance is something that can
>>> partake in mixtures or compounds to create other kinds of substance, and
>>> pieces of which occupy space. And no doubt there are others, also. The
>>> point, I should perhaps emphasize, is not to refer to individuals of these
>>> various kinds, but to pin down a particular way of thinking that can be used
>>> consistently to justify ontological design decisions.
>>>
>>>> , e.g. EGFR would be a subclass of
>>>> protein. This, they say, is "more natural".
>>>
>>> It fits with the first model, above, in which we are always talking about
>>> classes of molecule. Not so well with the 'substances' view.
>>>
>>>>
>>>> The obvious benefit is that it makes it clear that two distinct
>>>> substances may have overlap, since two distinct classes may have a
>>>> non-empty intersection, e.g. human EGFR and phospho-EGFR would have the
>>>> intersection human phospho-EGFR (assuming EGFR to be defined to
>>>> include phospho-EGFR).
>>>
>>> That suggests the 'type/subtype' way of thinking.
>>>
>>>>
>>>> The obvious drawback is that everything becomes more complicated
>>>> since instead of properties of instances we would have property
>>>> restrictions over classes (e.g. instead of "EGFR matches EGFRSequence"
>>>> we would say "Every element of EGFR matches EGFRSequence"). That alone
>>>> is a serious issue, since typical users like it as simple as possible.
>>>
>>> I don't think this is a serious issue, in fact. It is as easy to state the
>>> property restriction as the property, in OWL; and in any case, simplicity
>>> in this sense is largely a matter of good human interface design, and it is
>>> very bad engineering to base ontological decisions on interface design.
>>>
>>>>
>>>> But that is not the most serious issue.
>>>
>>> OK
>>>
>>>>
>>>> The most severe problem seems to be that the class approach seems to
>>>> be incompatible with (1) observables being about statistical ensembles
>>>> and (2) populations being defined by location, not individual
>>>> membership - at least, if we want to avoid extreme complexity.
>>>>
>>>> (note: in what follows, all numbers are made up and probably not
>>>> realistic)
>>>>
>>>> For example, how would we describe that "the concentration of ATP in
>>>> the mitochondrion is (3.2 +/- 0.7) mol per liter"? What does the
>>>> concentration inhere in?
>>>
>>> I'm not sure what this means. Are we talking about a particular
>>> mitochondrion, or mitochondria in general? I guess the latter. In which
>>> case, the answer to the question is, it inheres in the class of
>>> Mitochondria, which is presumably a subclass of CellularStructures or some
>>> such.
>>>
>>>>
>>>> Maybe the concentration is just a proxy for the particle number?
>>>> Say, the particle number of ATP in the mitochondrion is 24.7 +/- 1.6.
>>>> What does that number inhere in?
>>>
>>> Same answer, I guess. Though I don't know what a particle number is, so this
>>> really is a guess.
>>>
>>>>
>>>> Can we restrict ourselves to cases of definite particle numbers? In
>>>> Systems Biology, we often use differential equations to model how
>>>> things change over time, and that assumes they change gradually. But
>>>> nevertheless, let us say that the number of ATP in the mitochondrion
>>>> is 23. What does that number inhere in? One particular set of 23
>>>> molecules? But are we talking only about one particular cell, or are
>>>> we making a more general statement that applies to many cells?
>>>
>>> I don't know, what are you wanting to say? General statements are made (in
>>> OWL) by relating properties to classes (everything in this class has this
>>> value of this property...)
>>>
>>>>
>>>> A new ATP molecule is created, increasing the number of ATP
>>>> molecules to 24. The original set of 23 molecules still exists, and
>>>> its number is still 23. But that's not the number of ATP molecules in
>>>> the mitochondrion any more. Also, what happens when an ATP molecule is
>>>> destroyed, or wanders off to some place else?
>>>
>>> Well, these are issues of describing change and time. That is a whole
>>> ontological area that has been fairly extensively explored. But if you want
>>> to be able to describe change and dynamics, you will have to introduce time
>>> explicitly into your ontological framework one way or another. There are no
>>> magic bullets for avoiding the resulting complications.
>>>
>>>>
>>>> Finally, what happens when the number of ATP molecules in the
>>>> mitochondrion drops to zero? What does the zero inhere in - in the
>>>> empty set?
>>>
>>> No, in the mitochondrion (or mitochondria) which have no ATP in them. This
>>> is an old issue, thoroughly explored. (What kind of flock does a shepherd
>>> have who has sold all his sheep?)
>>>
>>>> What if the number of ADP in the mitochondrion also drops
>>>> to zero, does the zero also inhere in the empty set? How many empty
>>>> sets are there?
>>>
>>> There is only one empty set. But in the example under discussion, this would
>>> be an issue only if there were no mitochondria in the universe, a case I
>>> assume we can safely ignore. (And, BTW, in many ontology languages - though
>>> not, regrettably, OWL-DL - there can be a number of distinct empty
>>> _classes_.)
>>>
>>>>
>>>> Maybe in principle, it is possible to reformulate the problem and
>>>> build up a description from scratch, relying on terms such as
>>>> molecule, that would allow us to express the above scenarios accurately.
>>>> But that approach would make a complex system of related restrictions
>>>> necessary to make even the most simple assertions used in Systems
>>>> Biology.
>>>
>>> Indeed, I suspect that Systems Biology would be an extremely complex
>>> ontology, if formalized adequately. (Even supposing the state of the
>>> formalizing art is up to the task, which I doubt.) Note however that this
>>> does not mean that every assertion made using the concepts of the ontology
>>> need be complex, only that the defining ontology for the concepts will be.
>>> Fortunately, the defining ontology only has to be created once.
>>>
>>>>
>>>> What I think we need instead is a term that refers to "ATP in the
>>>> mitochondrion", a term that refers to "(24.7 +/- 1.6)", and a simple
>>>> property to connect these two in one statement.
>>>
>>> In other words, an equation without any definitions of the terms used in it.
>>> Sure, go ahead, but please don't call it an ontology.
>>>
>>> Pat
>>>
>>>>
>>>> Take care
>>>> Oliver
>>>>
>>>> --
>>>> Oliver Ruebenacker, Computational Cell Biologist
>>>> BioPAX Integration at Virtual Cell (http://vcell.org/biopax)
>>>> Center for Cell Analysis and Modeling
>>>> http://www.oliver.curiousworld.org
>>>>
>>>>
>>>
>>> ------------------------------------------------------------
>>> IHMC (850)434 8903 or (650)494 3973
>>> 40 South Alcaniz St. (850)202 4416 office
>>> Pensacola (850)202 4440 fax
>>> FL 32502 (850)291 0667 mobile
>>> phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
>>>
>>
>>
>>
>> -- 
>> Oliver Ruebenacker, Computational Cell Biologist
>> BioPAX Integration at Virtual Cell (http://vcell.org/biopax)
>> Center for Cell Analysis and Modeling
>> http://www.oliver.curiousworld.org
>

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Wednesday, 1 April 2009 21:35:36 UTC