Re: blog: semantic dissonance in uniprot from John F. Madden on 2009-03-26 (public-semweb-lifesci@w3.org from March 2009)

From: John F. Madden <john.madden@duke.edu>
Date: Thu, 26 Mar 2009 09:31:24 -0700
To: Pat Hayes <phayes@ihmc.us>
Cc: Michel_Dumontier <Michel_Dumontier@carleton.ca>, W3C HCLSIG hcls <public-semweb-lifesci@w3.org>
Message-Id: <A102EB9A-EE2C-4871-BE80-64667FA0F035@duke.edu>

Pat et al.,

It sounds like people sometimes have an irresistible itch to say that  
"A is similar to B", but this statement as such has very little  
semantic content.

Perhaps it's not really intended as a statement that has a truth  
value, but rather as a record of somebody's feelings.

The semantic web can certainly serve as a repository for recording  
one's feelings, and this might even be useful. (If I were an
astrophysicist, for example, I might be quite interested in Stephen
Hawking's intuitions about some problem I was working on.)

So what would you say about an rdf:Property called, say,
"http://www.example.com/intuit#similarTo", that could be used simply
to post a record that somebody intuited a "similarity" between two
things?

It would have little utility for inferencing, unless one were to write  
a custom application (i.e. not OWL) to do so. But it might have  
utility as a semantic web "bookmark" for relationships that could be  
interesting candidates for future formalization.
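
To make the "bookmark" idea a little more concrete, here is a minimal
Python sketch. Everything in it is an assumption made for illustration:
the intuit# namespace is the example.com placeholder above (not a real
vocabulary), the second URI is made up, and the helper names similar_to
and candidates are hypothetical. It only shows that such records can be
written as plain triples and later collected by a small custom (non-OWL)
program.

```python
# Hypothetical namespace taken from the example.com placeholder in this
# message; it is not a published vocabulary.
INTUIT = "http://www.example.com/intuit#"
SIMILAR_TO = INTUIT + "similarTo"


def similar_to(a, b):
    """Return one N-Triples-style line recording an intuited similarity."""
    return f"<{a}> <{SIMILAR_TO}> <{b}> ."


def candidates(lines):
    """Collect (subject, object) pairs flagged with intuit#similarTo.

    This is a deliberately naive reader: it assumes every term is a URI
    in angle brackets and that each line holds exactly one triple.
    """
    pairs = []
    for line in lines:
        parts = line.rstrip(" .").split()
        if len(parts) != 3:
            continue  # skip anything that is not a simple URI triple
        s, p, o = (part.strip("<>") for part in parts)
        if p == SIMILAR_TO:
            pairs.append((s, o))
    return pairs


if __name__ == "__main__":
    record = similar_to(
        "http://purl.uniprot.org/uniprot/Q16665",
        "http://www.example.com/thing#B",  # made-up URI for illustration
    )
    print(record)
    print(candidates([record]))
```

Nothing here draws any inference from the similarity itself; the
collected pairs are just a to-do list of relationships that somebody
might later want to formalize.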

John



On Mar 26, 2009, at 8:42 AM, Pat Hayes wrote:

>
> On Mar 26, 2009, at 8:28 AM, Michel_Dumontier wrote:
>
>> Pursuant to my email, and in light of several other comments, if our
>> goal is to now rectify what Uniprot:Protein _actually_ means in our
>> domain, and how it can be semantically mapped to other
>> bio-ontologies,
>> then I might also suggest that instances of Uniprot:Protein are
>> aggregates of proteins (err... :ProteinAggregate anyone?), possibly
>> separated by both space and time, having a similar (base sequence +
>> mutations / ptms) composition, sharing certain characteristics (e.g.
>> functionality, domains) and observed to participate in biological
>> processes. Clearly not a type of protein of the single molecule form,
>> but again, certainly not a Record.
>
> Indeed. If I might make a suggestion, rather than talking about  
> 'aggregates' (which sounds disturbingly, er, philosophical), why not  
> just say that the entity being identified is a _substance_.  
> Substances are 'kinds of stuff' that include mixtures (eg concrete  
> is a kind of stuff comprising a mix of sand, crushed rock, cement  
> and water in several possible proportions) but also 'pure' stuffs  
> such as water. Note the distinction between a substance and a piece  
> of the substance (concrete, the building material vs. this or that  
> lump of concrete) or a mereological sum (your 'aggregate', I think)  
> of such pieces (all the concrete in America). The utility of this is  
> that it eliminates the discussions about molecules, which I think is  
> getting in the way of clarity here.  Regarding sameAs, being the  
> same substance is a very strict kind of sameAs, of course, but it  
> really does only refer to substances, which is a step in the right  
> direction. Each protein is a substance. It might turn out that one  
> protein is a mixture of others, for example: this is fine, nothing  
> breaks, as long as nobody says the mixture is sameAs one of its  
> components. And now one can have notions such as 'purified form of'  
> or 'isotopic version of' between substances, which might help to  
> make all these distinctions that you chemists need to be concerned  
> with.
>
> Distinctions like object/substance/piece/mixture were worked out by  
> ontologists over 20 years ago, by the way. None of this is rocket  
> science.
>
> Pat
>
>
>>
>> -=Michel=-
>>
>>
>>
>>>
>>> If however, what we've been talking about is that identifiers like
>>> 	http://purl.uniprot.org/uniprot/Q16665
>>>
>>> are actually database records, and not molecular entities, then we
>>> can settle this quickly:
>>>
>>> Uniprot RDF file: http://www.uniprot.org/uniprot/Q16665.rdf
>>> (is this what people were referring to as a Record???)
>>>
>>> Contains:
>>>
>>> <rdf:Description rdf:about="http://purl.uniprot.org/uniprot/Q16665">
>>> <rdf:type rdf:resource="http://purl.uniprot.org/core/Protein" />
>>>
>>>
>>> It's clear that the entity denoted by :Q16665 is rdf:type :Protein
>>> and is the subject of statements that are biological in nature, such
>>> as being located in sub-cellular compartments or being involved in
>>> biochemical reactions. It is clearly not a Record. This is generally
>>> the case for nearly all entries in biomolecular databases.
>>>
>>> Cheers,
>>>
>>> -=Michel=-
>>>
>>> Anxiously waiting to see if this clears up things or generates
>>> controversy... it's hard to predict!
>>>
>>>
>>>
>>>> If nobody ever wants to use the same property to talk about the
>>>> database record as was used to talk about the molecule, and nobody
>>>> ever makes an assertion that implies that the class of database
>>>> records is disjoint from the class of molecules, then I don't see
>>>> any harm in using the same URI to ambiguously denote both.  But if
>>>> one is trying to design data to be reusable by others in unforeseen
>>>> ways, there clearly *is* a risk that someone will want to make such
>>>> assertions in conjunction with the data, and if that happens there
>>>> is a clear harm.  This risk is easy to avoid by using separate URIs.
>>>>
>>>> There *are* trade-offs.  Minting two URIs instead of one *does* add
>>>> some complexity, though as I pointed out that additional complexity
>>>> can be mitigated to the point that it is a *very* low cost.  Still,
>>>> different people will weigh these trade-offs differently, and what's
>>>> best for one situation may not be best for another, as I indicated
>>>> in my original post.
>>>>
>>>> Furthermore, even if one does use the same URI to ambiguously denote
>>>> both a database record and a molecule, that is not the end of the
>>>> world either.  It is possible (though more difficult) to later
>>>> separate out and relate the different senses of an ambiguous URI, as
>>>> I have described:
>>>> http://dbooth.org/2007/splitting/
>>>> Ambiguity is inescapable, and ambiguity between a thing and a page
>>>> that describes that thing is not fundamentally different from other
>>>> kinds of ambiguity (except perhaps that we are aware of it in
>>>> advance and it can be easily avoided), as explained here:
>>>> http://dbooth.org/2007/splitting/#httpRange-14
>>>>
>>>> Finally, although it is flattering that you have named this
>>>> suggestion after me, I cannot take credit.  As I pointed out in my
>>>> original post, the suggestion to differentiate between a molecule
>>>> and the database record that describes that molecule originates
>>>> with the Architecture of the World Wide Web:
>>>> http://www.w3.org/TR/webarch/#URI-collision
>>>> and best practices for implementing this distinction are described
>>>> in Cool URIs for the Semantic Web:
>>>> http://www.w3.org/TR/cooluris
>>>>
>>>> David Booth
>>>>
>>>>
>>>
>>
>>
>>
>>
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 26 March 2009 16:32:32 UTC