Re: blog: semantic dissonance in uniprot from eric neumann on 2009-03-26 (public-semweb-lifesci@w3.org from March 2009)

From: eric neumann <ekneumann@gmail.com>
Date: Thu, 26 Mar 2009 12:35:23 -0400
To: "John F. Madden" <john.madden@duke.edu>
Cc: Pat Hayes <phayes@ihmc.us>, Michel_Dumontier <Michel_Dumontier@carleton.ca>, W3C HCLSIG hcls <public-semweb-lifesci@w3.org>
Message-ID: <92e86c7d0903260935l1c6aff76nbbb214d14d349f26@mail.gmail.com>
+1

On Thu, Mar 26, 2009 at 12:31 PM, John F. Madden <john.madden@duke.edu> wrote:

> Pat et al.,
>
> It sounds like people sometimes have an irresistible itch to say that "A is
> similar to B", but this statement as such has very little semantic content.
>
> Perhaps it's not really intended as a statement that has a truth value, but
> rather as a record of somebody's feelings.
>
> The semantic web can certainly serve as a repository for recording one's
> feelings, and this might even be useful. (If I were an astrophysicist, for
> example, I might be quite interested in Stephen Hawking's intuitions about
> some problem I was working on.)
>
> So what would you say about an rdf:Property called, say,
> "http://www.example.com/intuit#similarTo" that could be used simply to post
> a record that somebody intuited a "similarity" between two things?
>
> It would have little utility for inferencing, unless one were to write a
> custom application (i.e. not OWL) to do so. But it might have utility as a
> semantic web "bookmark" for relationships that could be interesting
> candidates for future formalization.
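
A minimal sketch of the "bookmark" idea above, assuming triples are held as plain Python tuples rather than in any RDF library. The property URI is the one proposed in the message; the two thing URIs and both helper names are hypothetical. Nothing is inferred here (no symmetry, no transitivity); the triple is only recorded and later listed.

```python
# Sketch only: triples as plain tuples, no RDF library assumed.
SIMILAR_TO = "http://www.example.com/intuit#similarTo"  # property URI from the message


def record_intuition(store, a, b):
    """Add a similarTo 'bookmark' triple to the store and return the store."""
    store.add((a, SIMILAR_TO, b))
    return store


def formalization_candidates(store):
    """List (subject, object) pairs that somebody flagged as similar."""
    return sorted((s, o) for (s, p, o) in store if p == SIMILAR_TO)


# Hypothetical URIs, used only for illustration.
store = set()
record_intuition(store, "http://example.org/thing/A", "http://example.org/thing/B")
print(formalization_candidates(store))
```

Because no inference is applied, asking for things similar to B would not return A unless someone also recorded that direction.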
>
> John
>
>
>
>
> On Mar 26, 2009, at 8:42 AM, Pat Hayes wrote:
>
>
>> On Mar 26, 2009, at 8:28 AM, Michel_Dumontier wrote:
>>
>>> Pursuant to my email, and in light of several other comments, if our
>>> goal is to now rectify what Uniprot:Protein _actually_ means in our
>>> domain, and how it can be semantically mapped to other bio-ontologies,
>>> then I might also suggest that instances of Uniprot:Protein are
>>> aggregates of proteins (err... :ProteinAggregate anyone?), possibly
>>> separated by both space and time, having a similar (base sequence +
>>> mutations / ptms) composition, sharing certain characteristics (e.g.
>>> functionality, domains) and observed to participate in biological
>>> processes. Clearly not a type of protein of the single molecule form,
>>> but again, certainly not a Record.
>>>
>>
>> Indeed. If I might make a suggestion, rather than talking about
>> 'aggregates' (which sounds disturbingly, er, philosophical), why not just
>> say that the entity being identified is a _substance_. Substances are 'kinds
>> of stuff' that include mixtures (e.g. concrete is a kind of stuff comprising a
>> mix of sand, crushed rock, cement and water in several possible proportions)
>> but also 'pure' stuffs such as water. Note the distinction between a
>> substance and a piece of the substance (concrete, the building material vs.
>> this or that lump of concrete) or a mereological sum (your 'aggregate', I
>> think) of such pieces (all the concrete in America). The utility of this is
>> that it eliminates the discussions about molecules, which I think is getting
>> in the way of clarity here.  Regarding sameAs, being the same substance is a
>> very strict kind of sameAs, of course, but it really does only refer to
>> substances, which is a step in the right direction. Each protein is a
>> substance. It might turn out that one protein is a mixture of others, for
>> example: this is fine, nothing breaks, as long as nobody says the mixture is
>> sameAs one of its components. And now one can have notions such as 'purified
>> form of' or 'isotopic version of' between substances, which might help to
>> make all these distinctions that you chemists need to be concerned with.
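
One way to picture the substance/piece/mixture distinction described above, as a rough Python sketch. The class names, the `components` field, and `same_substance` are all hypothetical modelling choices made for illustration, not an established ontology. The point it tries to show is that the strict sameness relation is defined only between substances, and a mixture is never the same as one of its components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substance:
    """A 'kind of stuff'; components is empty for a pure stuff."""
    name: str
    components: frozenset = frozenset()


@dataclass(frozen=True)
class Piece:
    """A particular lump of some substance."""
    of: Substance
    label: str


def same_substance(a, b):
    """Strict sameness, defined only between substances."""
    if not (isinstance(a, Substance) and isinstance(b, Substance)):
        raise TypeError("same_substance only relates substances")
    return a == b


water = Substance("water")
cement = Substance("cement")
concrete = Substance("concrete", frozenset({water, cement}))  # a mixture
lump = Piece(concrete, "this lump of concrete")

print(same_substance(water, Substance("water")))  # pure stuff compared with itself
print(same_substance(concrete, water))            # mixture vs. one of its components
```

Passing `lump` to `same_substance` raises a `TypeError`, which mirrors the remark that a piece of concrete is not the substance concrete.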
>>
>> Distinctions like object/substance/piece/mixture were worked out by
>> ontologists over 20 years ago, by the way. None of this is rocket science.
>>
>> Pat
>>
>>
>>
>>> -=Michel=-
>>>
>>>
>>>
>>>
>>>> If however, what we've been talking about is that identifiers like
>>>>        http://purl.uniprot.org/uniprot/Q16665
>>>>
>>>> are actually database records, and not molecular entities, then we can
>>>> settle this quickly:
>>>>
>>>> Uniprot RDF file: http://www.uniprot.org/uniprot/Q16665.rdf
>>>> (is this what people were referring to as a Record???)
>>>>
>>>> Contains:
>>>>
>>>> <rdf:Description rdf:about="http://purl.uniprot.org/uniprot/Q16665">
>>>> <rdf:type rdf:resource="http://purl.uniprot.org/core/Protein" />
>>>>
>>>>
>>>> It's clear that the entity denoted by :Q16665 is rdf:type :Protein and
>>>> is the subject of statements that are biological in nature such as being
>>>> located in sub-cellular compartments or being involved in biochemical
>>>> reactions. It is clearly not a Record. This is generally the case for
>>>> nearly all entries in biomolecular databases.
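
For readers who want to check the typing claim mechanically, here is a small sketch using only Python's standard library. It assumes the two quoted lines are wrapped in an `rdf:RDF` element and closed with `</rdf:Description>`; that wrapper and the closing tags are added here so the fragment parses, and the real file contains much more. The function name `declared_types` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The two quoted lines, wrapped and closed so that they form parseable XML.
# The rdf:RDF wrapper and the closing tags are assumptions, not file content.
SNIPPET = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://purl.uniprot.org/uniprot/Q16665">
<rdf:type rdf:resource="http://purl.uniprot.org/core/Protein" />
</rdf:Description>
</rdf:RDF>"""


def declared_types(xml_text):
    """Map each described subject URI to the list of rdf:type URIs stated for it."""
    root = ET.fromstring(xml_text)
    result = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        types = [t.get(f"{{{RDF_NS}}}resource")
                 for t in desc.findall(f"{{{RDF_NS}}}type")]
        result[subject] = types
    return result


print(declared_types(SNIPPET))
```

This only reads what the file asserts; it says nothing about whether the assertion is the right modelling choice.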
>>>>
>>>> Cheers,
>>>>
>>>> -=Michel=-
>>>>
>>>> Anxiously waiting to see if this clears up things or generates
>>>> controversy... it's hard to predict!
>>>>
>>>>
>>>>
>>>>> If nobody ever wants to use the same property to talk about the
>>>>> database record as was used to talk about the molecule, and nobody
>>>>> ever makes an assertion that implies that the class of database
>>>>> records is disjoint from the class of molecules, then I don't see
>>>>> any harm in using the same URI to ambiguously denote both. But if
>>>>> one is trying to design data to be reusable by others in unforeseen
>>>>> ways, there clearly *is* a risk that someone will want to make such
>>>>> assertions in conjunction with the data, and if that happens there
>>>>> is a clear harm. This risk is easy to avoid by using separate URIs.
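
The risk described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reasoner: triples are plain tuples, and the `ex:` names, the class names, and the function `disjointness_clashes` are hypothetical. It shows that once someone asserts the two classes are disjoint, a single shared URI produces a clash, while separate URIs do not.

```python
RDF_TYPE = "rdf:type"


def disjointness_clashes(triples, disjoint_pairs):
    """Return (subject, classA, classB) for every subject typed with two disjoint classes."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    clashes = []
    for s, ts in sorted(types.items()):
        for a, b in disjoint_pairs:
            if a in ts and b in ts:
                clashes.append((s, a, b))
    return clashes


# One URI used for both senses (hypothetical names).
shared = [("ex:Q16665", RDF_TYPE, "ex:Molecule"),
          ("ex:Q16665", RDF_TYPE, "ex:Record")]
# Separate URIs for the molecule and for the record describing it.
split = [("ex:Q16665", RDF_TYPE, "ex:Molecule"),
         ("ex:Q16665.rdf", RDF_TYPE, "ex:Record")]
# A third party later asserts that the two classes are disjoint.
disjoint = [("ex:Molecule", "ex:Record")]

print(disjointness_clashes(shared, disjoint))
print(disjointness_clashes(split, disjoint))
```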
>>>>>
>>>>> There *are* trade-offs. Minting two URIs instead of one *does* add
>>>>> some complexity, though as I pointed out that additional complexity
>>>>> can be mitigated to the point that it is a *very* low cost. Still,
>>>>> different people will weigh these trade-offs differently, and what's
>>>>> best for one situation may not be best for another, as I indicated
>>>>> in my original post.
>>>>>
>>>>> Furthermore, even if one does use the same URI to ambiguously denote
>>>>> both a database record and a molecule, that is not the end of the
>>>>> world either. It is possible (though more difficult) to later
>>>>> separate out and relate the different senses of an ambiguous URI, as
>>>>> I have described:
>>>>> http://dbooth.org/2007/splitting/
>>>>> Ambiguity is inescapable, and ambiguity between a thing and a page
>>>>> that describes that thing is not fundamentally different from other
>>>>> kinds of ambiguity (except perhaps that we are aware of it in
>>>>> advance and it can be easily avoided), as explained here:
>>>>> http://dbooth.org/2007/splitting/#httpRange-14
>>>>>
>>>>> Finally, although it is flattering that you have named this
>>>>> suggestion after me, I cannot take credit. As I pointed out in my
>>>>> original post, the suggestion to differentiate between a molecule
>>>>> and the database record that describes that molecule originates with
>>>>> the Architecture of the World Wide Web:
>>>>> http://www.w3.org/TR/webarch/#URI-collision
>>>>> and best practices for implementing this distinction are described
>>>>> in Cool URIs for the Semantic Web:
>>>>> http://www.w3.org/TR/cooluris
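
As a rough illustration of keeping the two URIs apart, the sketch below derives a record URL from an entity URI. It assumes the pattern seen in this thread for Q16665 (entity at `http://purl.uniprot.org/uniprot/Q16665`, RDF file at `http://www.uniprot.org/uniprot/Q16665.rdf`) holds for other accessions, which the thread does not confirm. The function name `record_url_for` is hypothetical.

```python
ENTITY_PREFIX = "http://purl.uniprot.org/uniprot/"
RECORD_PREFIX = "http://www.uniprot.org/uniprot/"


def record_url_for(entity_uri):
    """Derive the URL of the describing RDF file from the entity URI.

    Assumes every entry follows the pattern of the single example
    (Q16665) quoted in this thread.
    """
    if not entity_uri.startswith(ENTITY_PREFIX):
        raise ValueError(f"not a recognised entity URI: {entity_uri}")
    accession = entity_uri[len(ENTITY_PREFIX):]
    return f"{RECORD_PREFIX}{accession}.rdf"


print(record_url_for("http://purl.uniprot.org/uniprot/Q16665"))
```

Statements about the protein would then use the entity URI, and statements about the file (its author, its last-modified date) would use the derived URL.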
>>>>>
>>>>> David Booth
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>
>
Received on Thursday, 26 March 2009 16:36:05 UTC