Re: blog: semantic dissonance in uniprot from Kei Cheung on 2009-03-26 (public-semweb-lifesci@w3.org from March 2009)

From: Kei Cheung <kei.cheung@yale.edu>
Date: Thu, 26 Mar 2009 10:07:05 -0400
To: Michel_Dumontier <Michel_Dumontier@carleton.ca>
Cc: W3C HCLSIG hcls <public-semweb-lifesci@w3.org>, Matthias Samwald <samwald@gmx.at>
Message-id: <49CB8C09.3030403@yale.edu>
In addition to Uniprot, in light of Matthias' earlier email, what about 
http://en.wikipedia.org/wiki/Protein, http://dbpedia.org/page/Protein, 
and the protein related ontologies listed in OBO 
(http://www.obofoundry.org/)?

-Kei

Michel_Dumontier wrote:
> Pursuant to my email, and in light of several other comments, if our
> goal is to now rectify what Uniprot:Protein _actually_ means in our
> domain, and how it can be semantically mapped to other bio-ontologies,
> then I might also suggest that instances of Uniprot:Protein are
> aggregates of proteins (err... :ProteinAggregate anyone?), possibly
> separated by both space and time, having a similar (base sequence +
> mutations / ptms) composition, sharing certain characteristics (e.g.
> functionality, domains) and observed to participate in biological
> processes. Clearly not a type of protein of the single molecule form,
> but again, certainly not a Record.
>
> -=Michel=-
>
>>  If however, what we've been talking about is that identifiers like
>>  	http://purl.uniprot.org/uniprot/Q16665
>>
>> are actually database records, and not molecular entities, then we can
>> settle this quickly:
>>
>> Uniprot RDF file: http://www.uniprot.org/uniprot/Q16665.rdf
>> (is this what people were referring to as a Record???)
>>
>> Contains:
>>
>> <rdf:Description rdf:about="http://purl.uniprot.org/uniprot/Q16665">
>>  <rdf:type rdf:resource="http://purl.uniprot.org/core/Protein" />
>>
>>
>> It's clear that the entity denoted by :Q16665 is rdf:type :Protein and
>> is the subject of statements that are biological in nature such as
>> being located in sub-cellular compartments or being involved in
>> biochemical reactions. It is clearly not a Record. This is generally
>> the case for nearly all entries in biomolecular databases.
>>
>> Cheers,
>>
>> -=Michel=-
>>
>> Anxiously waiting to see if this clears up things or generates
>> controversy... it's hard to predict!
>>
>>> If nobody ever wants to use the same property to talk about the
>>> database record as was used to talk about the molecule, and nobody
>>> ever makes an assertion that implies that the class of database
>>> records is disjoint from the class of molecules, then I don't see any
>>> harm in using the same URI to ambiguously denote both.  But if one is
>>> trying to design data to be reusable by others in unforeseen ways,
>>> there clearly *is* a risk that someone will want to make such
>>> assertions in conjunction with the data, and if that happens there is
>>> a clear harm.  This risk is easy to avoid by using separate URIs.
>>>
>>> There *are* trade-offs.  Minting two URIs instead of one *does* add
>>> some complexity, though as I pointed out that additional complexity
>>> can be mitigated to the point that it is a *very* low cost.  Still,
>>> different people will weigh these trade-offs differently, and what's
>>> best for one situation may not be best for another, as I indicated in
>>> my original post.
>>>
>>> Furthermore, even if one does use the same URI to ambiguously denote
>>> both a database record and a molecule, that is not the end of the
>>> world either.  It is possible (though more difficult) to later
>>> separate out and relate the different senses of an ambiguous URI, as
>>> I have described:
>>> http://dbooth.org/2007/splitting/
>>> Ambiguity is inescapable, and ambiguity between a thing and a page
>>> that describes that thing is not fundamentally different from other
>>> kinds of ambiguity (except perhaps that we are aware of it in advance
>>> and it can be easily avoided), as explained here:
>>> http://dbooth.org/2007/splitting/#httpRange-14
>>>
>>> Finally, although it is flattering that you have named this
>>> suggestion after me, I cannot take credit.  As I pointed out in my
>>> original post, the suggestion to differentiate between a molecule and
>>> the database record that describes that molecule originates with the
>>> Architecture of the World Wide Web:
>>> http://www.w3.org/TR/webarch/#URI-collision
>>> and best practices for implementing this distinction are described in
>>> Cool URIs for the Semantic Web:
>>> http://www.w3.org/TR/cooluris
>>>
>>> David Booth
>>>
>>>
>>>       
>
>
>
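
The risk David Booth describes in the quoted text can be made concrete
with a minimal sketch. It is only an illustration under stated
assumptions: triples are plain Python tuples rather than anything from an
RDF library, http://example.org/Record is a hypothetical class that is
not part of the UniProt vocabulary, the helper disjointness_clashes is
hypothetical, and the RDF file URL quoted above is treated as the
record's own URI purely for the sake of the example.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is assumed.
RDF_TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"

PROTEIN = "http://purl.uniprot.org/core/Protein"
RECORD = "http://example.org/Record"  # hypothetical class, not in UniProt
Q16665 = "http://purl.uniprot.org/uniprot/Q16665"
# Assumption for illustration only: use the RDF file URL as the record URI.
Q16665_RECORD = "http://www.uniprot.org/uniprot/Q16665.rdf"


def disjointness_clashes(triples):
    """Return (subject, class_a, class_b) for each subject that is typed
    with two classes declared disjoint from one another."""
    # Collect the asserted classes of every subject.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    # Check every disjointness assertion against those classes.
    clashes = []
    for a, p, b in triples:
        if p == DISJOINT:
            for s, classes in types.items():
                if a in classes and b in classes:
                    clashes.append((s, a, b))
    return clashes


# One URI ambiguously denoting both the molecule and the record.
one_uri = [
    (Q16665, RDF_TYPE, PROTEIN),
    (Q16665, RDF_TYPE, RECORD),
    (PROTEIN, DISJOINT, RECORD),
]

# Separate URIs for the molecule and for the record that describes it.
two_uris = [
    (Q16665, RDF_TYPE, PROTEIN),
    (Q16665_RECORD, RDF_TYPE, RECORD),
    (PROTEIN, DISJOINT, RECORD),
]

print(disjointness_clashes(one_uri))
print(disjointness_clashes(two_uris))
```

With the shared URI, the disjointness assertion produces a clash on
Q16665; with separate URIs the same assertion can be added by a third
party without contradicting the data.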
Received on Thursday, 26 March 2009 14:07:53 UTC