Re: blog: semantic dissonance in uniprot

Oliver Ruebenacker <curoli@gmail.com> writes:
> 2009/3/23 Michel_Dumontier <Michel_Dumontier@carleton.ca>:
>> I do not think this would be a wise "simplification".  This is only a
>> simplification from one perspective: because it avoids having to mint
>> and maintain pairs of URIs instead of a single URI.  But the downstream
>> cost is that it creates an ambiguity (or "URI collision")
>> http://www.w3.org/TR/webarch/#URI-collision
>> that may cause trouble and be difficult to untangle later as the data is
>> used in more and more ways.  For example, if any of the same predicates
>> need to be used on both the record and the molecular entity, they will
>> become hopelessly confused.  Also, if disjointness assertions are
>> included, then this overloading may cause logical contradictions.
>
>   Can anyone name a real-world example where confusion between an
> entity and its record was an issue?


Yes, sure. Suppose all proteins have a Uniprot ID (conflating proteins
with Uniprot records). Then we integrate this with DrugBank, which
represents many things, including proteins that are not in Uniprot, and
sometimes represents several proteins where Uniprot has only one.
Consider insulin, for instance. We now have a problem, because not all
proteins have a Uniprot ID.
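A minimal sketch of that failure, assuming a conflated model where a protein's URI is simply its Uniprot record URI. Triples are plain Python tuples; the `mappedTo` predicate and every identifier are hypothetical placeholders, not real Uniprot or DrugBank accessions.

```python
from typing import Optional, Set, Tuple

# Conflated model: the Uniprot record URI doubles as the protein URI.
# The "mappedTo" predicate and every identifier are hypothetical placeholders.
Triple = Tuple[str, str, str]

conflated: Set[Triple] = {
    # DrugBank distinguishes two proteins that Uniprot describes in one record.
    ("drugbank:TARGET_A", "mappedTo", "uniprot:REC1"),
    ("drugbank:TARGET_B", "mappedTo", "uniprot:REC1"),
    # DrugBank also has a protein with no Uniprot record at all.
    ("drugbank:TARGET_C", "rdf:type", "Protein"),
}


def protein_uri(target: str, triples: Set[Triple]) -> Optional[str]:
    """Return the protein URI for a DrugBank target (here: its Uniprot URI)."""
    for s, p, o in triples:
        if s == target and p == "mappedTo":
            return o
    return None  # no Uniprot ID means no protein identity in this model


a = protein_uri("drugbank:TARGET_A", conflated)
b = protein_uri("drugbank:TARGET_B", conflated)
c = protein_uri("drugbank:TARGET_C", conflated)
print(a == b)     # True: two distinct proteins collapse onto one URI
print(c is None)  # True: a protein outside Uniprot gets no URI
```

Both problems fall out of the single assumption that "protein URI" and "Uniprot record URI" are the same thing.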

The flip side is that if you always say

Protein Record --> contains knowledge about --> protein

it's much more complicated. You are making your data model more
difficult to work with all of the time, to cope with edge cases which
occur only some of the time. 
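For contrast, a minimal sketch of the separated model, assuming hypothetical `containsKnowledgeAbout` and `function` predicates and placeholder identifiers. Each protein gets its own URI, so the DrugBank cases described above become representable, but any lookup of data attached to the record now needs an extra hop through that record.

```python
from typing import List, Set, Tuple

# Separated model: records and proteins have distinct URIs.
# Predicates and identifiers are hypothetical placeholders.
Triple = Tuple[str, str, str]

separated: Set[Triple] = {
    # One record can describe several distinct proteins.
    ("uniprot:REC1", "containsKnowledgeAbout", "ex:proteinA"),
    ("uniprot:REC1", "containsKnowledgeAbout", "ex:proteinB"),
    ("uniprot:REC1", "function", "hormone"),
    # A protein with no record describing it is still a first-class node.
    ("ex:proteinC", "rdf:type", "Protein"),
}


def functions_of(protein: str, triples: Set[Triple]) -> List[str]:
    """Two-hop lookup: find records about the protein, then their functions."""
    records = {s for s, p, o in triples
               if p == "containsKnowledgeAbout" and o == protein}
    return sorted(o for s, p, o in triples
                  if s in records and p == "function")


print(functions_of("ex:proteinA", separated))  # ['hormone']
print(functions_of("ex:proteinC", separated))  # []
```

In the conflated model the same lookup would be a single pattern on the protein URI; the extra join here is paid on every query, including the common case where one record describes exactly one protein.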

There's no way around this; either way it's a compromise and what is
good in one context may not be good in another. 

Phil

Received on Tuesday, 24 March 2009 12:11:40 UTC