Re: blog: semantic dissonance in uniprot

On 24 Mar 2009, at 10:32, Oliver Ruebenacker wrote:

>      Hello All,
>
> 2009/3/23 Michel_Dumontier <Michel_Dumontier@carleton.ca>:
>> I do not think this would be a wise "simplification".  This is only a
>> simplification from one perspective: because it avoids having to mint
>> and maintain pairs of URIs instead of a single URI.  But the downstream
>> cost is that it creates an ambiguity (or "URI collision")
>> http://www.w3.org/TR/webarch/#URI-collision
>> that may cause trouble and be difficult to untangle later as the data is
>> used in more and more ways.  For example, if any of the same predicates
>> need to be used on both the record and the molecular entity, they will
>> become hopelessly confused.  Also, if disjointness assertions are
>> included then this overloading may cause logical contradictions.
>
>   Can anyone name a real world example of where confusion between an
> entity and its record was an issue?

<joke>"Bobby, make sure you 'delete his record' if you know what I  
mean..."</joke>

People elide use/mention distinctions all the time without harm. It's  
very easy, when one first encounters the distinction, to get *really*  
excited about it and go around correcting it everywhere. Usually,  
it's pretty easy to detect and correct and you get to say things
like, "You *can't* use a string as the value of a dc:creator, because
STRINGS AREN'T CREATORS...they are the *names* of creators!!! USE A  
URI!!!!"

It makes one feel *very* virtuous and logicy. Ah! To reflect on one's  
philosophical salad days brings such nostalgia!

Then you get into burning reams of people's lives with stuff like  
httpRange-14. It gets ugly. Pretty soon the streets run with the
electrons of records being mistaken for things EVERYWHERE. Next thing
you know, you are aligning your children with DOLCE..."Ok, Mary Sue
is definitely a perdurant, but Mary *Jane*, well, she's clearly an  
*endurant*, poor thing."

Seriously, the obvious place where it's really worth distinguishing  
them is in entity reconciliation and data cleaning. If you have two  
records with different IDs (e.g., "Bijan the Great" and "Bijan the  
Greater") but which describe the same entity (i.e., Bijan the  
Greatest), then it's sometimes worth keeping the records numerically  
distinct (there are two erroneous records which record that Bijan is  
less than the Greatest) while the described entities are unified.
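
Roughly, and only as a sketch: the point is that record identity and entity identity live in separate ID spaces, and reconciliation is just a mapping from the first to the second. Everything named below (Record, Reconciler, "rec-1", "rec-2", "person-bijan") is made up for illustration and not taken from UniProt or any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str     # identifies the record itself, not the person
    claimed_name: str  # what the record (perhaps wrongly) says


class Reconciler:
    """Maps record IDs to entity IDs; the records are never merged."""

    def __init__(self):
        self.records = {}    # record_id -> Record
        self.entity_of = {}  # record_id -> entity_id

    def add(self, record, entity_id):
        # Store the record as-is and note which entity it describes.
        self.records[record.record_id] = record
        self.entity_of[record.record_id] = entity_id

    def same_entity(self, a, b):
        # Two records are reconciled when they map to one entity ID.
        return self.entity_of[a] == self.entity_of[b]

    def records_about(self, entity_id):
        # All record IDs that describe the given entity.
        return sorted(r for r, e in self.entity_of.items() if e == entity_id)


reconciler = Reconciler()
reconciler.add(Record("rec-1", "Bijan the Great"), "person-bijan")
reconciler.add(Record("rec-2", "Bijan the Greater"), "person-bijan")
```

Here same_entity("rec-1", "rec-2") holds, yet the two records stay distinct objects with their own (erroneous) claims, so you can still say things about each record, such as which one to fix first.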

This isn't a real world example (sorry!) but it's realistic, at
least. :)

Cheers,
Bijan.

Received on Tuesday, 24 March 2009 10:54:26 UTC