Re: blog: semantic dissonance in uniprot

On 25 Mar 2009, at 10:41, Phillip Lord wrote:

> "Michel_Dumontier" <Michel_Dumontier@carleton.ca> writes:
>> And I'm trying to explain that there is no pragmatic reason to make
>> explicit the distinction between a biomolecule (and what we know
>> about it) and a database record (and what we know about the
>> biomolecule) unless they are actually different.  It just
>> complicates things in a wholly unnecessary way.
>
>
> I've given a clear example. Where two databases exist, with two
> records, which appear to be referring to the same (class of)
> molecules.
[snip]

This is the key example.

But there's the other key example, where one record exists which
appears to be referring to multiple entities (either by ambiguity or
by composition). This is a generalization of your point about the
ill-definedness of the very idea of a gene.

To paraphrase you (I think), introducing a resource in the latter case
takes you from 1 mapping problem to 2 mapping problems.

This is why the Boothian line is quite naive. If it's just the
case that you have 1 (or more) records and a clear relationship
between the record(s) and the object described by the record, then it
may (or may not!, but often will) make sense to distinguish them and
name each, esp. for the purpose of entity reconciliation, record
reconciliation, entity exploration, etc.

However, if you are forced to do so without a clear purpose, then you  
just add more noise to the overall system. You are likely to make  
brute errors and you are likely to make choices that conflict with  
those motivated by different applications.

This is why clear empirical data is important. It's perfectly  
possible to do harm (in aggregate) by following a rule intended to  
produce good.

Cheers,
Bijan.

Received on Wednesday, 25 March 2009 14:16:20 UTC