Re: blog: semantic dissonance in uniprot from David Booth on 2009-03-25 (public-semweb-lifesci@w3.org from March 2009)

From: David Booth <david@dbooth.org>
Date: Wed, 25 Mar 2009 19:59:40 -0400
To: W3C HCLSIG hcls <public-semweb-lifesci@w3.org>
Message-Id: <1238025581.27539.3081.camel@nc6000.w3.org>
On Wed, 2009-03-25 at 14:13 +0000, Bijan Parsia wrote:
> On 25 Mar 2009, at 10:41, Phillip Lord wrote:
> 
> > "Michel_Dumontier" <Michel_Dumontier@carleton.ca> writes:
> >> And I'm trying to explain that there is no pragmatic reason to make
> >> explicit the distinction between a biomolecule (and what we know
> >> about it) and a database record (and what we know about the
> >> biomolecule) unless they are actually different.  It just
> >> complicates things in a wholly unnecessary way.

There may not be a pragmatic reason for *your* applications, but there
may well be for others, as explained below.

> >
> > I've given a clear example. Where two databases exist, with two
> > records, which appear to be referring to the same (class of)
> > molecules.
> [snip]
> 
> This is the key example.
> 
> But there's the other key example, where one record exists which
> appears to be referring to multiple entities (either by ambiguity or
> by composition). This is a generalization of your point about ill
> definedness of the very idea of a gene.
> 
> To paraphrase you (I think), introducing a resource in the latter case
> takes you from 1 mapping problem to 2 mapping problems.
> 
> This is why the Boothian line is quite naive.

Sorry, but this is not a case of naivete.  

If nobody ever wants to use the same property to talk about the database
record as was used to talk about the molecule, and nobody ever makes an
assertion that implies that the class of database records is disjoint
from the class of molecules, then I don't see any harm in using the same
URI to ambiguously denote both.   But if one is trying to design data to
be reusable by others in unforeseen ways, there clearly *is* a risk that
someone will want to make such assertions in conjunction with the data,
and if that happens there is a clear harm.  This risk is easy to avoid
by using separate URIs.  
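
To make the risk concrete, here is a minimal sketch in Python.  It is
only an illustration: the ex: URIs, the ex:describes property and the
find_clashes helper are all hypothetical and are not taken from UniProt
or any real vocabulary, and the "store" is just a list of tuples rather
than a real RDF library.  It shows a third party adding a disjointness
assertion, which clashes when one URI denotes both things and does not
clash when two URIs are used.

```python
# Toy triple store: each triple is (subject, predicate, object).
# Every URI below is a made-up example, not a real identifier.
RDF_TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"


def find_clashes(triples):
    """Return (uri, class_a, class_b) for each URI typed as two disjoint classes."""
    types = {}        # uri -> set of classes it is asserted to belong to
    disjoint = set()  # unordered pairs of classes declared disjoint
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
        elif p == DISJOINT:
            disjoint.add(frozenset((s, o)))

    pairs = sorted(tuple(sorted(d)) for d in disjoint if len(d) == 2)
    clashes = []
    for uri, classes in sorted(types.items()):
        for a, b in pairs:
            if a in classes and b in classes:
                clashes.append((uri, a, b))
    return clashes


# One URI used ambiguously for both the molecule and its record.
one_uri = [
    ("ex:P12345", RDF_TYPE, "ex:Molecule"),
    ("ex:P12345", RDF_TYPE, "ex:DatabaseRecord"),
]

# Separate URIs for the molecule and the record that describes it.
two_uris = [
    ("ex:P12345#molecule", RDF_TYPE, "ex:Molecule"),
    ("ex:P12345", RDF_TYPE, "ex:DatabaseRecord"),
    ("ex:P12345", "ex:describes", "ex:P12345#molecule"),
]

# An assertion that some later user of the data might reasonably add.
third_party = [("ex:Molecule", DISJOINT, "ex:DatabaseRecord")]

print(find_clashes(one_uri))                # no disjointness asserted yet
print(find_clashes(one_uri + third_party))  # the ambiguous URI now clashes
print(find_clashes(two_uris + third_party)) # separate URIs stay consistent
```

As long as nobody adds the disjointness assertion, the single-URI data
causes no trouble, which matches the first case described above.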

There *are* trade-offs.  Minting two URIs instead of one *does* add some
complexity, though as I pointed out that additional complexity can be
mitigated to the point that it is a *very* low cost.  Still, different
people will weigh these trade-offs differently, and what's best for one
situation may not be best for another, as I indicated in my original
post.

Furthermore, even if one does use the same URI to ambiguously denote
both a database record and a molecule, that is not the end of the world
either.  It is possible (though more difficult) to later separate out
and relate the different senses of an ambiguous URI, as I have
described:
http://dbooth.org/2007/splitting/ 
Ambiguity is inescapable, and ambiguity between a thing and a page that
describes that thing is not fundamentally different from other kinds of
ambiguity (except perhaps that we are aware of it in advance and it can
be easily avoided), as explained here:
http://dbooth.org/2007/splitting/#httpRange-14 
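
As a rough sketch of what separating the senses after the fact might
look like (this is an illustration only, not the procedure from the
document linked above; the ex: URIs, the property names and the
split_uri helper are hypothetical), one could reassign each statement
about the ambiguous URI to a sense-specific URI based on the property
used, and then relate the two new URIs:

```python
def split_uri(triples, ambiguous, sense_of_predicate):
    """Rewrite subjects equal to `ambiguous` to a sense-specific URI.

    `sense_of_predicate` maps a predicate to the URI of the sense it is
    assumed to talk about.  Triples whose predicate is not in the map are
    left on the ambiguous URI, since we cannot tell which sense was meant.
    """
    result = []
    for s, p, o in triples:
        if s == ambiguous:
            s = sense_of_predicate.get(p, ambiguous)
        result.append((s, p, o))
    return result


# Statements originally made about one ambiguous URI (made-up data).
ambiguous_data = [
    ("ex:P12345", "ex:molecularWeight", "51234"),
    ("ex:P12345", "ex:lastModified", "2009-03-24"),
    ("ex:P12345", "ex:comment", "needs review"),
]

# Assumed mapping from property to the sense it describes.
senses = {
    "ex:molecularWeight": "ex:P12345#molecule",
    "ex:lastModified": "ex:P12345#record",
}

separated = split_uri(ambiguous_data, "ex:P12345", senses)

# Relate the two senses explicitly once they have their own URIs.
separated.append(("ex:P12345#record", "ex:describes", "ex:P12345#molecule"))

for triple in separated:
    print(triple)
```

The statement with ex:comment stays on the original URI because nothing
in the mapping says which sense it belongs to; that leftover is part of
why separating later is more work than using two URIs from the start.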

Finally, although it is flattering that you have named this suggestion
after me, I cannot take credit.  As I pointed out in my original post,
the suggestion to differentiate between a molecule and the database
record that describes that molecule originates with the Architecture of
the World Wide Web:
http://www.w3.org/TR/webarch/#URI-collision 
and best practices for implementing this distinction are described in
Cool URIs for the Semantic Web:
http://www.w3.org/TR/cooluris

David Booth
Received on Thursday, 26 March 2009 00:00:18 UTC