Re: blog: semantic dissonance in uniprot from Kingsley Idehen on 2009-03-24 (public-semweb-lifesci@w3.org from March 2009)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 24 Mar 2009 01:06:32 -0400
To: Michel_Dumontier <Michel_Dumontier@carleton.ca>
CC: David Booth <david@dbooth.org>, W3C HCLSIG hcls <public-semweb-lifesci@w3.org>
Message-ID: <49C86A58.7090106@openlinksw.com>
Michel_Dumontier wrote:
> David,
>   There's nothing like resurrecting this discussion one more time ;-)
>
> For all representations one should make simplifying assumptions in order to increase the usability of the system.
>
> In the life sciences, scientists don't care about database records - they care about the molecules and the biological processes about which facts have been collected. It is an artifact of database KR that we have such records in the first place. You probably won't see "Record" in a bio-ontology.
>
> IMHO, 303 redirects simply complicate matters and are not useful.
>
> We track provenance with namespace and/or graphs. What could be simpler?
>
> -=Michel=-
>
>
> -----Original Message-----
> From: David Booth [mailto:david@dbooth.org] 
> Sent: Monday, March 23, 2009 10:27 PM
> To: Michel_Dumontier
> Cc: W3C HCLSIG hcls
> Subject: RE: blog: semantic dissonance in uniprot
>
> Eric,
>
> On Sat, 2009-03-21 at 13:49 -0400, Michel_Dumontier wrote:
>   
>> Eric and friends,
>>
>>  I’m very sympathetic to the simplifying assumption of not
>> distinguishing between a record and the molecular entity it
>> represents, but . . . .
>>     
>
> I do not think this would be a wise "simplification".  This is only a
> simplification from one perspective: because it avoids having to mint
> and maintain pairs of URIs instead of a single URI.  But the downstream
> cost is that it creates an ambiguity (or "URI collision")
> http://www.w3.org/TR/webarch/#URI-collision
> that may cause trouble and be difficult to untangle later as the data is
> used in more and more ways.  For example, if any of the same predicates
> need to be used on both the record and the molecular entity, they will
> become hopelessly confused.  Also, if disjointness assertions are
> included then this overloading may cause logical contradictions.
>
> Cool URIs for the Semantic Web
> http://www.w3.org/TR/cooluris 
> describes best practices for minting URIs using 303 redirects to enable
> the record to be obtained (indirectly) by following the URI for a
> molecular entity.  If minting a separate URI for the molecular entity
> seems onerous, it is trivial to use a 303-redirect service such as
> http://thing-described-by.org/ 
> to do the job for you.  And if you want to set up your own 303-redirect
> service, that site will even show you the exact files that are used to
> implement it:
> http://thing-described-by.org/#What_This_Site_Does_ 
>
> Provenance (who said what) is extremely important in scientific analysis
> -- explicitly tracking the evidence leading to scientific assertions.
> It is easy for me to envision applications that will both use assertions
> about a molecular entity *and* assertions about the records that
> describe those molecular entities.
>
> If you are just minting disposable URIs that aren't intended to be very
> reusable anyway, then this ambiguity is not a problem, and it may be the
> quickest solution to your problem.  But if you want your URIs to be long
> lived and used by others for other applications, I think it would be a
> mistake.
>
> David Booth
>
>
>
>
>   
Michel,

303 redirection serves a single purpose: enforcement of the Identity
principle for discrete data objects. If a datum lacks identity, it cannot
in any way be resourceful.

The Identity principle also implies that "Identity" stands alone from
all else; you cannot intermingle it with "representation", for instance.

30X redirection is simply how you can implement Identity using HTTP-based
Identifiers, meaning: a URI for a real-world data object (a.k.a.
resource) and a representation of its description are distinct. Thus, to
honor the aforementioned principles, an HTTP Server receiving an HTTP
GET from a user agent that targets a data object via its URI must
re-route the request to an information resource URL that delivers a
description of the data object in question, using a representation format
negotiated by the client and/or server.
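
As a rough illustration of the re-routing just described, here is a
minimal Python sketch using only the standard library. Everything
specific in it is an assumption made for the example, not something
taken from UniProt or any real deployment: the /id/ and /doc/ path
layout, the file extensions, and the names pick_media_type,
describe_location and IdentityHandler are all hypothetical. Content
negotiation is deliberately simplified (q-values in the Accept header
are ignored).

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping from negotiated media type to the suffix of the
# description document (the information resource).
DESCRIPTIONS = {
    "text/turtle": ".ttl",
    "application/rdf+xml": ".rdf",
    "text/html": ".html",
}
DEFAULT_TYPE = "text/html"


def pick_media_type(accept_header):
    """Return the first supported media type listed in the Accept header.

    Simplification: q-values are ignored; order of appearance wins.
    """
    for part in (accept_header or "").split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in DESCRIPTIONS:
            return media_type
    return DEFAULT_TYPE


def describe_location(path, accept_header):
    """Map a data object URI path (/id/<name>) to the path of a document
    describing it (/doc/<name><suffix>). Return None for any other path."""
    prefix = "/id/"
    if not path.startswith(prefix) or len(path) == len(prefix):
        return None
    name = path[len(prefix):]
    return "/doc/" + name + DESCRIPTIONS[pick_media_type(accept_header)]


class IdentityHandler(BaseHTTPRequestHandler):
    """Answers GETs on data object URIs with 303 See Other."""

    def do_GET(self):
        location = describe_location(self.path, self.headers.get("Accept"))
        if location is None:
            self.send_error(404)
            return
        # 303: the object itself is not returned; the client is pointed
        # at a separate resource that describes it.
        self.send_response(303)
        self.send_header("Location", location)
        self.send_header("Vary", "Accept")
        self.end_headers()
```

To try it, pass IdentityHandler to http.server.HTTPServer and call
serve_forever(). Under these assumptions, a GET on /id/P12345 (a
made-up identifier) with "Accept: text/turtle" would be answered with a
303 whose Location is /doc/P12345.ttl, so the object's URI and the URL
of its description stay distinct.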

If you are going to honor the Identity principle on the Web in an
unobtrusive manner (i.e., by leveraging the ubiquity of HTTP), there is
no way around the above.

The whole essence of the Linked Data Web comes down to distillation of
Data Objects from their host Information Resources (documents), i.e.,
making the Data Objects referenceable and de-referenceable via URIs, in
the same manner exhibited by their host / container documents since the
beginning of hypertext. In short, think of this as Hyperdata linking
added to the broad concept of hyperlinking.

Scientists are always preoccupied with, and interested in, database
records because science lives and dies by the following processes:

1. Hypothesis
2. Observation
3. Conclusion

The steps above are about units of observation ("data"), contextual 
representation ("information"), and conclusions ("knowledge").

In my experience, scientists are completely preoccupied with Data :-)

-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Tuesday, 24 March 2009 05:07:17 UTC