RE: My task from last week: Semantic free identifiers

It's exactly the same reason we have tables with incremental primary keys, social security numbers for people, and ISBNs for books.  The identifier is meant to identify one thing, and should not clash with other things that have similar or identical names. What that thing is, is up to you. But you don't need a fancy algorithm to generate them in order to ensure uniqueness.  In creating RDF data (for Bio2RDF), we're often put in the position of having to create unique identifiers (so as to avoid unreliable blank nodes), and we sometimes have no alternative but to hash 3-8 values to get one (and to ensure we'll generate the same identifier in the future).  Having a guaranteed primary key is definitely good for change management.
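
To make the hashing idea concrete, here is a minimal sketch of deriving a repeatable identifier from several values. It is an illustration only, not the actual Bio2RDF scheme: the namespace, the function name make_identifier, the choice of SHA-256, the truncation length, and the example values are all assumptions.

```python
import hashlib

# Hypothetical namespace, used only for illustration.
BASE_URI = "http://example.org/resource/"


def make_identifier(*values: str) -> str:
    """Build a deterministic URI from an ordered tuple of values.

    The same values in the same order always give the same URI, so a
    later run of the conversion regenerates the identifier it produced
    before, with no need for blank nodes.
    """
    # Join with a control character (unit separator) so that
    # ("ab", "c") and ("a", "bc") do not collapse to the same key.
    key = "\x1f".join(values)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Truncating the digest is an arbitrary choice made for brevity.
    return BASE_URI + digest[:32]


# Made-up values standing in for the 3-8 fields of one record.
uri_a = make_identifier("uniprot", "P12345", "interacts_with", "Q67890")
uri_b = make_identifier("uniprot", "P12345", "interacts_with", "Q67890")
uri_c = make_identifier("uniprot", "P12345", "interacts_with", "Q00000")

print(uri_a == uri_b)  # same inputs, same identifier
print(uri_a == uri_c)  # different inputs, different identifier
```

The resulting identifier carries no label text at all, so renaming the thing it identifies never forces the identifier to change.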

However, if you're quite sure that your system will never generate the same identifier (EVER EVER EVER) for another entity, then go ahead and use labels in your URIs.  But if you expect some churn in the meantime (as will happen with domain ontologies - see 'Protein' for BioPAX as an example), then you may want to investigate a more principled approach. There are many cases in SIO where I changed the label - to be more accurate with respect to the definition or just to conform to a new label syntax. Had I linked the label to the identifier, this would have caused some cognitive dissonance, and been a pain for users to update.
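
A small sketch of why keeping the label out of the identifier helps when labels churn. Everything here is hypothetical: the Term class, the identifier EX_000001, and both labels are invented for illustration and are not taken from SIO or BioPAX.

```python
from dataclasses import dataclass


@dataclass
class Term:
    identifier: str  # opaque, assigned once, never changes
    label: str       # human-readable, may be revised later


# A tiny stand-in for an ontology: terms keyed by opaque identifier.
terms = {"EX_000001": Term("EX_000001", "protein")}

# Data held by users refers to the term by identifier only.
annotations = [("sample42", "EX_000001")]


def relabel(term_id: str, new_label: str) -> None:
    """Change the label of a term while leaving its identifier alone."""
    terms[term_id].label = new_label


relabel("EX_000001", "polypeptide")

# Existing references still resolve and simply pick up the new label.
for sample, term_id in annotations:
    print(sample, term_id, terms[term_id].label)
```

Had the identifier been derived from the label, the relabel step would have required rewriting every stored reference as well.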

m.

From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Sivaram Arabandi, MD
Sent: Monday, June 20, 2011 3:56 PM
To: Chime Ogbuji
Cc: Andrea Splendiani; Vagnoni,Matthew M; James Malone; HCLS
Subject: Re: My task from last week: Semantic free identifiers

I couldn't "agree" more with Andrea and Chime on this one, and I would like to see some good reason(s) for us to continue to be burdened by opaque identifiers.
The standard answer - 'tooling can help in managing the readability aspects' has been heard several times, and yet everyone seems to pass around 'raw RDF or SPARQL snippets with readable URIs' - for sure these will be absolutely unreadable if we were to use totally opaque identifiers.

I recently had a discussion on this topic with Michel (during Semtech) and this exact line of thinking that Mark alluded to in his email came up:
          "though I guess, for them, "partOf" *is* opaque... so...??  Perhaps that argument is somewhat spurious??"

--Sivaram
____________________________
Sivaram Arabandi, MD, MS
Ph:  216.374.2883

http://ontolog.cim3.net/cgi-bin/wiki.pl?SivaramArabandi
http://www.linkedin.com/pub/sivaram-arabandi/1/9ab/92a



On Jun 20, 2011, at 3:34 PM, Chime Ogbuji wrote:


On Monday, June 20, 2011 at 3:08 PM, Andrea Splendiani wrote:

Hi,
sorry to jump on this thread like this...

To be honest, I'm kind of concerned by the insistence on semantic-opaque
identifiers.

I am as well and I have been for some time.

I understand the reason for them,

Actually, I would be interested in hearing the reasons for them enumerated, because I have had a hard time imagining what could possibly offset the (significant) impact that they have on the readability of biomedical ontologies.  The barrier is already high for non-logicians and non-semantic web aficionados to use biomedical ontologies.  Why set it any higher?

-- Chime



Received on Monday, 20 June 2011 22:38:12 UTC