RE: My task from last week: Semantic free identifiers from Michel_Dumontier on 2011-06-21 (public-semweb-lifesci@w3.org from June 2011)

From: Michel_Dumontier <Michel_Dumontier@carleton.ca>
Date: Mon, 20 Jun 2011 21:38:11 -0400
To: "Sivaram Arabandi, MD" <sivaram.arabandi@gmail.com>, Andrea Splendiani <andrea.splendiani@bbsrc.ac.uk>
CC: Michel_Dumontier <Michel_Dumontier@carleton.ca>, Chime Ogbuji <chimezie@gmail.com>, "andrea splendiani (RRes-Roth)" <andrea.splendiani@rothamsted.ac.uk>, "Vagnoni,Matthew M" <MMVagnoni@mdanderson.org>, James Malone <malone@ebi.ac.uk>, HCLS <public-semweb-lifesci@w3.org>
Message-ID: <E1784B0107E5634C8997868083EDE78061257059A4@CCSMBX10.CUNET.CARLETON.CA>

From: Sivaram Arabandi, MD [mailto:sivaram.arabandi@gmail.com]
Sent: Monday, June 20, 2011 8:42 PM
To: Andrea Splendiani
Cc: Michel_Dumontier; Chime Ogbuji; andrea splendiani (RRes-Roth); Vagnoni,Matthew M; James Malone; HCLS
Subject: Re: My task from last week: Semantic free identifiers

On Jun 20, 2011, at 7:47 PM, Andrea Splendiani wrote:

Let's be precise.
I think everybody here would agree that having opaque unique identifiers is a sensible policy for individuals more often than not.
To keep the analogy with relational databases, the issue is not whether IDs should be opaque or not, but whether Table and Column names should be opaque or not.

I think we're in complete agreement here - and that's why we specify the human readable label using rdfs:label (and assign the language tag, if desired).

m.

+1
very well put.

Thinking about ontology terms, there are good reasons why these should be "codes" rather than one-word definitions (if nothing else, to avoid the temptation). Whether they 'should' be codes is a tradeoff. I guess it depends on the domain. It makes sense for OBO, less for DBpedia.
What I originally found a bit too much is the idea that 'everything' should necessarily have an opaque id. We can live with rdf:type, perhaps obo:partOf and so on...

ciao,
Andrea

On 20 Jun 2011, at 23:38, Michel_Dumontier wrote:

It's exactly the same reason why we have tables with incremental primary keys, or have social security numbers for people and ISBNs for books. The identifier is meant to identify one thing, and should not clash with other things having similar or identical names. What that thing is, is up to you. But you don't need a fancy algorithm to generate them so that you ensure uniqueness. In creating RDF data (for Bio2RDF), we're often put in the position of having to create unique identifiers (so as to avoid unreliable blank nodes), and we sometimes have no other alternative but to hash 3-8 values to get that (and to ensure we'll generate the same identifier in the future). Having a guaranteed primary key is definitely good for change management.
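
[Editorial illustration] The hashing approach described above might look roughly like the following. This is a minimal sketch, not Bio2RDF's actual code: the namespace, the function name make_identifier, the choice of SHA-256, the separator, and the sample values are all assumptions made for illustration.

```python
import hashlib

# Hypothetical namespace, chosen for illustration only.
BASE = "http://example.org/resource/"


def make_identifier(*values: str) -> str:
    """Derive a stable URI from the values that jointly identify a record."""
    # Join with the ASCII unit separator, assumed not to occur in the values,
    # so that ("ab", "c") and ("a", "bc") produce different keys.
    key = "\x1f".join(values)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Truncate to 32 hex characters to keep the URI short (an arbitrary choice).
    return BASE + digest[:32]


# The same inputs always give the same identifier, today and in the future.
first = make_identifier("source-a", "record-17", "2011")
second = make_identifier("source-a", "record-17", "2011")
```

Because the identifier depends only on the input values, regenerating the data later yields the same URI, which is the property a blank node cannot offer.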

However, if you're quite sure that your system will never generate the same identifier (EVER EVER EVER) for another entity, then go ahead and use labels in your URIs. But if you expect some churn in the meantime (as will happen with domain ontologies - see 'Protein' for BioPAX as an example), then you may want to investigate a more principled approach. There are many cases in SIO where I changed the label - to be more accurate with respect to the definition, or just to conform to a new label syntax. Had I linked the label to the identifier, this would have caused some cognitive dissonance and been a pain for users to update.
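
[Editorial illustration] A small sketch of why decoupling the label from the identifier helps with churn. The term URI, the record URI, the labels, and the relabel helper are hypothetical; only the rdfs:label and rdf:type property URIs are real. The "graph" here is a plain dictionary standing in for a triple store.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical opaque term URI; the number carries no meaning.
TERM = "http://example.org/ontology/EX_000123"

# Labels stored as (subject, predicate) -> (literal, language tag).
labels = {(TERM, RDFS_LABEL): ("protein", "en")}

# Data published elsewhere refers to the term only by its identifier.
annotations = [("http://example.org/data/record1", RDF_TYPE, TERM)]


def relabel(store, term, new_label, lang="en"):
    """Replace the human-readable label; the identifier is untouched."""
    store[(term, RDFS_LABEL)] = (new_label, lang)


# Made-up label change, standing in for a refinement of the definition.
relabel(labels, TERM, "protein molecule")
```

After the relabel, every existing annotation still points at the same URI, so users have nothing to update. Had the label been part of the URI, each of those triples would need rewriting.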

m.

From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Sivaram Arabandi, MD
Sent: Monday, June 20, 2011 3:56 PM
To: Chime Ogbuji
Cc: Andrea Splendiani; Vagnoni,Matthew M; James Malone; HCLS
Subject: Re: My task from last week: Semantic free identifiers

I couldn't "agree" more with Andrea and Chime on this one. And would like to see some good reason(s) for us to continue to be burdened by them.
The standard answer - 'tooling can help in managing the readability aspects' has been heard several times, and yet everyone seems to pass around 'raw RDF or SPARQL snippets with readable URIs' - for sure these will be absolutely unreadable if we were to use totally opaque identifiers.

I recently had a discussion on this topic with Michel (during Semtech) and this exact line of thinking that Mark alluded to in his email came up:
          "though I guess, for them, "partOf" *is* opaque... so...??  Perhaps that argument is somewhat spurious??"

--Sivaram
____________________________
Sivaram Arabandi, MD, MS
Ph:  216.374.2883

http://ontolog.cim3.net/cgi-bin/wiki.pl?SivaramArabandi
http://www.linkedin.com/pub/sivaram-arabandi/1/9ab/92a

On Jun 20, 2011, at 3:34 PM, Chime Ogbuji wrote:

On Monday, June 20, 2011 at 3:08 PM, Andrea Splendiani wrote:

Hi,
sorry to jump on this thread like this...

To be honest, I'm kind of concerned by the insistence on semantic-opaque identifiers.

I am as well and I have been for some time.

I understand the reason for them,

Actually, I would be interested in hearing the reason for them enumerated, because I have had a hard time imagining what could possibly offset the (significant) impact on readability that it has on biomedical ontologies.  The barrier is already high for non-logicians and non-semantic web aficionados to use biomedical ontologies.  Why set it any higher?

-- Chime

Andrea Splendiani
Senior Bioinformatics Scientist
Centre for Mathematical and Computational Biology
+44(0)1582 763133 ext 2004
andrea.splendiani@bbsrc.ac.uk

Received on Tuesday, 21 June 2011 01:38:15 UTC