
RE: My task from last week: Semantic free identifiers

From: Michel_Dumontier <Michel_Dumontier@carleton.ca>
Date: Mon, 20 Jun 2011 20:08:01 -0400
To: Andrea Splendiani <andrea.splendiani@bbsrc.ac.uk>, Michel_Dumontier <Michel_Dumontier@carleton.ca>
CC: "Sivaram Arabandi, MD" <sivaram.arabandi@gmail.com>, Chime Ogbuji <chimezie@gmail.com>, "andrea splendiani (RRes-Roth)" <andrea.splendiani@rothamsted.ac.uk>, "Vagnoni,Matthew M" <MMVagnoni@mdanderson.org>, James Malone <malone@ebi.ac.uk>, HCLS <public-semweb-lifesci@w3.org>
Message-ID: <E1784B0107E5634C8997868083EDE78061257059A1@CCSMBX10.CUNET.CARLETON.CA>
Re: DBpedia - Wikipedia's primary scheme uses the label itself in the URI. Since a term can have multiple meanings, an ambiguous label leads to a disambiguation page that links to a separate page for each meaning. The URI scheme works, but only because they control the process. If *you* want to know which page to link to, you have to do the disambiguation yourself - you can't assume you've got the right URI.

In either case (alphanumeric or terminological), you have to do a lookup to make sure you have the right identifier.
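To illustrate the lookup point, here is a minimal sketch in Python. The URIs, labels, and the `build_label_index` helper are all hypothetical, made up for the example; nothing here reflects how DBpedia or any real service resolves names.

```python
from collections import defaultdict

def build_label_index(terms):
    """Map each lower-cased label to every URI that carries it."""
    index = defaultdict(list)
    for uri, label in terms:
        index[label.lower()].append(uri)
    return index

# Hypothetical terms: two different things share the label "Mercury".
terms = [
    ("http://example.org/id/0001", "Mercury"),  # e.g. the planet
    ("http://example.org/id/0002", "Mercury"),  # e.g. the element
    ("http://example.org/id/0003", "Venus"),
]

index = build_label_index(terms)

# The label alone returns two candidates; someone still has to pick one.
candidates = index["mercury"]
```

Whether the URI embeds the label or an opaque code, the caller ends up consulting an index like this before linking.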



From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Andrea Splendiani
Sent: Monday, June 20, 2011 7:48 PM
To: Michel_Dumontier
Cc: Sivaram Arabandi, MD; Chime Ogbuji; andrea splendiani (RRes-Roth); Vagnoni,Matthew M; James Malone; HCLS
Subject: Re: My task from last week: Semantic free identifiers

Let's be precise.
I think everybody here would agree that having opaque unique identifiers is, more often than not, a sensible policy for individuals.
To keep the analogy with relational databases, the issue is not whether IDs should be opaque or not, but whether Table and Column names should be opaque or not.
Thinking about ontology terms, there are good reasons why these should be "codes" rather than one-word definitions (if only to avoid the temptation of treating the name as the definition). Whether they 'should' be codes is a tradeoff. I guess it depends on the domain: it makes sense for OBO, less so for DBpedia.
What I originally found a bit too much is the idea that 'everything' should necessarily have an opaque ID. We can live with rdf:type, perhaps obo:partOf, and so on...


On 20 Jun 2011, at 23:38, Michel_Dumontier wrote:

It's exactly the same reason why we have tables with incremental primary keys, social security numbers for people, and ISBNs for books. The identifier is meant to identify one thing, and should not clash with other things having similar or identical names. What that thing is, is up to you. But you don't need a fancy algorithm to generate identifiers in a way that ensures uniqueness. In creating RDF data (for Bio2RDF), we're often put in the position of having to create unique identifiers (so as to avoid unreliable blank nodes), and we sometimes have no alternative but to hash 3-8 values to get one (and to ensure we'll generate the same identifier in the future). Having a guaranteed primary key is definitely good for change management.
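As a rough illustration of hashing several values into a repeatable identifier, here is a minimal Python sketch. It is not the Bio2RDF implementation: the `BASE` namespace, the `mint_identifier` name, the choice of SHA-256, and the separator are all assumptions made for the example.

```python
import hashlib

# Hypothetical namespace, used only for this example.
BASE = "http://example.org/resource/"

def mint_identifier(*values: str) -> str:
    """Derive a stable URI from an ordered tuple of field values."""
    # Join with a control character (unit separator) so that
    # ("ab", "c") and ("a", "bc") do not collapse into the same key.
    key = "\x1f".join(values)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return BASE + digest

# The same inputs yield the same URI on every run, which is what makes
# the identifier reproducible in a future conversion.
uri = mint_identifier("source-db", "record-17", "interaction", "2011")
```

The result is opaque by construction, and it stays the same only as long as the input values and their order do.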

However, if you're quite sure that your system will never generate the same identifier (EVER EVER EVER) for another entity, then go ahead and use labels in your URIs. But if you expect some churn in the meantime (as will happen with domain ontologies - see 'Protein' for BioPAX as an example), then you may want to investigate a more principled approach. There are many cases in SIO where I changed the label, either to be more accurate with respect to the definition or just to conform to a new label syntax. Had I tied the identifier to the label, this would have caused some cognitive dissonance and been a pain for users to update.
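A small Python sketch of the relabeling problem, under made-up assumptions: the `EX_0000001` identifier, the labels, the namespace, and the `label_uri` helper are hypothetical and do not come from SIO or BioPAX.

```python
# Hypothetical namespace, used only for this example.
BASE = "http://example.org/onto/"

def label_uri(base: str, label: str) -> str:
    """Build a URI directly from the label (the approach being questioned)."""
    return base + label.strip().replace(" ", "_")

# Opaque identifier mapped to its current human-readable label.
terms = {"EX_0000001": "protein"}

# Existing data refers to the term by its opaque URI.
annotations = [("sample42", BASE + "EX_0000001")]

old_label_uri = label_uri(BASE, terms["EX_0000001"])

# The curator refines the label; the opaque identifier is untouched.
terms["EX_0000001"] = "protein molecule"

new_label_uri = label_uri(BASE, terms["EX_0000001"])
```

After the relabel, `annotations` still points at a valid term, while anything that stored `old_label_uri` now refers to a name the ontology no longer uses.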


From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Sivaram Arabandi, MD
Sent: Monday, June 20, 2011 3:56 PM
To: Chime Ogbuji
Cc: Andrea Splendiani; Vagnoni,Matthew M; James Malone; HCLS
Subject: Re: My task from last week: Semantic free identifiers

I couldn't agree more with Andrea and Chime on this one, and I would like to see some good reason(s) for us to continue to be burdened by opaque identifiers.
The standard answer, 'tooling can help in managing the readability aspects', has been heard several times, and yet everyone seems to pass around raw RDF or SPARQL snippets with readable URIs. These would surely be unreadable if we were to use totally opaque identifiers.

I recently had a discussion on this topic with Michel (during Semtech) and this exact line of thinking that Mark alluded to in his email came up:
          "though I guess, for them, "partOf" *is* opaque... so...??  Perhaps that argument is somewhat spurious??"

Sivaram Arabandi, MD, MS
Ph:  216.374.2883


On Jun 20, 2011, at 3:34 PM, Chime Ogbuji wrote:

On Monday, June 20, 2011 at 3:08 PM, Andrea Splendiani wrote:

sorry to jump on this thread like this...

To be honest, I'm kind of concerned by the insistence on semantic-opaque
I am as well and I have been for some time.
I understand the reason for them,

Actually, I would be interested in hearing the reason for them enumerated, because I have had a hard time imagining what could possibly offset the (significant) impact on readability that it has on biomedical ontologies.  The barrier is already high for non-logicians and non-semantic web aficionados to use biomedical ontologies.  Why set it any higher?

-- Chime



Andrea Splendiani
Senior Bioinformatics Scientist
Centre for Mathematical and Computational Biology
+44(0)1582 763133 ext 2004


Received on Tuesday, 21 June 2011 00:08:46 UTC
