Re: IDs + 5; everybody - 10 from Jonathan Rees on 2007-07-16 (public-semweb-lifesci@w3.org from July 2007)

From: Jonathan Rees <jonathan.rees@gmail.com>
Date: Mon, 16 Jul 2007 10:04:36 -0400
To: "Phillip Lord" <phillip.lord@newcastle.ac.uk>
Cc: "Mark Wilkinson" <markw@illuminae.com>, "Alan Ruttenberg" <alanruttenberg@gmail.com>, Michel_Dumontier <Michel_Dumontier@carleton.ca>, public-semweb-lifesci <public-semweb-lifesci@w3.org>, "Benjamin Good" <goodb@interchange.ubc.ca>, "Natalia Villanueva Rosales" <naty.vr@gmail.com>
Message-ID: <3cff5e070707160704h553b426fuc90014030152dd98@mail.gmail.com>

Let me try to review what's going on here, since Mark W and others
have reasonably asked why we're putting so much effort into the URI
question.

The W3C HCLS SIG was created according to a charter [1] that specifies
that "The Interest Group will provide guideline[s] on how best to
identify HCLS resources for use in the Semantic Web."

I was drafted to edit these guidelines, so I am obliged to put effort
into this. Interest group members are also obliged to put effort into
this, I think.

The interest group decided early on to open its discussion list to
non-W3C members so that it could benefit from input from outside
groups. Contributors from outside HCLS are therefore not obliged to do
anything, but are acting either defensively or altruistically.

I interpret "how best to identify" to mean identification inside RDF
triples only, since this is a semantic web activity.

Using common identifiers is a goal whose merits should be
obvious to anyone who's been paying attention. Without this, query
writers need to locate mappings from one naming scheme to another and
refer to these mappings inside queries. I think this is what Rod Page
and others are advising. Not a disaster, as he and others have pointed
out, only a tragedy.

Even if there are multiple identifiers for something, we still have to
recommend which one an uncommitted party should use given a choice, I
think.

As a W3C-chartered group we have to give W3C dogma (such as use of
http: URIs as identifiers instead of or in addition to locators) a
fair shake. This dogma contradicts the use of LSIDs, so
the dissonance has to be resolved thoughtfully. We know AWWW [2] and the
TAG are wrong about some things, but we have to work with them as best
we can.

Since we already have many essential resources identified by http:
URIs, the question is not whether to rename those, but rather how to
make http: URIs behave in a more civilized way, and whether there are
situations in which to recommend creation of new LSIDs, DOIs, or other
non-http identifiers for use in RDF.

LSIDs have a few things going for them, so we need to try to benefit
from the LSID experience. If other requirements (to be determined)
prevent HCLS from recommending the minting of LSIDs, then, in my
opinion, we should consider reproducing LSIDs' positive features in the http:
URI space (a problem that we've made some progress on). This may look
like reinventing the wheel, but it isn't, since LSID has already done
some of the inventing. It may look like unnecessary replication, but
it's not really, since we're already committed to the http: space and
all the issues that LSID addressed are issues there as well.

The same remarks apply to handles, and to DOIs in particular.

I don't think this task (of determining good URI minting policy,
deterministic description access, clarification about versioning,
etc.) ought to fall on HCLS, as it is basic semweb infrastructure.
Unfortunately we're the ones being most badly burned by lack of
rational solutions, and W3C seems otherwise unable to do much about
these problems. (Please correct me if I'm wrong - I don't feel I have
a very good perspective on the W3C situation, and am always being
surprised by discussions and developments [e.g. 3] I didn't know
about.)

I appreciate everyone's contributions, especially those that respect
other parties in the debate. The diversity of viewpoints will help us
draft the best possible recommendations. No one will completely like
the document, I think, but we've got to try our best to get the
largest consensus possible for the best technical solution. If anyone
pays attention - and I think they will - the result will be
acceleration of semweb application in health care, life sciences, and
other disciplines, and we'll get all that new biology sooner.

Jonathan

[1] http://www.w3.org/2001/sw/hcls/charter
[2] http://www.w3.org/TR/webarch
[3] http://www.w3.org/TR/2007/WD-powder-grouping-20070709

Received on Monday, 16 July 2007 14:04:41 UTC