Identifiers (was: Access and query TF) - probably off-topic from Graham Klyne on 2011-06-23 (public-prov-wg@w3.org from June 2011)

From: Graham Klyne <GK@ninebynine.org>
Date: Thu, 23 Jun 2011 09:11:12 +0100
To: "Myers, Jim" <MYERSJ4@rpi.edu>
CC: Luc Moreau <L.Moreau@ecs.soton.ac.uk>, public-prov-wg@w3.org
Message-ID: <4E02F520.1070506@ninebynine.org>

Myers, Jim wrote:
>> (The only thing I'm arguing against is a defined construction of
>> identifiers to reflect this usage - per Web architecture, URIs should be
>> opaque strings.)
> 
> OK - I'm not sure we need to go there for PIL either, but Kunze (John A.
> not Steven as I originally posted) does make such an argument for ARKs.
> (The ARK spec works hard to keep the subparts opaque beyond this one
> affordance though. see
> https://confluence.ucop.edu/download/attachments/16744455/arkcdl.pdf) 
> 
> His core arguments revolve around the issues a) that curators come and
> go on shorter timescales than data, hence a mechanism to find other
> copies of data/metadata is needed, and b) that if we're worried about
> curators disappearing, any mechanism for maintaining a map to copies
> that relies on them won't work well. Thus separating curator from thing
> in an actionable 'curator/thing' identifier such that just having the
> 'curator/thing' string (URL) is enough to let you search for and
> identify other copies (e.g. identifiers of the form 'thing' and
> 'curator2/thing'), is at least a rational choice despite the concerns
> about opacity. 
> 
> Again, while we may want to punt on anything like this, my suspicion is
> that the same arguments are applicable to provenance.
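
(To make the quoted 'curator/thing' idea concrete, here is a minimal sketch. It
is not anything the ARK spec defines: the function name candidate_identifiers,
the curator names, and the bare 'curator/thing' string form are all assumptions
made up for illustration.)

```python
def candidate_identifiers(identifier, other_curators):
    """Split a 'curator/thing' string and list other forms to search for.

    Hypothetical helper: assumes the curator part is everything before the
    first '/' and the thing part is everything after it.
    """
    curator, sep, thing = identifier.partition("/")
    if not sep or not curator or not thing:
        raise ValueError(f"not of the form 'curator/thing': {identifier!r}")
    # The bare 'thing' part first, then the same 'thing' under each other curator.
    candidates = [thing]
    candidates += [f"{c}/{thing}" for c in other_curators if c != curator]
    return candidates


print(candidate_identifiers("curator1/thing", ["curator2", "curator3"]))
```

Note that only code which knows about this convention may split the string;
anything else can still treat the whole identifier as opaque.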

While the article you cite makes many good points, I find myself unconvinced 
that the long-term naming problem will be solved by yet another naming scheme.

As the article says:
[[
A founding principle of the ARK is that persistence is purely a matter of 
service, and is neither inherent in an object nor conferred on it by a 
particular naming  syntax. The best an identifier can do is lead users to those 
services.
]]

Which echoes something that has long been held true in the Web community:
[[
What sorts of URI change?
URIs don't change: people change them.
]]
-- http://www.w3.org/Provider/Style/URI.html

I've had many discussions through the years relating to the use of HTTP-or-other 
URI scheme for persistent URIs, and for me the epiphany came when I realized 
that the core issues were entirely non-technical.  It's the social contract (and 
the resources to back it up) that count.

For example, HTTP alone has no associated social contract, but HTTP in some 
domains (w3c.org, purl.org, dx.doi.org, etc.) does, and in some domains people 
are using such identifiers quite happily.  Another example: one of the main 
problems that I experienced with URNs was that the social contract expected was 
too demanding (or appeared to be too demanding), so many potential uses were 
eschewed by designers (I was involved in some efforts to use URNs more widely 
within the IETF - e.g. http://tools.ietf.org/html/rfc3553, which I saw as an 
attempt to bridge the IANA registry-based approach with the W3C URI-based 
approach to naming).

A personal observation is that any "address" can serve as an "identifier", and 
the only thing that can prevent any "identifier" from serving also as an 
"address" is the lack of a suitable resolution infrastructure.  Several years 
ago, DDDS was defined (but not widely adopted) for implementing URN resolution 
via DNS (http://tools.ietf.org/html/rfc3401, et seq).  So to distinguish the 
two we must look back to social contracts, not technology.
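
A toy illustration of that observation, under made-up assumptions: the resolve
function, the table contents and the example.org location are all hypothetical,
and a plain dict stands in for whatever resolution infrastructure (DNS, DDDS, a
redirect service) might actually exist.

```python
def resolve(identifier, resolution_table):
    """Return the current location for an identifier, or None if nobody
    maintains a mapping for it."""
    return resolution_table.get(identifier)


# Hypothetical table; keeping it up to date is the social-contract part.
table = {"urn:example:report-42": "http://archive.example.org/reports/42"}

print(resolve("urn:example:report-42", table))  # acts as an address
print(resolve("urn:example:report-43", table))  # no mapping, so only a name
```

The two identifiers have the same syntax; the only difference is whether
someone has committed to maintaining an entry for them.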

Indeed, it's all about the roles, not the objects - haven't we been here recently :)

Working recently on a data preservation project 
(http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL) with the Bodleian Library 
Service, who can justifiably claim to be thinking about preservation on a scale 
of decades and centuries, we have learned that they don't regard *any* form of 
identification as being reliable for the long haul - all that counts is the 
content, and the means to locate/discover that content, which are expected to 
evolve over time.

So, yes, I think we should punt on any issues of structured identifiers and 
leave that for the archivists to fight out :)

A big part of this is that punting on opaqueness doesn't mean that individual 
archivists can't use structure in identifiers (isn't that exactly what happens 
with the Handle system in DOIs?), and that over time we might expect several 
such systems to come and go - it is to my mind an orthogonal concern.

#g
--
Received on Thursday, 23 June 2011 09:47:43 UTC