URI persistence (was Re: TAG Status Report for Period Ending October, 2011 is now available)

On Thu, Nov 10, 2011 at 1:04 PM, Cutler, Roger (RogerCutler)
<RogerCutler@chevron.com> wrote:
> FWIW, I have some thoughts about URI persistence.  I have in the past tried very hard within Chevron to make our internal URI's less fragile -- and I have been totally unsuccessful.  In my opinion there are two basic reasons for this:
>
> 1 - URI's are being used for two different purposes:  as an identifier and as a locator.  I believe that this fundamentally makes them fragile.  Experience shows that every time we reorganize, which typically happens every few years, the names of the organizations change and the ownership of documents changes in a complex way.  Documents are moved to new sites with names reflecting the organizations.  I'm sorry, there's nothing I can do to stop this very natural human behavior.  I've given up, frankly.  There are also other reasons why documents are moved, but reorgs are the easiest to document and understand.
>
> 2 - Domain names are part of the URI.  This is, of course, related to the issue above, but it's even more fundamental.  Chevron's internal domain name changed from chevron.com to chevrontexaco.com at the time of the merger with Texaco, then back to chevron.com at the time of the merger with Unocal -- and of course there were the texaco.com and unocal.com domains as well.  Most documents were moved to servers with different domain names.  I do not think that this kind of situation is particularly unusual.
>
> I realize that there are technical mechanisms to work around some of these problems, for example with redirects, but my observation is that people don't actually do this very often, and it has seemed to me that in cases where people have tried to do this the solution has been, for reasons I don't understand, fragile.
>
> I don't want to be obnoxious or disrespectful about this, but it is my opinion that URI's are fundamentally "broken" as identifiers and I don't think that there is a "fix".  The problem is just way too fundamental.  I believe that information technology people are taught at their mother's knee that it is a really bad idea to use identifiers for a second purpose.  Y'all got it wrong from the very beginning, and I think it's too late to fix that.  Sorry if this seems unnecessarily negative, but I don't have a solution to suggest.  My observation is that in practice we are slowly migrating to systems that use something other than URI's as identifiers.

I agree that for-profit corporations are poor candidates as hosts for
persistent identifiers because of the importance and inherent
instability of branding and corporate ownership. The cases we were
mainly concerned with in organizing the workshop are organizations that
have a serious stake in long term stability - those whose reputation
depends, to some extent, on maintaining integrity over the long haul,
and who understand that identifiers are primarily a matter of public
interest. There are not many organizations like this; the two
canonical examples we're using are W3C and IDF (although Creative
Commons, Dublin Core, and many others have URIs that need to be
long-term stable). I agree that most domains are unsuitable as
persistence candidates, but do you think that the
http://dx.doi.org/... URIs are a bad bet? What exactly are the threats
to these identifiers? The branding is only in the domain name,
dx.doi.org (notably named after the naming scheme, not after the
administering organization) - they made the decision to use
telephone-number-like strings, not mnemonics, in the path, for exactly
the reasons you give.
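
To make the identifier/locator split concrete, here is a minimal
sketch in Python. It is not how dx.doi.org is implemented; the
Resolver class, the resolver.example.org host, and the identifier
and document URLs are all invented for illustration. The only point
is that the published URI carries an opaque string, and the mapping
to wherever the document currently lives is kept in a table that can
change without the URI changing.

```python
# Hypothetical sketch: every name and URL here is invented for illustration.
RESOLVER_BASE = "http://resolver.example.org/"  # assumed stable, scheme-named host


class Resolver:
    """Maps opaque identifiers to the current location of a document."""

    def __init__(self):
        self._locations = {}

    def register(self, opaque_id, location):
        """Record where the document for a new identifier currently lives."""
        self._locations[opaque_id] = location

    def move(self, opaque_id, new_location):
        """Update the location only; the identifier itself never changes."""
        if opaque_id not in self._locations:
            raise KeyError(opaque_id)
        self._locations[opaque_id] = new_location

    def persistent_uri(self, opaque_id):
        """Build the URI that gets published and cited."""
        return RESOLVER_BASE + opaque_id

    def resolve(self, persistent_uri):
        """Return the current location (what a redirect would point to)."""
        if not persistent_uri.startswith(RESOLVER_BASE):
            raise ValueError("not a URI managed by this resolver")
        opaque_id = persistent_uri[len(RESOLVER_BASE):]
        return self._locations[opaque_id]


resolver = Resolver()
resolver.register("10.9999/000123", "http://old-org.example.com/reports/a.html")
uri = resolver.persistent_uri("10.9999/000123")

# A reorganization moves the document; the cited URI stays the same.
resolver.move("10.9999/000123", "http://new-org.example.com/archive/a.html")
current = resolver.resolve(uri)
```

The sketch says nothing about whether the resolver's own domain
stays put, which is the part that remains open.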

I guess the workshop is in a particular niche, namely that of
organizations that are committed to persistence and haven't been
convinced not to use http: URIs. That is why the question of domain
name persistence rises to the center of the debate. It is not at all
a simple question, hence the workshop.

Best,
Jonathan

Received on Thursday, 10 November 2011 23:36:27 UTC