[URNsAndRegistries-50] Review of http://www.w3.org/2001/tag/doc/URNsAndRegistries-50 from Williams, Stuart (HP Labs, Bristol) on 2007-03-28 (www-tag@w3.org from March 2007)

From: Williams, Stuart (HP Labs, Bristol) <skw@hp.com>
Date: Wed, 28 Mar 2007 14:56:27 +0100
To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
Cc: "TAG mailing list" <www-tag@w3.org>, "David Orchard" <dorchard@bea.com>
Message-ID: <C4B3FB61F7970A4391A5C10BAA1C3F0D86FF51@sdcexc04.emea.cpqcorp.net>

Henry,

I promised you some review of http://www.w3.org/2001/tag/doc/URNsAndRegistries-50, so here goes:

Firstly, on a macroscopic level, the document clearly has two halves. The first half appears intended to catalog various characteristics that motivate the creation of new URI schemes or URN namespaces, while the second half (section 4 onward) appears to examine a couple of case studies in detail. I think that the second half would tie more closely to the first if it were to revisit all the characteristics enumerated in the first half (and in the same order), examining each characteristic in the context of the particular case study. Also, the narrative of the first half does a lot of the set-up for the second half - which I think would allow the second half to be shorter.

To be specific:
1st Half Concepts:
Persistence
Standardized
Protocol Independence
Location Independence
Structured Names
Uniform Access to metadata
Flexible Authority

2nd Half Concepts:
[Namespaces use case]
Identification
Persistence *of identifiers* (more comment later)
Dereferencability
Erroneous appearance of dereferencability of identifiers
[XRI use case]
Persistent Dereferencability (location independence)
Persistent Identifiers
Protocol Independence

Whilst some of the differences in the aspects covered might just be a matter of labeling concepts, I think others are indicative of different coverage. IMO the first half should introduce the concepts and the background to each as a means to structure the analysis in the second part - I suspect that was the intent, but I think as a whole the two parts may have drifted apart a little.

[In document order]

Introduction: last paragraph:

Sentence: "They prose new URN (sub-)namespaces or URI schemes and provide registries of instances thereof,..."

On first reading I took the registries to be registries of URN (sub-)namespaces and/or URI (sub-)schemes, and that may indeed be what was meant. However, the sentence can also be taken to mean registries of URIs assigned under the said scheme. Suggest a little more precision than the indexical (??) "thereof".

2.1 Persistence

Firstly, as I mentioned on the telecon, I think that we need to be clear about two possible things being spoken of as 'persistence':

a) that a URI provides access to a set (often a singleton) of immutable representations - i.e. the representations are persistent.

b) that the association between a URI and a resource (in the sense of something that yields a coherent set of representations over time that may be taken to be representations of the same thing) is persistent. I believe this is the intent of "CoolURIs".

Currently, the description of the Information Science Community notion of persistence aligns (I think) with a) above - while b) (which I think is what we may mean) is not spoken of at all.
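
As a rough illustration of the distinction between a) and b), here is a minimal sketch. It is only a model: the `Observation` record, its fields, the function names and the sample data are all invented for this example and come from neither the draft nor any specification. It treats each sense of persistence as a different check over the history of what one URI has returned:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One dereference of a single URI at some point in time (hypothetical model)."""
    year: int
    representation: bytes  # the bytes that came back
    subject: str           # what those bytes are taken to be a representation of


def representations_persist(history):
    """Sense a): every dereference yields the same immutable representation."""
    return len({obs.representation for obs in history}) <= 1


def binding_persists(history):
    """Sense b): representations may vary, but all are of the same thing."""
    return len({obs.subject for obs in history}) <= 1


# Assumed sample data: a page whose content changes but whose subject does not.
home_page = [
    Observation(2005, b"<h1>News, 2005</h1>", "example organisation home page"),
    Observation(2007, b"<h1>News, 2007</h1>", "example organisation home page"),
]

# The same URI after the domain registration lapses and someone else takes it over.
lapsed_domain = home_page + [
    Observation(2012, b"<h1>Domain for sale</h1>", "domain reseller advert"),
]
```

Under these assumptions `home_page` fails check a) but passes check b), while `lapsed_domain` fails both. The second case is the one where a change of DNS name 'ownership' breaks the URI/resource relation.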

Secondly: I *may* be willing to accept the claim that myRI and http: based identifiers can both be used to achieve the same level of persistence. However that is based on the argument that "forever is a long time" and no scheme or mechanism can persist forever, therefore no scheme/mechanism is durable enough to satisfy a persistence requirement.

However, at a more pragmatic level most of us *never* get to truly 'own' the URIs that we (in some epoch) get to assign to resources. The domain name that I use at home is retained by me at a cost of about £10 every two years. I suspect that I will go on paying that for as long as I am able... but there is a point where my personal interest in those URIs will end - and in general, despite the best of intentions, I cannot guarantee the persistence of the URI/Resource relation (or at least the coherence of the resource over time).

FWIW: There is another view, which I would attribute to Roy. My mental model for this view is of Roy, in the guise of Obi-wan Kenobi, waving a hand and muttering "...these are not the resources you are looking for." This is based on more of an existential argument: that a given resource, whatever it is, manifests as a time-varying series of sets of accessible representations. If its character changes at some point in time, then the resource is less useful - but it has not changed - because what a given resource is, is defined over all future time. In that sense, regardless of scheme, the URI -> resource relation is inherently persistent (though potentially of dubious utility).

So, wrt the last paragraph in section 2.1, I do not think that the volatility of the 'ownership' of DNS names can be so easily brushed aside. I know from conversations with Tim that W3C (and no doubt others) have made provision for its eventual demise and the ongoing persistence of its identifiers - at least until the lights go out on the planet :-) - but I suspect this is by social agreement rather than technical means, which in the long run remains fragile to the vagaries of human behaviour.

I'm wondering whether it would be good to mention Larry Masinter's tdb: and duri: draft schemes (there is a snapshot of an expired ID at http://larry.masinter.net/duri.html), which I think are interesting wrt how they deal with potential changes in 'ownership' of the URI space (though I think Larry would demur on notions of ownership and authority applied to URIs).

2.2 Standardized:

Minor nit: "For example, my employers..." with two authors leaves 'my' a little unclear. It would probably be better to avoid the personal pronoun.

2.3 Protocol Independence:

Re: "All existing myRI approaches in practice specify only one such mapping, to the HTTP protocol."

"All" is a pretty sweeping claim and I am minded of Dan's phrase - "...be careful with the quantifiers."

There are at least two axes here: available specifications; and the deployment of what they specify.

I am not intimately familiar with DDDS (http://www.ietf.org/rfc/rfc3401.txt) and whether "in practice" it specifies "only one such mapping, to the HTTP protocol." IIRC its intent is/was to provide a framework to enable the resolution of URNs. [I know nothing of the extent of its deployment and uptake.]

Re: "Protocols which don't allow servers any escape mechanism are thereby pretty much ruled out as transports for retrieval from myRIs (or http: URIs)."

Please could you explain the point being made? I suspect the mention of servers is extraneous and that the point is something to do with extensibility points in the definition of protocol specs.

4.1 Context

Re: 2nd para, last sentence: "It is never the case that a URI is simply "found" without a context."

I happen to agree with this (even the "never", which again begs the 'be careful with the quantifiers' comment). I am also aware that this is and has been a hotly debated viewpoint - whether "context of use" has any bearing on what a given URI identifies or is used to denote. Being a little pragmatic, I could admit "the Web" as being one fairly large context in which the *intended* referent of a URI is invariant, and so try to have my cake and eat it.

I'm wondering if we can avoid taking the lid off that particular can in this document - and if not, despite the likely angels-on-a-pinhead nature of seriously discussing this, we will likely have to discuss it in more depth.

4.2 Identification

Last para, last sentence: "The software-only interaction pattern is clearly erroneous..." To what does this refer?

4.3 Persistence of Identifiers

Re: the last two paras:

"We might imagine a scenario many years down the road where OASIS no longer exists. It would not maintain the oasis-open.org domain name and http: identifiers using that domain would no longer be assigned. Alternatively, OASIS does not produce or mint any new URNs. In either case, the identifiers are not dereferenced so all the existing software works.

In URN and http: scheme cases, the persistence of the identifier is accomplished by the organization. The ongoing existence of the organization does not affect the persistence of the identifiers."

I am at a loss to understand which way this is trying to argue or what to conclude - http: is adequate? or http: is not adequate?

"Persistence of identifiers" is a curious phrase in that "http://www.w3.org/" is a persistent identifier - as an identifier it will always be that identifier - as an identifier it is immutable.

The closing para (2nd quoted above) is true of identifiers, but in a trivial sense. I disagree that it is true of the relation between an identifier and a resource, except in the sense of "...these are not the resources you are looking for." and the resource being whatever it happens to be... i.e. an aggregation over all time of the representations available when dereferencing a given URI - but this is not what most people conceive of as being 'the resource'. I can accept and like Roy's formulation of a resource, but under that formulation the resource is what it is over all time, such that persistence is then a non-issue. It's just that it is not what I think those who are after 'persistence' are trying to accomplish. Wrt what (I think) they are trying to accomplish... "The ongoing existence of the organization" DOES "affect the persistence of the identifiers." because the relation between the identifier and what (I think) they perceive as the resource can change with changes in DNS name ownership.

4.4 Dereferencability

Re: "In all dereferencable identifier scenarios, an identifier must be usable to generate an authority. There may be interactions with multiple authorities to determine the "final" authority for the identifier. The final authority uses the identifier to produce a document."

What does "...to generate an authority" mean?

I don't understand "There may be interactions with multiple authorities to determine the "final" authority for the identifier."

"The final authority uses the identifier to produce a document." Should "document" be "representation"?

I have yet to give section 5 and beyond a detailed read.

Hope these comments are helpful

Regards

Stuart
--
Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England

Received on Wednesday, 28 March 2007 13:56:58 UTC