Review of draft finding on URNs, Namespaces and Registries from Booth, David (HP Software - Boston) on 2006-08-09 (www-tag@w3.org from August 2006)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Wed, 9 Aug 2006 07:43:20 -0400
To: <www-tag@w3.org>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C20B93BA@tayexc19.americas.cpqcorp.net>
Review of http://www.w3.org/2001/tag/doc/URNsAndRegistries-50.xml

General Comments:
1. Great topic!  I'm glad to see the TAG addressing this.  The document
is well considered and makes many excellent points.  

2. As a whole, the document (and Section 3 in particular) does not go
far enough in discouraging the use of myRIs.  It feels like it is
slashing at branches of the issue rather than cutting it down at its
trunk.  As (I think) I demonstrated in http://dbooth.org/2006/urn2http/ ,
the capabilities of http URIs are virtually a direct superset of those of
myRIs.  I would not go as far as to claim that I have *proved* this
assertion, because there are some (in my opinion minor) ways in which
they are inherently different, and hence myRIs could still be viewed as
advantageous.  (And BTW I just updated the document to explicitly list
the ways I could think of: 
http://dbooth.org/2006/urn2http/#differences .)  However, I think the
simple myRI-to-http conversion recipe that I showed provides very
convincing evidence.  Thus, unless someone can show how my analysis is
flawed, I think the TAG's advice should be: "Do not create new myRIs
unless you can demonstrate that those inherent differences are so
important to your application that they outweigh the enormous installed
base of HTTP URIs."

2. Simple examples would help.  The points are often stated in very
general, vague terms.  Simple examples would add a lot of clarity.

3. The term "naming authority" is used in several places where I would have
expected "domain owner" or "URI owner" to appear, which I think would be
much clearer.  Please define the term and/or consider using a different
term.

Comments on specific sections:

4. Sec 2.4 Location Independence:
The "http: fact" section seems to be making two points: (1) that an http
URI may be used for identifying things that do not have representations,
such as concepts in an ontology; and (2) for URIs that do have
representations, server-side techniques can be used to make the URIs a
little less dependent on their locations within a particular site.  

The authors may be intending the explanation to make broader points than
these, but if so, they are not at all obvious.  In particular, the
explanation does not seem to address the case when a representation is
*intended* to be made available for a URI, but the location of the
representation is *not* indicated by the URI.  

For example, http://xyzpurl.example.org?foo may be used purely as an
identifier even when representations are *intended* to be offered, but
not necessarily via a GET on that URI.  For example, a specification
governing the "http://xyzpurl.example.org?" prefix (using the technique
described in http://dbooth.org/2006/urn2http/ ) can be used to associate
(potentially changing) locations with the identifier in order to
retrieve representations.
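
A minimal sketch of this prefix approach, assuming the simplest possible
convention: the myRI is appended (percent-encoded where needed) to the
governed prefix.  The function names and the sample identifier
urn:example:foo are hypothetical, and the actual rules described in
http://dbooth.org/2006/urn2http/ may differ.

```python
from urllib.parse import quote, unquote

# Prefix from the example above; a specification would govern URIs under it.
PREFIX = "http://xyzpurl.example.org?"


def myri_to_http(myri: str) -> str:
    """Mint an http URI by appending the myRI to the governed prefix."""
    # Keep ':' and '/' readable; both are legal in a URI query component.
    return PREFIX + quote(myri, safe=":/")


def http_to_myri(uri: str) -> str:
    """Recover the original myRI from a URI minted under the prefix."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not under the governed prefix: {uri!r}")
    return unquote(uri[len(PREFIX):])
```

Because the mapping is a plain string operation, it can be applied and
reversed without any network access; the resulting URI works purely as
an identifier.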

I think it would be helpful to bring out this point because a common
(erroneous) assumption is that if you wish to supply a representation,
and you are using an http URI, then the representation must be served from
that http URI.  I think it would be helpful to dispel this explicitly.


To summarize:

	- An http URI can identify an information resource; and
	- Representations of that resource can be served; but
	- They need not be served from that same http URI; and
	- The location(s) for retrieving them need not be indicated
	  in the URI, either logically or physically!
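
The four points above can be sketched as a lookup that is kept separate
from the identifier itself.  This is only an illustration: the
LocationRegistry class, its methods, and the mirror URLs are
hypothetical, standing in for whatever mechanism a governing
specification would actually define.

```python
from typing import Dict, List


class LocationRegistry:
    """Maps stable http identifiers to the places that currently serve them."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def set_locations(self, identifier: str, locations: List[str]) -> None:
        # Locations may change over time; the identifier never does.
        self._locations[identifier] = list(locations)

    def locations_for(self, identifier: str) -> List[str]:
        # An unknown identifier is still a valid name; it simply has no
        # retrieval locations registered.
        return list(self._locations.get(identifier, []))


registry = LocationRegistry()
ident = "http://xyzpurl.example.org?foo"
registry.set_locations(ident, ["http://mirror-a.example.net/foo"])
# The representation moves; only the registry entry is updated.
registry.set_locations(ident, ["http://mirror-b.example.net/2006/foo"])
```

Nothing in the identifier reveals either mirror, so relocating the
representation never requires minting a new URI.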

5. Sec 2.7 Flexible Authority:
This section is too cryptic.  Thus, I'm not sure I fully understand it,
but I'll comment based on what I *think* it means.

By suggesting proxies and redirection, this section seems to address only
the use of URIs as locators rather than as pure identifiers.  "Complex
encodings of dependent and delegated naming authority" do not need
proxies or redirection to be implemented with http URIs (though proxies
and redirection certainly can add even more value).  

For example, a specification governing the "http://xyzpurl.example.org?"
prefix (again using the technique described in
http://dbooth.org/2006/urn2http/ ) can indicate the conventions for
namespace authority delegation that have been encoded in any URI
beginning with that prefix.
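
As one hedged illustration, suppose the governing specification says
that the query part is a "/"-separated chain in which each segment names
an authority delegated by the preceding one, and the last segment is the
local name.  That convention, the function name, and the sample URI are
all assumptions made for this sketch; neither the draft finding nor the
urn2http document specifies them.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

PREFIX_HOST = "xyzpurl.example.org"  # host of the governed prefix


def delegation_chain(uri: str) -> Tuple[List[str], str]:
    """Split a governed URI into (delegated authorities, local name)."""
    parts = urlsplit(uri)
    if parts.scheme != "http" or parts.netloc != PREFIX_HOST or parts.path:
        raise ValueError(f"not governed by the assumed specification: {uri!r}")
    segments = parts.query.split("/")
    if not all(segments):
        raise ValueError(f"empty segment in {uri!r}")
    # Everything before the last segment is the chain of authorities.
    return segments[:-1], segments[-1]
```

The delegation is read directly from the string, with no proxy,
redirect, or network request involved.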

6. Sec 3 The value of http: URIs:
This section seems to be saying that the http scheme's "two-part
approach to identifying resources" is the reason http is better than
myRIs.  This does not seem correct to me, for two reasons:

	a. In the first paragraph, I don't think it's correct to say 
	that http URIs use "a hierarchical syntax for distinguishing 
	resources which share the same owner".  The syntax is just 
	the syntax specified in RFC3986.  Whether URIs within a 
	domain are treated as being hierarchical would depend on 
	the policies of the domain owner, wouldn't it?  I don't
	think there is inherently anything hierarchical about the
	URI syntax, though it often is convenient to treat it as
	hierarchical.

	b. I don't think you have clearly spelled out the factors 
	that make http URIs better.  You mostly
	have them, but they are a bit muddled.  The key factors 
	I see are the following, and only the last is unique to http:

	- http URIs can act as globally unambiguous *identifiers*;

	- http URIs can potentially act as *locators* for 
	retrieving a representation or other metadata related to
	the resource;

	- the http URI syntax is flexible enough that URI owners can
	easily layer additional syntactic and semantic conventions
	onto sub-portions of their URI spaces, including the ability
	to delegate minting authority; and

	- the enormous installed base of http.  

7. Sec 3 The value of http: URIs:
The first paragraph mentions ". . . resources which share the same
owner".
I think this is meant to be talking about URI ownership -- not resource
ownership.  I can make up a URI (in my URI space) that names you.  I own
the URI; I do not own you.

8. Sec 3 never actually says when one should use myRIs.  It merely says
that one should "consider carefully" before creating myRIs.  As
guidance, I think this falls short.  It should explicitly list the
circumstances under which myRIs are justified; as I mentioned
above and explained in http://dbooth.org/2006/urn2http/ , there are
probably none.  As far as I can tell, any such circumstances would have to
stem from the inherent differences that I list at 
http://dbooth.org/2006/urn2http/#differences .

9. Sec 4.1:
The clause "but this must be on a subset of the entire set of namespace
names" is confusing.  I think it should have said: "but namespace owners
are not required to do so and not all do".

10. Sec 4.2:
This section is entitled "Identification" but the discussion is all
about trying to dereference a namespace URI.  Also, the discussion about
whether the URI is "clickable" is a little misleading.  It seems to be
implying that one should never even try to dereference the URI, to avoid
wasting 5-10 seconds.  But this is certainly wrong advice.  One *should*
try dereferencing the URI if one is trying to learn more about it,
because it *may* be dereferenceable and there *may* be authoritative,
useful information available from it.
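
A minimal sketch of that advice, using Python's standard urllib: attempt
the retrieval with a short timeout, and treat failure as "nothing
learned" rather than as an error.  The function name and the 5-second
default are illustrative choices only.

```python
from http.client import HTTPException
from typing import Optional
from urllib.error import URLError
from urllib.request import urlopen


def try_dereference(uri: str, timeout: float = 5.0) -> Optional[bytes]:
    """Return the retrieved bytes, or None if the URI cannot be dereferenced."""
    try:
        with urlopen(uri, timeout=timeout) as response:
            return response.read()
    except (URLError, HTTPException, ValueError, OSError):
        # Not dereferenceable (or not right now): the URI is still a
        # perfectly good identifier.
        return None
```

A caller that gets None back has lost a few seconds at most; a caller
that gets bytes back may have found authoritative information about the
namespace.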

11. Sec 4.3:
This section needs rewriting.  The conclusions are not quite correct and
the arguments are not solid.  The question of whether a URI persists as
an *identifier* is the question of whether it continues to identify the
same *resource* -- not whether it is dereferenceable or whether the URI
owner still exists or continues to mint more URIs.  The resource
identified by a URI can indeed change, though it should not.  It is up
to the URI owner to say what resource is associated with a particular
URI.  Thus, if the organization that owns a URI changes its mind in 10
or 20 years and decides to associate a different resource with the URI,
it is free to do so.

The points of this section *should* be: (1) that software using
namespace URIs must not depend on them being dereferenceable; and (2)
that whether a URI always continues to identify the same resource depends
on the URI owner, and in either case (urn versus http) the owner is OASIS
in this example; thus there is no difference in persistence.
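
Point (1) can be illustrated with Python's standard
xml.etree.ElementTree, which treats a namespace name purely as a string
and never tries to retrieve it.  The namespace URI and element names
below are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://ns.example.org/invoice"  # hypothetical; it need not resolve

DOC = f'<inv:invoice xmlns:inv="{NS}"><inv:total>42</inv:total></inv:invoice>'

root = ET.fromstring(DOC)

# Namespace matching is plain string comparison on the expanded name.
is_invoice = root.tag == f"{{{NS}}}invoice"
total = root.find(f"{{{NS}}}total").text
```

The parse and both lookups succeed whether or not anything is served at
the namespace URI.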

12. Sec 4.4:
This section also needs rewriting.  The arguments are muddled and not
even quite the right arguments.  The point of this section should be
that the OASIS URNs are not dereferenceable at all, whereas an http URI
*might* be dereferenceable to useful, authoritative information.  Thus,
one is no worse off with http URIs, and potentially much better off.

13. Sec 4.5:
This section mentions:
[[
A provider of a identifier must specify how the identifier will be used
in each specific sub-context of their XML language, whether it is
intended as an identifier, a location, or both.
]]
This is true of other names in XML, such as element and attribute names.
It is not true of URIs.  I think this sentence can just be deleted.

14. Sec 4.5:
This section also mentions:
[[
Any use of an identifier, or any datatype for that matter, in an XML
document has the same issues. 
]]
Other identifiers in XML (such as element and attribute names) *do* have
other, additional issues.  However, I think this is a red herring
anyway.  I think this sentence can simply be deleted.

15. Sec 5: Case Study: Persistent Document Location:
I ran out of steam on this section, but on quick reading, this section
seems to go into far more detail than necessary in attempting to make
the essential point, and this obscures the essential point.  For
example, the first paragraph says:
[[
[XRI] observes that changing the organizational structure represented in
the URI, for example to
http://newdept.agency.example.org/docs/govdoc.pdf, or the path
structure, for example to
http://newdept.agency.example.org/documents/govdoc.pdf, breaks access.
]]
But one obvious solution (already pointed out in Cool URIs don't change,
http://www.w3.org/Provider/Style/URI ) is to not put the organizational
structure in the URI.  

David Booth, Ph.D.
HP Software
dbooth@hp.com
Phone: +1 617 629 8881
Received on Wednesday, 9 August 2006 11:43:54 UTC