Re: Critical Comparison of URN Proposals from Daniel LaLiberte on 1995-07-17 (uri@w3.org from July 1995)

From: Daniel LaLiberte <liberte@ncsa.uiuc.edu>
Date: Sun, 16 Jul 95 20:15:39 CDT
To: msm@ansa.co.uk, uri@bunyip.com
Message-Id: <9507170115.AA09711@void.ncsa.uiuc.edu>
> From: Mark Madsen <msm@ansa.co.uk>

>            A Critique of Existing URN Proposals

Thanks for offering this critique.  It is a fair amount of work for
anyone to have studied all the proposals.

> Abstract

> This document criticises existing URN (Uniform Resource Name)
> proposals in the light of generality, extensibility, and general
> futureproofing.  The idea is to draw upon the best characteristics of
> the existing proposals so as to converge on an acceptably functional
> and nonrestrictive draft specification for both URN syntax and
> resolution schemes.

A worthy goal, but I would not be surprised if there were no convergence
and multiple schemes simply came into use side by side.  This would not
necessarily mean failure.


> To these requirements could be added the necessity not to make any URN
> scheme into a straitjacket for future Internet development.
> Fulfillment of this requirement will clearly only be possible if the
> issues relating to URN construction are separated from those relating
> to resolution of URNs into other classes of Uniform Resources.

Can you expand on this?  I believe the separation is good too, but
the issue is maybe not quite as clear to me.

> 3.1 The Path URN Scheme

> The path scheme is based on the idea of an hierarchical name space,
> with naming authorities being responsible for specified subtrees of
> the name space.  The problem with this, as with any other hierarchical
> scheme, is that of management of the levels near the root.  Experience
> with DNS (Domain Name System) shows that this can be a serious
> problem when weaknesses in the distribution of structure are exposed by
> use (as is presently the case with the .com subhierarchy).  

To contrast a hierarchical scheme with a flat scheme, consider if the
hierarchy were only one level deep.  Then you would have a flat scheme.
The same weakness you describe for a hierarchical scheme applies to
a flat scheme.  If a particular part of a flat name space is overused,
it is overused whether or not it has subdivisions in a hierarchy.

> Since
> there are expected always to be many more resources in the Internet
> than machines, the management problems can be expected to be severe.

In the path scheme, there would not be an entry in the DNS tables for
each URN.  Rather, there would be an entry in the DNS tables for
each resolver that handles some subspace of names.  So the management
of the path info in DNS tables would not be much more than the 
management of hostnames.
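
To make the counting argument concrete, here is a minimal sketch.  It
is not the path scheme's actual record format: the table, the path
names, and the resolver hostnames below are all invented for
illustration.  It only shows that many names can share one entry.

```python
# Hypothetical resolver table: one entry per delegated subspace of names,
# not one entry per URN.  Paths and hostnames are invented for illustration.
RESOLVERS = {
    "/edu/example": "resolver1.example.edu",
    "/edu/example/lib": "resolver2.example.edu",
}


def resolver_for(path):
    """Return the resolver for the longest registered prefix of path, or None."""
    parts = path.strip("/").split("/")
    for end in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:end])
        if prefix in RESOLVERS:
            return RESOLVERS[prefix]
    return None


# A thousand names in one subspace are all served by a single table entry.
names = ["/edu/example/lib/report-%d" % i for i in range(1000)]
resolvers_used = {resolver_for(name) for name in names}
```

Adding another thousand names under the same subspace changes nothing
in the table; only adding a new resolver does.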

> Management issues are not addressed in the path scheme proposal.

We need to spell these things out, but it seemed more important to
convince people of the value of the resolution scheme since resolution
happens an order of magnitude more frequently than management activities.

> The reason for the hierarchy built into the path scheme is that it
> builds in the association between (sets of) naming authorities and
> resolution services.  URN resolution is therefore an in-principle
> terminating process, which is a positive point in favour of the
> scheme.  In practice, failures will isolate entire subtrees of the
> hierarchy from resolution.  DNS service failures are typically
> softened by the practice of maintaining caches of all lookups: 

Caches will help, but moreover, redundant DNS name servers will provide
the reliability in the remote resolution process.  How much of a problem
do we currently have with entire subtrees of DNS failing?  Even so,
if a subtree of DNS fails that is being used for path names, it is likely
that the named documents under the subtree would also be unavailable, so
there is no additional loss.

> A stronger criticism of the path scheme is that the hierarchical
> process of resolution ties each resource's URN to a particular
> resolution service in perpetuity.  

False.  As I have indicated several times now, even though a hierarchical
path is included in the path URN, it need not be used in the resolution
process.  In fact, we suggest a handle server could be used as the
root level fallback resolution service.  The handle server would hash the
whole string, ignoring the hierarchy, and map it to a URL, or whatever.
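
As a rough sketch of that fallback idea (an assumption-laden
illustration, not the handle server's real design): the whole name is
hashed as an opaque string and looked up in a flat table.  The function
names, the choice of SHA-256, and the example name and URL are all
hypothetical.

```python
import hashlib

# Hypothetical flat table standing in for a root-level fallback service.
FLAT_TABLE = {}


def flat_key(urn):
    # Hash the entire string; the slashes in the name get no special treatment.
    return hashlib.sha256(urn.encode("utf-8")).hexdigest()


def register(urn, url):
    """Record a mapping from the full name to a location."""
    FLAT_TABLE[flat_key(urn)] = url


def resolve_flat(urn):
    """Look up the full name; returns None if it was never registered."""
    return FLAT_TABLE.get(flat_key(urn))


register("/edu/example/lib/report-1", "http://www.example.edu/reports/1.html")
```

Nothing in the lookup depends on the name being hierarchical, so the
hierarchy in the name places no constraint on this resolver.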

> Furthermore, the worst-case
> scenario for resolution of a URN could involve as many resolution
> services as there are components in the path of the URN under this
> scheme.

You might be surprised to learn that hostname resolution has just as
many resolution steps, one for each component of the hostname.
Currently, for path resolution, we are forced to make explicit calls to
the local DNS resolver for each component of the path, whereas hostnames
are handled by the local resolver itself, and it makes the explicit calls
to remote name servers.  Path resolution turns out to be pretty swift
(Michael Shapiro can give you some numbers) but it could be made faster
if it turns out that lots of paths are being resolved.
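
The step counting can be sketched as follows.  This is a simulation
under stated assumptions, not our implementation: a dictionary stands
in for the name servers, the record contents are invented, and each
dictionary lookup represents one call to the local DNS resolver.

```python
# Simulated name-server data: which path prefixes carry a resolver record.
# The prefixes and resolver names are invented for illustration.
RECORDS = {
    "/a": "resolver.a.example",
    "/a/b/c": "resolver.c.example",
}


def resolve_top_down(path, lookup=RECORDS.get):
    """Walk from the root, one lookup per component.

    Returns the most specific resolver found and the number of lookups made.
    """
    resolver, steps, prefix = None, 0, ""
    for part in path.strip("/").split("/"):
        prefix += "/" + part
        steps += 1
        hit = lookup(prefix)
        if hit is not None:
            resolver = hit  # a deeper record overrides a shallower one
    return resolver, steps


path_result = resolve_top_down("/a/b/c/doc")
hostname_labels = len("void.ncsa.uiuc.edu".split("."))
```

A four-component path costs four lookups in this sketch, the same as
the number of labels in a four-label hostname.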

> Tying URN construction and assignation to resolution in any way is a
> stringent limit on the freedom of future Internet engineering to
> exploit new technologies.  

Fortunately, this does not apply to the path scheme.

> However, it is easy to impose an hierarchical
> resolution structure on a general namespace, so long as that namespace
> is not itself tied to some other restrictive resolution scheme.

It is easy to ignore the hierarchy in a hierarchical namespace, but
it is more difficult to conjure up a hierarchy where there is none
to start with.

Maybe the way to look at the path scheme spec is that it defines
a hierarchical namespace, and furthermore, it suggests a resolution
protocol that clients and servers can use, if they choose.  If you
are interested in not being constrained by a particular resolution
protocol, then just keep in mind that another protocol could be used.
But we (the community) still need to decide on some protocols to actually
do work.  The path scheme resolution protocol is highly scalable, fast,
etc., and that is why we decided to write it up and promulgate it.

> 3.2 The x-dns-2 Scheme

> This is described in internet draft draft-ietf-uri-urn-x-dns-2-00.txt
> "x-dns-2 URN Scheme" by Paul Hoffman and Ron Daniel.

> This scheme is subject to the same criticism as the path scheme about
> the way that the syntax incorporates the resolution service in the URN
> itself.  However, this scheme will coexist reasonably well with other
> schemes, because it encodes the "x-dns-2" scheme name in the URN as
> well.  However, this means that resolvers unfamiliar with this scheme
> convention may have difficulty in resolving such a URN.

Clearly clients must either understand how to invoke the resolution
process, or know a proxy that does.  This applies to all URI schemes.

> Resolution in this scheme is handled by doing a standard DNS lookup to
> find the resolution authority, and requesting direct resolution of the
> URN from them.  Whether DNS can scale to cope with the number of
> resolution requests that could conceivably be generated in such a
> scenario seems doubtful.  

I don't know why this would be any more of a problem than an http
URL resolution.

> This also exposes the fact that the scheme,
> like the path scheme, relies on the naming authority and resolution
> service being closely linked, an assumption which is unlikely to
> remain true in the long term, as commercial and specialised resolution
> services are set up.

Not a problem for either scheme, since the parent node in the DNS
hierarchy can just point to where the child has moved.  Or if you don't
want to use the resolver named by the URN, use the one you want anyway, if you
know where to find it.  BTW, out of all these commercial and specialized
resolution services, how would you know which one to use and do a lookup
in a reasonable time?

> 3.3 The Handle Scheme

> This is described in internet draft draft-ietf-uri-urn-handles-00.txt
> "The Handle System" by William Harms and David Ely.

> The handle scheme shares many characteristics with the aforementioned
> URN schemes.  There is a namespace hierarchy, in which there are extra
> features, such as the ability for naming authorities to create
> subsidiary naming authorities.  

Creation of subsidiary naming authorities should be assumed to exist
in other hierarchical schemes.

Notice that the resolution process associated with handles does not use
the hierarchical info in handles.  If clients *did* make use of this
info, to find the resolver for the naming authority, then this would
essentially be the same as the x-dns-2 scheme, in functionality.
If the resolver that should be used is found dynamically by looking
from the top down until the most specific resolver is found, then
this would essentially be the same as the path scheme.  This paragraph
is a summary of the major functional differences between the three
resolution protocols.

> There is a global handle server, which
> is distributed, and local handle servers, which gives a more flexible
> model of how resolution may proceed.  However, these servers are again
> both naming authorities and resolution services, with all the
> limitations that implies.

Not quite true.  A resolution service may be the primary service used
by several naming authorities.  And again, if you don't like the
resolution service, ignore it.

> The handle syntax itself seems too complicated for naming, and
> specifies that the handles carry along with them sets of typed
> structured information, and corresponding administrative information.

I don't think so.  Maybe you are thinking of the info that a handle
resolves into.

> The proposal also considers such issues as handle administration tools,
> access to services through firewalls, and caching of handles.  While
> important, these issues are orthogonal to the construction of naming
> scheme frameworks, and should be left to other documents.

You wanted management discussed for the path scheme, but not here?
How the resolution protocol works with firewalls, and the scaling issues
relative to caching are very significant.

Daniel LaLiberte (liberte@ncsa.uiuc.edu)
National Center for Supercomputing Applications
http://union.ncsa.uiuc.edu/~liberte/
Received on Sunday, 16 July 1995 21:15:54 UTC