Re: URN Resolution Paths Considered Harmful from Daniel LaLiberte on 1995-06-29 (uri@w3.org from June 1995)

From: Daniel LaLiberte <liberte@ncsa.uiuc.edu>
Date: Thu, 29 Jun 95 01:47:06 CDT
To: sollins@lcs.mit.edu
Cc: FisherM@is3.indy.tce.com, uri@bunyip.com
Message-Id: <9506290647.AA03043@void.ncsa.uiuc.edu>

Thanks for the involved response, Karen.  Just what I need to stay awake
at 1am.

	From: "Karen R. Sollins" <sollins@lcs.mit.edu>

	You're right - I was mapping name of resolution service to resolution
	service.  Since I tend to think in an object-oriented world, a "name"
	(think URN) for a resolution service will always name the same
	service, although it may move around (to my way of thinking, unless
	the names for resolution services are not long-lived, globally unique,
	etc.)  Did you have some other naming scheme (namespace and resolution
	mechanism) in mind?

Yes, the name of a resolution service in a URN ought to behave much like
a URN itself.  The intention for the path scheme is that the name of
a resolution service always maps to *logically* the same resolution
service in terms of its behavior, but it may be a completely different
service, or it may be absorbed by a service higher up in the path name
space never to be seen again.

The distinction is analogous to a function that computes factorial versus
one that looks up the results in a table.  They achieve the same effect
in completely different ways.
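
As a rough illustration of the analogy (the function names here are
made up for the example), the two Python functions below return the
same answers while working in completely different ways:

```python
import math

def factorial_computed(n):
    """Compute n! by repeated multiplication."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# Precomputed table for small n; the range is arbitrary, for illustration.
FACTORIAL_TABLE = {n: math.factorial(n) for n in range(11)}

def factorial_lookup(n):
    """Return n! from the precomputed table (KeyError outside the table)."""
    return FACTORIAL_TABLE[n]
```

A caller that only sees the results cannot tell which one it is talking
to, which is the sense in which a resolver name maps to "logically" the
same service.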

	Anyway, I see a slight variation on the problem still.  Once a set
	of URNs have the name of a particular name resolution service embedded
	in them, that group of URNs will always be tied to the same resolution
	service as each other.  I'm reluctant for us to choose a path where
	that kind of assumption about service in the unknown future is
	restricted.

This is true for the path scheme as described in the current internet draft.  
We have since then extended it with a fallback mechanism such that if
the original resolution service (or its logical equivalent) does not want
to deal at all with the resolution of some set of URNs, the client can
effectively go back to higher-level resolvers. (Details forthcoming.) 
The top-level resolver will therefore accumulate older but still
valuable URNs - something like the handle server would be good at this
level.

	I agree with what I expect is an underlying motivation here to be able
	easily to find a reliable, or just good, but most preferably
	authoritative resolution service to resolve a URN.  

Another important motivation is to allow the resolution of names to
happen close to where the named resource lives, at least initially.
It is better to put both under the same administrative unit if
possible, since that will put the motivation for continuing to resolve
the name in the right place. We also, of course, deal with what happens
when the named resource moves.

	In addition, I hope we don't choose a path that will restrict
	generality in the future.  

Generality is my motivation as well.  That's why I am reluctant to impose
any unnecessary semantics on URN resolution protocols.  But that is another
subject.

	In fact, a generalization of the path scheme has some
	interesting features.  The path scheme takes a long string composed of
	potential substrings.  It divides the string into a prefix substring
	and the remainder and hands the remainder to the resolver named by
	the prefix.  That resolver either decides that it can resolve the
	remainder or it strips a new prefix to use as the name for the next
	resolver to which will be handed the new remainder.  And so on.  

All correct.  The new version is a little different, but this is close
enough for the purposes of your argument.
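
The prefix-stripping walk described above can be sketched roughly as
follows.  This is only an illustration under simplifying assumptions:
each resolver is modeled as a plain table from remainder to location,
the client drives the loop, and the name resolve_path, the table
layout, and the host.example location are all invented for the example.

```python
def resolve_path(urn, resolvers):
    """Walk a path URN, handing each remainder to the resolver named by
    the prefix stripped so far.

    `resolvers` maps a resolver name (a path prefix such as "/A/B") to a
    dict of remainder -> location.  Returns (location, resolvers visited).
    """
    scheme = "path:/"
    if not urn.startswith(scheme):
        raise ValueError("not a path URN: " + urn)
    components = urn[len(scheme):].split("/")
    visited = []
    for i in range(1, len(components)):
        prefix = "/" + "/".join(components[:i])
        remainder = "/".join(components[i:])
        table = resolvers.get(prefix)
        if table is None:
            raise LookupError("no resolver named " + prefix)
        visited.append(prefix)
        if remainder in table:
            # This resolver decided it can resolve the remainder itself.
            return table[remainder], visited
        # Otherwise strip one more component and move to the next resolver.
    raise LookupError("unresolved: " + urn)

# Hypothetical resolver tables; only /A/B/C knows the document.
example_resolvers = {
    "/A": {},
    "/A/B": {},
    "/A/B/C": {"D/doc.html": "http://host.example/docs/doc.html"},
}
location, visited = resolve_path("path:/A/B/C/D/doc.html", example_resolvers)
```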

	This can be generalized further, by allowing each resolver to map
	the remainder either to a location or to a pair consisting of a
	resolver and string that aren't necessarily simply substrings of the
	string it was handed.  

So for example assume the resolver for the B component of
path:/A/B/C/D/doc.html is given the remainder "C/D/doc.html".  B chooses
the next resolver in whatever way it desires and also computes some
string for that resolver.  In general, this turns out to be just
a sequence of arbitrary redirects which must be followed to discover
what each subsequent resolver will do.  It is powerful, but time consuming.
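
A minimal sketch of such a redirect chain, assuming each resolver
answers with either a location or a (next resolver, new string) pair.
The reply tuples, the resolver names "B" and "X", the string "item-42",
and the hop limit are all assumptions made for the example:

```python
def follow_redirects(start, string, resolvers, max_hops=16):
    """Follow arbitrary redirects until some resolver returns a location.

    Each resolver is a callable taking a string and returning either
    ("location", url) or ("redirect", next_resolver_name, new_string).
    Returns (url, number of resolvers contacted).
    """
    name, hops = start, 0
    while hops < max_hops:
        reply = resolvers[name](string)
        hops += 1
        if reply[0] == "location":
            return reply[1], hops
        # The next resolver and string need not be derived from the input.
        _, name, string = reply
    raise RuntimeError("too many redirects")

def resolver_b(remainder):
    # B picks the next resolver however it likes and computes a new string.
    return ("redirect", "X", "item-42")

def resolver_x(key):
    return ("location", "http://host.example/" + key)

url, hops = follow_redirects("B", "C/D/doc.html",
                             {"B": resolver_b, "X": resolver_x})
```

Every hop has to be made before the client learns anything about the
next one, which is where the time goes.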

One of the strong motivations of the path scheme is to allow locality of
reference for scalability, so that if a client (or caches near the
client) already knows where the resolver for, say, /A/B/C is, it can go
there directly.  A sequence of redirects does not allow direct access.
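
For illustration, a client-side cache could pick the longest prefix it
already knows and skip the higher levels entirely.  The function name,
the cache layout, and the resolver addresses below are hypothetical:

```python
def closest_known_resolver(path, known):
    """Return (prefix, address, remainder) for the longest cached prefix
    of `path`, or None if no prefix is cached.

    `known` maps resolver names such as "/A/B/C" to resolver addresses.
    """
    components = path.strip("/").split("/")
    # Try the most specific prefix first, then progressively shorter ones.
    for i in range(len(components) - 1, 0, -1):
        prefix = "/" + "/".join(components[:i])
        if prefix in known:
            return prefix, known[prefix], "/".join(components[i:])
    return None

cache = {"/A": "resolver-a.example", "/A/B/C": "resolver-c.example"}
hit = closest_known_resolver("/A/B/C/D/doc.html", cache)
```

With the cache shown, the client goes straight to the /A/B/C resolver
with the remainder "D/doc.html", without contacting /A or /A/B.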

	This helps address the problem that I was suggesting above.  If two
	URNs have the same resolver name embedded in them, when that resolver
	goes out of business and the URNs that would have gone to it now go to
	a variety of other services, as long as something knows about that
	dispersement, that something can be put in place of the original
	resolver service, causing the subsequent strings to be rewritten to
	reflect the new state of affairs.

That is also the essence of the fallback mechanism I mentioned above.
The something that knows about the dispersement would be a higher level
resolver, or the root if none other.
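
Since the details of the fallback are not written up yet, the following
is only a guess at its general shape, not the actual mechanism: try the
most specific resolver, and whenever one is gone or declines, hand a
longer remainder to the next level up, ending at the root.  All names
and the archive.example location are invented for the example.

```python
def resolve_with_fallback(path, resolvers):
    """Try the most specific resolver first, then each higher level,
    and finally the root "/".

    `resolvers` maps a prefix to a callable that takes the remainder and
    returns a location, or None if it declines to resolve it.
    Returns (prefix of the resolver that answered, location).
    """
    components = path.strip("/").split("/")
    for i in range(len(components) - 1, -1, -1):
        prefix = "/" + "/".join(components[:i])
        remainder = "/".join(components[i:])
        lookup = resolvers.get(prefix)
        if lookup is None:
            continue  # this resolver no longer exists
        location = lookup(remainder)
        if location is not None:
            return prefix, location
    raise LookupError("unresolved: " + path)

# Hypothetical state: /A/B/C has gone out of business, /A declines,
# and the root has accumulated the old name.
root_table = {"A/B/C/D/doc.html": "http://archive.example/doc.html"}
fallback_resolvers = {"/": root_table.get, "/A": lambda remainder: None}
answered_by, location = resolve_with_fallback("/A/B/C/D/doc.html",
                                              fallback_resolvers)
```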

	This is a scheme that at least works, but it leads to permanent
	inefficiencies because URNs cannot change; they are immutable, so all
	the URNs with that particular resolution service name will always be
	an indirection, once the original has gone out of business and its
	business dispersed.  

Correct, as I also argued above.  It's thoughts like these that make me
think that we need to encourage indefinite retirement of old names to
be eventually replaced by new, more direct names.  The old names would
continue to work but resolution of them would be slower than for the
new names.

	The bottom line problem here is that the dispersion may not be
	algorithmic in the URN, but rather on some other basis that isn't
	known at the time of resolution (and may both be different for
	different sets of resources and may change with time).  There may be
	a different way of handling each and every URN that was originally
	handled by a single resolution service.

This problem would seem to be true no matter what the URN scheme.
Can you think of a scheme that avoids this problem?  Does the solution
involve having *no* resolution service name in a URN?  What's left is
only an opaque string - who will resolve it?  If there are N possible
resolution services out there, do we try each of them?  How large does
N have to be to handle the load?  Is this essentially the handle service?

	I think you were also suggesting that although the resolver name may
	be embedded in the URN it need not be used.  

Yes, this is a fallback mechanism completely external to the particular
URN scheme in which the client chooses some promising resolution service(s).
Consider again that a path URN might be resolved by a handle service which
just hashes the whole string and looks up whatever info is associated with
it.
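
A toy version of that idea, to show that the resolver name inside the
URN plays no role at all.  This is not the handle service's actual
protocol or data model; the class name, the choice of SHA-256 as the
hash, and the record contents are assumptions for the sketch:

```python
import hashlib

class HashedNameTable:
    """Treat every URN as an opaque string: hash it, look up the record."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def _key(urn):
        # The whole string is hashed; its internal structure is ignored.
        return hashlib.sha256(urn.encode("utf-8")).hexdigest()

    def register(self, urn, info):
        self._records[self._key(urn)] = info

    def lookup(self, urn):
        """Return the stored info, or None if the name is unknown."""
        return self._records.get(self._key(urn))

table = HashedNameTable()
table.register("path:/A/B/C/D/doc.html",
               {"location": "http://host.example/docs/doc.html"})
info = table.lookup("path:/A/B/C/D/doc.html")
```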

Karen, I thought in our last discussion that you were arguing that
it is essential to support such external fallback mechanisms, and
therefore we need to know in advance that a URI is a name that lives
forever, so we know that it is legitimate to attempt to resolve it
in whatever way we may choose.

	The problem here is that
	if it is there, I suspect that users (applications, clients, whatever)
	will come to depend and expect from early on that the resolution
	service name must be correct.  This is more a matter of human nature
	forcing us in a direction that has significant drawbacks but apparent
	short-term payoffs.   I think that we as the designers of the scheme
	need to be careful to be as visionary and long-sighted as possible.

Absolutely.   But I disagree that there is a problem simply because URNs
might have historical resolver names embedded in them.  People, or software,
will learn to ignore irrelevant details.

	We won't catch all the pitfalls, but we should try our best to avoid
	those we know about.  (That's part of what makes the process
	challenging and often drawn out as more and more issues come to mind.)

Yes, each new twist is like the next bend in the river.

	Anyway, to make a long story shorter, I agree with the desire to make
	this stuff efficient, but I believe that we shouldn't pay too heavy a
	price for that and should understand what the price is.

Agreed.


Daniel LaLiberte (liberte@ncsa.uiuc.edu)
National Center for Supercomputing Applications
http://union.ncsa.uiuc.edu/~liberte/
Received on Thursday, 29 June 1995 02:51:41 UTC