Re: report: URN Architecture Meeting at University of Tennessee, Oct 30-31 from Keith Moore on 1995-11-29 (uri@w3.org from November 1995)

From: Keith Moore <moore@cs.utk.edu>
Date: Wed, 29 Nov 1995 04:35:51 -0500
To: "Roy T. Fielding" <fielding@avron.ics.uci.edu>
Cc: Keith Moore <moore@cs.utk.edu>, urn@mordred.gatech.edu, uri@bunyip.com
Message-Id: <199511290935.EAA16530@wilma.cs.utk.edu>
> > I don't know how I can make this any clearer:
> > 
> > 1. I (and others who were at Knoxville) have said, repeatedly, that
> > the client chooses how to resolve a URN.  (One of our oft-stated
> > design principles was "a client can and will do whatever it wants to".)
> 
> The client can only do this if it has some way to differentiate between
> URN schemes.  

Okay, I'll try a different tack.  I'll see if I can state this
one as a design tradeoff.

If we give the client a way to differentiate between URN "schemes",

on one hand,

  + we can potentially make resolution more efficient, because 
    the client can customize its search path on a per-scheme basis.
    (If the client doesn't know the "scheme", it's not a matter of 
    not being able to resolve the URN, it's a matter of having to look
    in potentially more places before it finds it.)
  + it makes it easy for people to understand how other naming schemes
    are incorporated into URN-space 
  + we make some unhappy people happier, which may bring us closer to
    agreement.  (I'm not being facetious here; it's often better to 
    have a standard with some "imperfections" than no standard at all.)
  + to the extent that it's useful for a client to know the
    syntax and semantics of a URN (for reasons other than resolution),
    having the "scheme" name be visible makes that possible.

    (however, I'm not sure whether what you call "semantics" of a URN
    was included in the Knoxville URN at all; we had agreed that 
    things like service requests were not part of the URN but 
    that there was a need to be able to specify those things in
    a standard manner along with a URN)

on the other hand, 

  - it increases the probability of client configuration error
  - if schemes tend to imply particular resolution protocols,
    they decrease persistence of URNs 
  - schemes increase the probability that the client cannot resolve
    a URN because it doesn't know about the "scheme", which in turn
    reduces interoperability
  - they make it less likely that URNs have global scope in practice
    (since the interpretation of a URN is up to the client, and it
    tempts clients to make special interpretation based on the "scheme")
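
To make the first "+" item concrete, here is a minimal sketch in Python of
a client whose search path can be reordered by a per-"scheme" hint.  It is
an illustration only, not anything specified at Knoxville: the resolver
tables, the "urn:example:foo" string, and the function names are all
hypothetical.  The point it tries to show is that the hint changes the
order of the search, not whether the URN can be resolved.

```python
from typing import Callable, Optional

# A resolver takes a URN string and returns a location, or None if it
# does not know the name.  Every name in this sketch is hypothetical.
Resolver = Callable[[str], Optional[str]]


def make_table_resolver(table: dict[str, str]) -> Resolver:
    """Build a resolver backed by an in-memory table (a stand-in for a registry)."""
    return lambda urn: table.get(urn)


def resolve(
    urn: str,
    search_path: list[Resolver],
    hints: Optional[dict[str, list[Resolver]]] = None,
) -> tuple[Optional[str], int]:
    """Try any hinted resolvers first, then the rest of the default path.

    Returns (location, number_of_lookups).  A client with no hint for
    the "scheme" still resolves the name; it just looks in more places.
    """
    # Assumed layout for illustration: prefix, then "scheme", then the rest.
    scheme = urn.split(":")[1] if urn.count(":") >= 2 else ""
    preferred = (hints or {}).get(scheme, [])
    ordered = preferred + [r for r in search_path if r not in preferred]
    for lookups, resolver in enumerate(ordered, start=1):
        location = resolver(urn)
        if location is not None:
            return location, lookups
    return None, len(ordered)


local = make_table_resolver({})
global_a = make_table_resolver({})
global_b = make_table_resolver({"urn:example:foo": "http://host.example/foo"})
path = [local, global_a, global_b]

print(resolve("urn:example:foo", path))                            # found on lookup 3
print(resolve("urn:example:foo", path, {"example": [global_b]}))   # found on lookup 1
```
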

I don't know of any way to objectively decide whether the benefits
of having a "scheme" outweigh the disadvantages.  My experience with
multiprotocol email leads me to believe that having one "scheme" 
that is flexible enough to subsume all others, and putting the
details of how to resolve names in a network-accessible database,
is far preferable to expecting each client to know the details
of each "scheme".

So it's really a case of how much rope to give the clients, and
how much information to expect them to know in order to do
their jobs.  I don't mind giving them the rope as long as they
don't really need to use it all that often.

I could personally live with having a "name space identifier" (NSI)
in the URN as long as (a) it's not strictly tied to a protocol or 
registry, (b) the resolution of the URN doesn't depend on the client 
knowing details of the NSI portion of the URN, (c) a registry can 
delegate resolution of URNs on at least a per-(NSI+NA) basis (and 
ideally, to smaller sub-ranges of that space).
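
Condition (c) can be sketched as a most-specific-match lookup.  This is
a rough Python illustration under stated assumptions: the delegation
table, the NSI and NA values, and the resolver names are made up, and the
"sub-range" is modeled simply as a string prefix of the rest of the name.
Note that the client treats the NSI as an opaque key, in keeping with (b).

```python
from typing import Optional

# Hypothetical delegation table.  Keys are (NSI, NA, sub-range prefix);
# an empty prefix delegates the whole (NSI, NA) space.
DELEGATIONS: dict[tuple[str, str, str], str] = {
    ("nsi-x", "na.example", ""): "resolver-1.example",
    ("nsi-x", "na.example", "reports/"): "resolver-2.example",
}


def find_resolver(nsi: str, na: str, rest: str) -> Optional[str]:
    """Return the resolver named by the most specific matching delegation."""
    best: Optional[str] = None
    best_len = -1
    for (d_nsi, d_na, prefix), resolver in DELEGATIONS.items():
        matches = (d_nsi, d_na) == (nsi, na) and rest.startswith(prefix)
        if matches and len(prefix) > best_len:
            best, best_len = resolver, len(prefix)
    return best


print(find_resolver("nsi-x", "na.example", "reports/1995-11"))  # sub-range delegation
print(find_resolver("nsi-x", "na.example", "memo-7"))           # per-(NSI+NA) delegation
```
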

But somehow I get the impression this isn't what you're getting at.

> You are assuming that there will be only one URN scheme.

No, I'm assuming all URNs will have a prefix that gives the 
client the ability to recognize it as a URN, and the minimum 
information necessary to use it.  (by "use it", I essentially 
mean "find a resolver for it").

> Any resource may be identified by multiple names and/or locations.
> Any resource which is "the current version of X" is also "that specific
> version of X" -- both of these concepts can and should be assigned names
> if it is determined (by anyone) that such a name is useful.  Thus, any
> system that purports to define URNs must also allow multiple names per
> resource.

Yes.  

(But I've never thought of a URN as being tightly bound to a "resource" 
... it's bound to a "definition".  So a URN for today's weather map and 
a URN for the weather map on 11/29/95 would be different because they 
*mean different things*, and it doesn't matter that under certain 
circumstances they could refer to the same resource.  But this is 
independent of whether URNs have "schemes".)
 
> Requiring that all URNs have the same properties (i.e., case insensitive,
> references an entity fixed-for-all-time, etc.) would make it impossible
> to represent resource names as URNs.  

Depends on what you mean by "resource names".  I have always assumed
that URNs must be able to subsume other naming systems that have the
same basic properties -- global uniqueness, persistence, transcribability,
etc., but not that URNs must be able to subsume any kind of resource name 
(such as a URL or a file name).  Now if the other naming systems that
we need to subsume into URN space are really so diverse that we cannot
define a common "umbrella" syntax and registry and clients have to 
be aware of the differences in their syntax in order to "use" them...
well, I'm tempted to suggest that we try to solve a narrower problem.
But I'm not yet convinced that we need to support this kind of
diversity...perhaps you could supply some firm examples?

> Requiring that all URNs within a
> given URN scheme have certain minimum properties is useful, but not
> sufficient to contain all of the semantics any particular user would
> assign to any particular resource.  Allowing a resource to be identified
> by multiple URN schemes, with each such URN scheme defining its own set
> of relevant semantics, is the only way to sufficiently *identify resources*
> using a simple identity string.

I think you're mapping the problem differently than we did in the
Knoxville meeting.  Obviously you'd have to add some information to
the Knoxville URN to state what you want to "do" with it.  We
recognized the need for this in the Knoxville discussions, but 
we didn't try to specify it...we said it wasn't part of the URN 
but in some cases would have to be supplied with the URN.  

So the "identity string" for the URC of resource FOO might consist of
the Knoxville-style URN for FOO along with a request for a URC,
while the "identity string" for the resource itself might consist
of the Knoxville-style URN for FOO along with a request for the resource.
Sometimes a reference to FOO will want to indicate the URC, other
times it will want to indicate the resource itself, and other times
it just wants to indicate FOO without being more specific.
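
One way to picture this separation is a reference that carries the URN
and, optionally, a service request alongside it.  The Python below is
only a sketch: the class names, the two request kinds, and the
"urn:example:FOO" string are assumptions made for illustration, since
the Knoxville discussions did not specify how a service request would
be written.  Name comparison is shown as plain string equality, which
is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Service(Enum):
    """Hypothetical service requests that may accompany a URN."""
    DESCRIPTION = "urc"        # ask for the URC (description) of FOO
    RESOURCE = "resource"      # ask for the resource itself


@dataclass(frozen=True)
class Reference:
    """A URN plus an optional service request; the request is not part of the URN."""
    urn: str
    service: Optional[Service] = None

    def same_name(self, other: "Reference") -> bool:
        # Two references name the same thing if their URNs match,
        # whatever each one asks to do with it.
        return self.urn == other.urn


foo = Reference("urn:example:FOO")                           # just FOO, nothing more specific
foo_urc = Reference("urn:example:FOO", Service.DESCRIPTION)  # the URC of FOO
foo_res = Reference("urn:example:FOO", Service.RESOURCE)     # the resource itself

print(foo_urc.same_name(foo_res))  # the URN is unchanged by the request
print(foo_urc == foo_res)          # but the full references differ
```
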

Likewise, there might be a service request for "most recent version"
or "version 1.3" associated with FOO...though trying to put versions
in a service request is probably a rathole.  (In BFD/RCDS the
revision history of the resource can be listed in its 
description (think "URC").  Clients can peruse this description
and select whatever version of a resource they want using the LIFN
of the resource, but there's no explicit request to "get the latest 
version of FOO".)


> > 2. I have also said, repeatedly, that the URN syntax that we defined
> > is NOT tied to DNS, that other registries besides the DNS registry
> > are expected.  It is essential that the syntax does not imply DNS -- 
> > if for no other reason than to allow transitions to other registries 
> > in the long term.
> 
> If the only way a client can determine the type and semantics of an
> identifier is to perform a DNS query on some part of that identifier,
> then the identifier is tied to DNS.

(a) we don't expect that to be true even in the short term.  we feel
    sure that there will be "local" registries in many environments
    (which might provide access to resources which cannot be allowed
    to leave that environment, say for security reasons; and might
    also provide access to a local cache).  some of us also envision 
    that there will be the net.equivalent of "rare/old book services"
    that you consult after you've looked in the default location,
    perhaps for a higher price and/or longer delay.

(b) unless DNS lasts a LOT longer than I think it will, it certainly
    won't be true in the long term.  The client benefits if there is
    only one registry, but for transition purposes we need to make
    sure we can move from one registry to another, so we must assume
    from day one that there will be multiple registries.

> > 3. URN: in the Knoxville proposal is NOT a "scheme".  URN: is a prefix 
> > that allows clients to identify URNs in text and to distinguish URNs 
> > from other kinds of URIs.  The Knoxville proposal doesn't have "schemes",
> > because -- to the extent a "scheme" dictates a resolution protocol --
> > the inclusion of a "scheme" impairs the longevity of the URN.
> 
> Then you don't understand what a scheme is.  

    ``When I use a word,'' Humpty Dumpty said in a rather scornful tone,
  ``it means just what I choose it to mean--neither more nor less.''
    ``The question is,'' said Alice, ``whether you _can_ make words mean
   so many different things.''
    ``The question is,'' said Humpty Dumpty, ``which is to be master--
   that's all.''

It's not as if everyone uses the word "scheme" in the same way.
(sorry, couldn't resist...it's one of my favorite quotes.)

> A scheme defines the syntax and
> semantics associated with the remainder of the identifier.  It does not
> define the resolution protocol; some identifiers have a scheme name which
> matches a protocol name because that is the most meaningful name to
> associate with a locator for which the ultimate resolution process defaults
> to using that protocol.  In other words, the Knoxville proposal is using
> the scheme "URN".

The Knoxville proposal doesn't define the syntax of the name past the NA.

The Knoxville proposal doesn't define the semantics of the name at all;
we narrowed our scope for the purpose of that two day discussion to
specification of the name and how to find resolution servers for that
name, and used the term URN to refer to the part of the "resource 
identifier" that we chose to work on.

Once we added additional components to form the "resource identifier",
I suppose we would be defining both syntax and semantics.  But since
we didn't do that, perhaps the Knoxville URN is not a "scheme" after 
all? :)

> World-Wide Web user agents use the identifier scheme to determine the
> resolution mechanism (NOT protocol -- mechanism is that *thing* which is
> responsible, within or outside the client, for resolving identifiers of
> that particular identifier type -- it may use any protocol defined by
> the user or vendor for resolving that scheme, including a protocol defined
> on-the-fly through retrieval of a script).  

While I agree with you in principle, this is not the case in general.
It's certainly possible to add a layer of indirection between a URL
and its servers.  But since the web wasn't designed with a standard
layer there from day one, it's somewhat difficult to add one now and
see it universally deployed.  (doesn't mean it's not a good idea --
it's just difficult)

> Uniform Resource Names is a category of identifiers, referring to those
> that identify a resource independent of its network location.  It is wrong
> to use "URN" as a scheme name for the same reason it is wrong to use
> "URL" as a scheme name.
> 
>    I CANNOT USE ANY IDENTIFIER THAT BEGINS WITH "URN:"

Sure you can.  You can use URN: as easily as HTTP:.

I don't really care what these things are called.  I do care about
not defining lots of new URI prefixes such that the client has to
know about each one of them individually, or so that URNs get confused
with URLs.  So in response to your all-caps statement, I might say:

	I CANNOT USE MORE THAN ONE NEW URI PREFIX

although that, of course, is also false.  I do, however, think it's
highly undesirable to keep extending things in this way.

Using "URN" (even in our discussions) is dangerous because it means 
different things to different people, but without using it we 
couldn't communicate at all.  Given that we've worked so hard to
agree on what the word URN means, why should we give it a new name?


> Which means, obviously, that I will forbid the use of such an identifier
> in any system which I design or am responsible for standardization.
> That is what I've said consistently for over 1.5 years now, that is what
> I will recommend to the W3 Consortium members, and that is the objection
> I will continue to raise every time this is discussed within the IETF.
> 
> Is that clear?

In the IETF at least, you have no authority to forbid any such thing.
We make decisions by rough consensus, but the consensus of the group can 
override any individual.

I personally would think it silly for us to develop this new kind
of identifier that we have been calling a URN all along, and use
any prefix for that identifier other than URN:.  But if "silly" doesn't
cause any implementation or operational problems, you might be able 
to get the group to go along with you and use some other prefix.

On the other hand, if we end up defining lots of new URI prefixes, 
we will have been wasting our time for the past 4 years or so, because
we will have effectively gained nothing over normal URLs.  That's 
not silly, that's tragic.

> >> All you have done is define a single scheme-uber-alles called "URN".
> >> That is not desirable, reduces flexibility and robustness, and standardizes
> >> mechanisms that have no implementation experience on a global scale.
> > 
> > URN: is not a scheme.  And you have failed to justify your other attacks.
> > In particular, you have not explained why any of the following is 
> > undesirable, inflexible, or non-robust:
> > 
> >   + a common prefix and NA space for all URNs
> 
> See above.  And this is at least the fourth time I have provided sufficient
> reason and argument for why a common prefix and NA space for all URNs
> is both unnecessary and undesirable [see the mailing list archive]. 
> Not once has ANYONE come up with any proof that a single URN scheme is
> necessary and sufficient to encompass all resource names.  As far as I'm
> concerned, this discussion is closed until such time as that proof is given.

As far as I'm concerned, you haven't produced a reasonable counterexample.

> >   + resolution services for URNs are advertised in one or more
> >     global registries. clients need not be configured to resolve
> >     URNs on a per-scheme basis; they can simply consult one or more
> >     of the registries to see which services/protocols are available. 
> >     (clients can special-case lookups for part of the name space
> >     if they want to; but the ability to resolve a URN doesn't depend
> >     on them doing so.)  
> 
> And how does the client get "configured to consult one of the registries"?

By default, it's shipped to point to whatever registry is in vogue when
the client was built; the site or user can customize it based on local
needs and practice but IT WORKS OUT OF THE BOX for most users in
most environments.
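
As a rough sketch of what that configuration could look like (Python, all
registry names hypothetical): the client ships with a default registry
list, and site and user settings are layered in front of it.  Keeping the
shipped default as the last fallback is an assumption of this sketch, not
something the proposal requires.

```python
from typing import Optional

# Hypothetical default: whatever registry was current when the client was built.
SHIPPED_DEFAULT = ["registry.default.example"]


def effective_registries(
    site: Optional[list[str]] = None,
    user: Optional[list[str]] = None,
) -> list[str]:
    """Order registries as user entries, then site entries, then the shipped default.

    Duplicates are dropped, keeping the first (highest-priority) occurrence.
    """
    ordered: list[str] = []
    for name in (user or []) + (site or []) + SHIPPED_DEFAULT:
        if name not in ordered:
            ordered.append(name)
    return ordered


# Out of the box: no configuration needed.
print(effective_registries())

# Customized: a personal database and a site cache are consulted first.
print(effective_registries(site=["cache.site.example"], user=["personal-db.local"]))
```
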

> The WWW mechanism for doing this depends on the existence of scheme names.

Yes, but that mechanism makes it very difficult to deploy new "schemes".
(and a URL "scheme" isn't the same thing as a URN "scheme", at least,
not to everybody)

> Relative URL parsing depends on the existence of scheme names.  

We're talking about URNs, not URLs.  And if you have URNs you don't need
relative URLs.  There are enough potential problems with extending Relative 
URLs to the URN world that I am very dubious of requiring "relative URNs" in 
URN space.

> Hell, ALL
> EXISTING IMPLEMENTATIONS OF URIs DEPEND ON THE EXISTENCE OF SCHEME NAMES.

This isn't a justification for anything in particular.  The reason we're
doing this little four-plus year exercise is that "existing implementations
of URIs" aren't sufficient for our needs.
 
> > Nothing about our proposal requires the client to use the URN registries.
> > But if we were to design a scheme such that a client NEEDS "to define, 
> > without reference to any network, how identifiers are to be resolved", 
> > THAT would be undesirable.
> > 
> > As for being able to "resolve" a URN without being connected to the network...
> > I don't know what this means.  Either the client has access to the external 
> > services it needs to make use of a URN, or it doesn't.  If the client doesn't
> > have access to those services, the URN isn't very useful to the client
> > except for comparison with other URNs.
> 
> You obviously haven't read the references I posted earlier.
> Here they are again:
> 
>   http://www.acl.lanl.gov/URI/archive/uri-94q4.messages/0093.html
>   http://www.acl.lanl.gov/URI/archive/uri-94q4.messages/0101.html
> and
>   http://www.ics.uci.edu/pub/ietf/uri/draft-ietf-uri-roy-urn-urc-00.txt

I've read all of these, multiple times.  Please consider that other
people have different ways of mapping out the solution space -- just
because they have picked different architectures doesn't mean that
they aren't trying to solve the same problems you are trying to solve,
in different ways. 

(C'mon.  Does the "scheme" really have to be the part of the URI before 
the first colon?  Do URNs really have to share a common syntax
with URLs... down to including the path structure?  If you really want
URNs to be persistent, you don't put any semantically loaded
information in them at all...certainly not information that reflects
the internal structure of multi-file documents.)

> If you don't support the identification of resources that may already
> be on your local disk, identified within a personal database of resources
> located in a real-world bookshelf, or located within the user's local
> University library, then you have failed to solve the URN problem.
> You don't have to define these resolution mechanisms -- you just have
> to make them possible with minimum difficulty.

Actually, we do support the identification of such resources, but not
with names that indicate where the resources are stored.  After all,
a resource originally in my personal database could eventually become
available to the entire world...should the resource name then change and
then invalidate all of the references to it?

But I could certainly configure my client to search my personal resource
database, my mail folders, etc.  before searching the DNS registry.

Keith
Received on Wednesday, 29 November 1995 04:36:35 UTC