Re: report: URN Architecture Meeting at University of Tennessee, Oct 30-31

Roy T. Fielding (fielding@avron.ICS.UCI.EDU)
Fri, 01 Dec 1995 01:13:53 -0800


To: Keith Moore <moore@cs.utk.edu>
Cc: urn@mordred.gatech.edu, uri@bunyip.com
In-Reply-To: Your message of "Wed, 29 Nov 1995 04:35:51 EST."
             <199511290935.EAA16530@wilma.cs.utk.edu> 
Message-Id:  <9512010113.aa22154@paris.ics.uci.edu>

and now that I have slightly more time (another all-day meeting)...

> Okay, I'll try a different tack.  I'll see if I can state this
> one as a design tradeoff.

Good idea. I'll just rephrase the tradeoff from my perspective, and I
hope you'll see why I object so strongly to the syntax.

> If we give the client a way to differentiate between URN "schemes",
> 
> on one hand,
> 
>   + we can potentially make resolution more efficient, because 
>     the client can customize its search path on a per-scheme basis.
>     (If the client doesn't know the "scheme", it's not a matter of 
>     not being able to resolve the URN, it's a matter of having to look
>     in potentially more places before it finds it.)

Right.  In other words, if we don't provide a scheme, we prevent any client
from making its resolution more efficient on a per-scheme basis, regardless
of local conditions that only that client's owner can know.
 
>   + it makes it easy for people to understand how other naming schemes
>     are incorporated into URN-space 

Right.  It makes it possible for vendors to design that incorporation
into their implementations, rather than get surprised by it as systems
evolve.

>   + we make some unhappy people happier, which may bring us closer to
>     agreement.  (I'm not being facetious here; it's often better to 
>     have a standard with some "imperfections" than no standard at all.)

Amen.

>   + to the extent that it's useful for a client to know the
>     syntax and semantics of a URN (for reasons other than resolution),
>     having the "scheme" name be visible makes that possible.

Right.  In other words, if we don't provide a scheme, we prevent any client
from taking advantage of the syntax and semantics of a URN even when those
semantics are meaningful for that scheme.  For example, if the canonical
form of a URN is dependent on the scheme, then URNs can be defined for
legacy naming systems that allow case-sensitive names.
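To make that concrete, here is a minimal Python sketch of scheme-dependent canonicalization. The scheme names "caseless" and "legacy" and the rule table are hypothetical, invented only for illustration; the idea is that whether two names are equal is decided by a rule the scheme selects, so a case-insensitive scheme and a case-sensitive legacy system can coexist.

```python
from typing import Callable, Dict


def _caseless(rest: str) -> str:
    # A scheme whose names are case-insensitive folds them to one case.
    return rest.lower()


def _case_sensitive(rest: str) -> str:
    # A legacy naming system that distinguishes case keeps it intact.
    return rest


# Hypothetical table: each scheme chooses its own canonical form.
CANONICALIZERS: Dict[str, Callable[[str], str]] = {
    "caseless": _caseless,
    "legacy": _case_sensitive,
}


def canonical(uri: str) -> str:
    """Return the canonical form of a name, as defined by its scheme."""
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError(f"no scheme in {uri!r}")
    rule = CANONICALIZERS.get(scheme)
    if rule is None:
        raise LookupError(f"unknown scheme {scheme!r}")
    return f"{scheme}:{rule(rest)}"


def same_name(a: str, b: str) -> bool:
    """Two names are equal if their scheme-defined canonical forms match."""
    return canonical(a) == canonical(b)
```

With this table, caseless:ABC and caseless:abc compare equal, while legacy:ABC and legacy:abc do not.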

And I'll add:

    + it allows multiple naming mechanisms to be designed, developed,
      and enhanced independently of the standardization process

    + it does not lock us into a single syntax for the NA+OID for all time

    + *if* relative URNs are found to be useful, we don't have to throw
      away all the existing URN work to use them

    + it lowers the entry barrier for the introduction of URNs into
      existing technology

> ...
> on the other hand, 
> 
>   - it increases the probability of client configuration error

Yep, but it doesn't assume that we know the client user's needs better
than the client user does.

>   - if schemes tend to imply particular resolution protocols,
>     they decrease persistence of URNs 

Yep, that would be bad, but it is also easy to avoid.  I would want any
URN scheme to reflect a non-protocol name, like "oid:".

>   - schemes increase the probability that the client cannot resolve
>     a URN because it doesn't know about the "scheme", which in turn
>     reduces interoperability

Increases the probability of reduced interoperability, yes.  But that
is true of any URI scheme until my URI Resolution Table idea becomes
a standard (if ever).

>   - they make it less likely that URNs have global scope in practice
>     (since the interpretation of a URN is up to the client, and it
>     tempts clients to make special interpretation based on the "scheme")

Yes, but it also allows a URN to have local scope when only local scope
is desired (or even possible, as is the case with an emerging technology).
The Web could be created one site at a time because any site can be created
independent of any other.  Sure, that means some names will fail the test
of persistence, so people who want guaranteed name persistence will have
to get their names from a guaranteed naming organization.

> ...
>> You are assuming that there will be only one URN scheme.
> 
> No, I'm assuming all URNs will have a prefix that gives the 
> client the ability to recognize it as a URN, and the minimum 
> information necessary to use it.  (by "use it", I essentially 
> mean "find a resolver for it").

Same thing. Honestly, that is how the Web is implemented; just look at
any client library: my libwww-perl, CERN (now W3C) libwww, Guido's
modules for python, etc.

You can implement URNs natively in libwww-perl simply by creating
a Perl module called www<scheme>.pl that includes a request() procedure.
The library will load it dynamically when it encounters a URI with
that <scheme>.  The library doesn't care whether the scheme is associated
with a protocol or not -- it uses the scheme to select a resolution
mechanism and thus any new scheme can be added without affecting any
other part of the library.
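A rough Python analogue of that dispatch might look like the following. It is a sketch, not libwww-perl's code: the registry, the `register` decorator, the `www<scheme>` module-naming convention carried over from the Perl library, and the toy "oid" resolver are all assumptions for illustration. Adding a scheme means registering (or dropping in) one resolver; `request()` itself never changes.

```python
import importlib
from typing import Callable, Dict

# Hypothetical registry mapping a scheme name to its resolver.
Resolver = Callable[[str], str]
_RESOLVERS: Dict[str, Resolver] = {}


def register(scheme: str) -> Callable[[Resolver], Resolver]:
    """Decorator that associates a resolver with a scheme name."""
    def wrap(fn: Resolver) -> Resolver:
        _RESOLVERS[scheme] = fn
        return fn
    return wrap


def _resolver_for(scheme: str) -> Resolver:
    """Return the resolver for a scheme, loading a module on first use."""
    if scheme not in _RESOLVERS:
        # Mirror the www<scheme>.pl convention: look for a module named
        # www<scheme> that defines a request() function (hypothetical).
        try:
            module = importlib.import_module(f"www{scheme}")
        except ImportError:
            raise LookupError(f"no resolver for scheme {scheme!r}") from None
        _RESOLVERS[scheme] = module.request
    return _RESOLVERS[scheme]


def request(uri: str) -> str:
    """Select a resolution mechanism purely by the URI's scheme."""
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError(f"no scheme in {uri!r}")
    return _resolver_for(scheme)(rest)


@register("oid")
def resolve_oid(rest: str) -> str:
    # Toy resolver; a real one might consult a directory service.
    return f"resolved OID {rest}"
```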

Extensibility is one of the most important aspects of Web technology.
This does not necessarily mean that all vendors have succeeded in
implementing this extensibility; it only means that the design does not
prevent them from implementing an extensible system.  Those of us who
know better have done better.  If the implementation of IETF-sponsored
URNs reduces the current URI extensibility, then I will not allow them
to become a Web standard, even if that means divorcing Web standards from
the IETF process [which I would personally hate to do].  However, that
has never been necessary, because every time we have polled the vendors
on this matter they have always supported a more extensible design.
The problem is that the change was never made to the URN specification
because the authors didn't follow through, forgot, or just plain disagreed
with the rough consensus.  That is not following the IETF process, which
is why I am sick of repeating myself every time a new URN draft is
produced, and is the primary reason why so little progress has been
made over the past three years.
 
>> Any resource may be identified by multiple names and/or locations.
>> Any resource which is "the current version of X" is also "that specific
>> version of X" -- both of these concepts can and should be assigned names
>> if it is determined (by anyone) that such a name is useful.  Thus, any
>> system that purports to define URNs must also allow multiple names per
>> resource.
> 
> Yes.  
> 
> (But I've never thought of a URN as being tightly bound to a "resource" 
> ... it's bound to a "definition".  So a URN for today's weather map and 
> a URN for the weather map on 11/29/95 would be different because they 
> *mean different things*, and it doesn't matter that under certain 
> circumstances they could refer to the same resource.  But this is 
> independent of whether URNs have "schemes".)

It matters if the question asked is "have you seen the contents of
this map before?", or "by which name should I refer to this resource when
I put it in a hotlist/bookmark file?".  You are right, though, that
this example does not highlight the need for schemes.

>> Requiring that all URNs have the same properties (i.e., case insensitive,
>> references an entity fixed-for-all-time, etc.) would make it impossible
>> to represent resource names as URNs.  
> 
> Depends on what you mean by "resource names".  I have always assumed
> that URNs must be able to subsume other naming systems that have the
> same basic properties -- global uniqueness, persistence, transcribability,
> etc., but not that URNs must be able to subsume any kind of resource name 
> (such as a URL or a file name).  Now if the other naming systems that
> we need to subsume into URN space are really so diverse that we cannot
> define a common "umbrella" syntax and registry and clients have to 
> be aware of the differences in their syntax in order to "use" them...
> well, I'm tempted to suggest that we try to solve a narrower problem.

I think that's reasonable, but the name "URN" refers to the larger problem
of location-independent resource names.  Solving a narrower problem is fine
provided that it does not prevent others from solving other parts of the
problem, which means that you must have a way to differentiate between
solutions, and thus a scheme other than "URN:" is necessary.

> But I'm not yet convinced that we need to support this kind of
> diversity...perhaps you could supply some firm examples?

What I am saying is that unless you can *prove* to me that we will never
want to support that diversity, you cannot make that choice for others.

>...
> It's not as if everyone uses the word "scheme" in the same way.
> (sorry, couldn't resist...it's one of my favorite quotes.)

Cute, but there is only one way to use the word "scheme" when referring
to the characters preceding a Uniform Resource Identifier.  I am not
interested in redefining the name associated with two proposed standards
and an installed base of more than 20 million applications.

>> A scheme defines the syntax and
>> semantics associated with the remainder of the identifier.  It does not
>> define the resolution protocol; some identifiers have a scheme name which
>> matches a protocol name because that is the most meaningful name to
>> associate with a locator for which the ultimate resolution process defaults
>> to using that protocol.  In other words, the Knoxville proposal is using
>> the scheme "URN".
> 
> The Knoxville proposal doesn't define the syntax of the name past the NA.

Yes it does -- it defines that it is opaque and case-insensitive and
only includes a restricted set of characters.

> The Knoxville proposal doesn't define the semantics of the name at all;
> we narrowed our scope for the purpose of that two day discussion to
> specification of the name and how to find resolution servers for that
> name, and used the term URN to refer to the part of the "resource 
> identifier" that we chose to work on.

If you perform a case-insensitive comparison of two Knoxville identifiers
and find them to be equal, what does that mean?  Semantics.

>> World-Wide Web user agents use the identifier scheme to determine the
>> resolution mechanism (NOT protocol -- mechanism is that *thing* which is
>> responsible, within or outside the client, for resolving identifiers of
>> that particular identifier type -- it may use any protocol defined by
>> the user or vendor for resolving that scheme, including a protocol defined
>> on-the-fly through retrieval of a script).  
> 
> While I agree with you in principle, this is not the case in general.
> It's certainly possible to add a layer of indirection between a URL
> and its servers.  But since the web wasn't designed with a standard
> layer there from day one, it's somewhat difficult to add one now and
> see it universally deployed.  (doesn't mean it's not a good idea --
> it's just difficult)

Wrong.  The design has been there since day one -- in fact, it preceded
the original definition of Universal Document Identifiers, which preceded
the creation of the URI WG for the purpose of standardizing those identifiers.
Schemes were designed to support extensibility of names by allowing the
library resolver module to be determined by scheme name.  It was also in
libwww-perl since day one.

What has not been there is support for the user to enable that extensibility.
Right now, only a programmer can do that.  However, this can be added now
without any change to the URI syntax and with no entry barrier for
implementation on clients.  All I need to do is finish writing my paper. ;-)

>> Uniform Resource Names is a category of identifiers, referring to those
>> that identify a resource independent of its network location.  It is wrong
>> to use "URN" as a scheme name for the same reason it is wrong to use
>> "URL" as a scheme name.
>> 
>>    I CANNOT USE ANY IDENTIFIER THAT BEGINS WITH "URN:"
> 
> Sure you can.  You can use URN: as easily as HTTP:.

Actually, I can't use HTTP either, since schemes are required to be
lowercase.
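For reference, the rule being appealed to can be sketched in Python: the scheme is whatever precedes the first colon, and RFC 1738 allows only lowercase letters, digits, "+", "." and "-" in it. RFC 1738 also advises programs that interpret URLs to treat uppercase scheme letters as equivalent for resiliency, which the hypothetical `lenient` flag below models; `split_scheme` is an illustrative name, not a library function.

```python
import re
from typing import Tuple

# RFC 1738, section 2.1: scheme names use lowercase letters, digits,
# "+", "." and "-".
_SCHEME_CHARS = re.compile(r"[a-z0-9+.\-]+")


def split_scheme(uri: str, lenient: bool = False) -> Tuple[str, str]:
    """Split a URI at its first colon and validate the scheme name.

    With lenient=True, uppercase letters are folded to lowercase first,
    following RFC 1738's advice to treat them as equivalent.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError(f"no scheme in {uri!r}")
    if lenient:
        scheme = scheme.lower()
    if not _SCHEME_CHARS.fullmatch(scheme):
        raise ValueError(f"invalid scheme name {scheme!r}")
    return scheme, rest
```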

> I don't really care what these things are called.  I do care about
> not defining lots of new URI prefixes such that the client has to
> know about each one of them individually, or so that URNs get confused
> with URLs.  So in response to your all-caps statement, I might say:
> 
> 	I CANNOT USE MORE THAN ONE NEW URI PREFIX
> 
> although that, of course, is also false.  I do, however, think it's
> highly undesirable to keep extending things in this way.

If you implement a "truly great" URN with a particular scheme, and
it turns out that you are right in that your "truly great" URN is
sufficient to solve the URN problem in general, then nobody will bother
to use some other URN that is "less great".

If, however, you are wrong in that some other URN syntax is better than
that proposed, or if some other type of URN is necessary to solve the
bits of the URN problem which you did not consider "important enough",
then allowing multiple URN schemes to exist will allow the proof to be
determined by implementation and successful deployment, not by
pre-standardization posturing.

If this is just a difference of opinion between "extensibility is bad"
and "extensibility is good", then there is no point in continuing this
discussion.

>> Which means, obviously, that I will forbid the use of such an identifier
>> in any system which I design or am responsible for standardization.
>> That is what I've said consistently for over 1.5 years now, that is what
>> I will recommend to the W3 Consortium members, and that is the objection
>> I will continue to raise every time this is discussed within the IETF.
>> 
>> Is that clear?
> 
> In the IETF at least, you have no authority to forbid any such thing.

I wasn't referring to IETF standards.  URN is not an IETF standard.
URN isn't even an IETF working group.  Right now, URN isn't even out
of the early research phase.  I do have the authority to forbid the
use of bogus URNs in any system *I* design, and in any system in which
*I* am responsible for standardization (e.g., the W3C use of URIs).
To the extent that my responsibility overlaps with that of the IETF,
I defer to the IETF.  However, the IETF's responsibility *never*
extends to systems that are not yet implemented.  Mine does.

> We make decisions by rough consensus, but the consensus of the group can 
> override any individual.

Only if that consensus is polled for on the working group mailing list
and the results are represented in the WG documents.  In the entire
history of the URI WG, the only time that the "URN:" prefix *ever*
obtained consensus was at a meeting at a bar during the Houston IETF
meeting -- yes, that's right, it wasn't even a legitimate decision
of those in attendance at the real meeting.

> I personally would think it silly for us to develop this new kind
> of identifier that we have been calling a URN all along, and use
> any prefix for that identifier other than URN:.  But if "silly" doesn't
> cause any implementation or operational problems, you might be able 
> to get the group to go along with you and use some other prefix.
> 
> On the other hand, if we end up defining lots of new URI prefixes, 
> we will have been wasting our time for the past 4 years or so, because
> we will have effectively gained nothing over normal URLs.  That's 
> not silly, that's tragic.

Since when is the existence of only one URN scheme the sole advantage
of location-independent names?  The only thing that has been wasting our
time for the past 4 years or so is this insistence on defining an identifier
which is fundamentally incompatible with all existing practice.  I am trying
to stop yet another waste of time before it starts again.  If existing
practice will not be a concern of some future URN WG, then there should
not be any URN WG in the IETF.

>> Hell, ALL
>> EXISTING IMPLEMENTATIONS OF URIs DEPEND ON THE EXISTENCE OF SCHEME NAMES.
> 
> This isn't a justification for anything in particular.  The reason we're
> doing this little four-plus year exercise is that "existing implementations
> of URIs" aren't sufficient for our needs.

NO -- URLs aren't sufficient for our needs.  There is nothing insufficient
about the URI architecture and there is no technical reason to justify
a change from that architecture.

> (C'mon.  Does the "scheme" really have to be the part of the URI before 
> the first colon?

Yes.

> Do URNs really have to share a common syntax
> with URLs... down to including the path structure?

No, but they must be usable within the same URI structure.

> If you really want
> URNs to be persistent, you don't put any semantically loaded
> information in them at all...certainly not information that reflects
> the internal structure of multi-file documents.)

I have seen no implementation that proves such a theory, though I have
never suggested that all URNs must contain structural information either.
I believe there is no harm in allowing both to coexist.

>> If you don't support the identification of resources that may already
>> be on your local disk, identified within a personal database of resources
>> located in a real-world bookshelf, or located within the user's local
>> University library, then you have failed to solve the URN problem.
>> You don't have to define these resolution mechanisms -- you just have
>> to make them possible with minimum difficulty.
> 
> Actually, we do support the identification of such resources, but not
> with names that indicate where the resources are stored.  After all,
> a resource originally in my personal database could eventually become
> available to the entire world...should the resource name then change and
> then invalidate all of the references to it?
>
> But I could certainly configure my client to search my personal resource
> database, my mail folders, etc.  before searching the DNS registry.

According to what constraints?  Do you want every query to search all
available sources?  Or, do you want the sources to be ordered and targeted
according to the likelihood of their knowledge about the resource?
If you know a name is associated with a University Technical Report,
don't you want your client to search the TR database before the
Library of Congress?  If so, how does the client get configured for
such preferences without looking at the opaque identifier after the
"scheme:"?
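One way a client could express such preferences, sketched in Python under stated assumptions: the configuration maps each scheme to an ordered list of sources, and any scheme without an entry falls back to a default order. The "tr" scheme, the `Source` class, and the sample names are hypothetical. The ordering decision needs only the text before the colon; the rest of the identifier stays opaque.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Source:
    """A hypothetical place a client can ask about a name."""
    name: str
    known: Dict[str, str]
    queries: int = 0

    def lookup(self, rest: str) -> Optional[str]:
        # Count queries so the effect of ordering is observable.
        self.queries += 1
        return self.known.get(rest)


def resolve(uri: str,
            search_path: Dict[str, List[Source]],
            default: List[Source]) -> Optional[str]:
    """Consult sources in the order configured for the URI's scheme."""
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError(f"no scheme in {uri!r}")
    for source in search_path.get(scheme, default):
        found = source.lookup(rest)
        if found is not None:
            return found
    return None
```

With a configuration such as {"tr": [tr_database, library_of_congress]}, a "tr:" name is answered by the TR database without the Library of Congress ever being queried, while a name under an unconfigured scheme falls back to the default order.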

The fact is that you cannot anticipate all the needs that I or anyone
else may eventually have for URNs, so don't assume you have.  Provide a
syntax that is extensible not because it will be, but because you cannot
be sure it won't need to be.


 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92717-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/