Re: UR* schemes [was: 3rd IETF/W3C coordination call...] from Keith Moore on 1999-12-01 (uri@w3.org from December 1999)

From: Keith Moore <moore@lust.indecency.org>
Date: Tue, 30 Nov 1999 22:42:50 -0500 (EST)
To: "Tim Berners-Lee" <timbl@w3.org>
cc: "Keith Moore" <moore@cs.utk.edu>, uri@w3.org, moore@lust.indecency.org
Message-Id: <199912010341.WAA11817@lust.indecency.org>
Tim,

I'm not going to get into a long argument here about the use
of URLs in protocol parameters.  Many of the issues involved 
either have nothing to do with the URI list, or would need to 
be discussed in a much wider community for the discussion to 
be meaningful.  But IMHO the goals are highly dubious and the
result you seem to want would be extremely difficult to acheive.

OTOH, there are probably *some* protocol parameters for which it would 
be appropriate to either define a mapping between those parameters 
and URIs, or to delegate a portion of that parameter space to 
be filled by suitably encoded URIs or DNS names.   Content-types
might be an example of such a parameter space, though I have
reservations about this.  But these would need to be evaluated on 
a case-by-case basis by folks knowledgable in each of the protocols.

(e.g. issues regarding assignment of content-types should probably be 
discussed on the <ietf-types@alvestrand.no> mailing list.)



As for URNs (which have nothing whatsoever to do with the above),
I find the argument against URNs fairly vacuous.  URNs are intended
as a syntatic convention for the benefit of humans.  Saying that there 
should not be URNs is a bit like saying that there should be no 
syntactic conventions in human language, e.g. that there should not be a 
convention that adverbs in English end in -ly.  

Note that the -ly convention is useful even though not all adverbs 
end in -ly (and that not all words ending in -ly are adverbs) -- 
the convention still works often enough that it assists our brains 
in their interpretation of adverbs (both in assigning meaning
to the word and associating it with the appropriate adjective or verb) 
when we hear or read them.  Similarly, the URN:  prefix convention is 
intended to allow our brains to distinguish them from other URIs 
and to think "this is highly likely to be a stable reference".

Of course, the fact that syntactic conventions are often useful in 
languages doesn't mean that the URN: convention is useful enough to 
be worth the trouble, but saying that there should be no such convention
is a bit like saying that people shouldn't be able to coin new words.
People do coin new words, and new language conventions, and they succeed 
or fail based on whether other people adopt those words and conventions.

URLs *can* be stable references, but there is no way to know whether
they are stable or not by looking at them.  Furthermore, the inclusion
of temporal things in a URL like protocol-specific prefixes, company-
specific domain names, and human readable directory and file names 
decrease the liklihood that a URL can have a stable meaning over a 
long period of time - because old protocols (including http)
will eventually fall into disuse and new ones will be written, companies 
will merge or split up, domain names will change hands, directories will 
be reorganized, file and directory names will contain words that 
change meaning (or contain trademarks that change hands), etc.  
Some but not all of these can be fixed by adding a level of indirection.

If you consider the lifetime of a work to be several decades
or even centuries (the CNRI folks used to say 150 years because that 
was something close to the then-current limit for copyright)
and you want to keep the same resource identifier associated
with the work for that time, you need to avoid including things
in that identifier that will damage the ability of the identifier
to be associated with that resource over several decades.  
Since the meaning of almost any word can change, you want to avoid
loading the resource identifier with temporal meanings.

(no matter how hard you try, you can't get cern to send redirects
for your web pages after you move if they don't want to do so...
even though the mechanisms to to this exist in http.  the only 
way to solve this problem is to not include cern's name
in the name of your web pages in the first place.)

These were the justifications behind having a slightly different
syntactic convention for URNs than for other URIs.  Different
people can have different opinions on whether URNs are sufficiently
valuable to be worth the trouble.  Meanwhile, folks who don't want to 
assign URNs to their resources are not forced to use them.  But given
the design goals and constraints for URNs, the syntax that emerged is 
a fairly straightforward and obvious result.

Of course it remains to be seen whether people accept the URN:
convention - this will ultimately be decided based on whether
or not it is useful.  My guess is that it will initially be useful 
to a certain, fairly limited, sets of people (e.g., the Internet
equivalent of library catalogers) and after it is established in 
certain circles the convention might eventually enjoy wider use 
(just as happened with ISBN numbers).  But URNs were never intended 
to replace URLs (except perhaps early on before the URN problem 
was well articulated), there is very little overlap between the uses 
of the two, and folks who try to use URNs to replace URLs
are likely to end up disappointed with the results.


> Trying to sum up my concerns about the whole URN development:
> 
> 1. It is based on a lack of society to currently to support the
>   facilities for persistence (including soft redirection) in HTTP.

false.  there were other motivations for URN development; and many 
of the problems addressed by URNs cannot be solved by indirection
in HTTP. 

> 2. The lack of URN dereferencing protocol may lead to a reinvention of HTTP,
>   with consequent unnecessary and serious complication of the web;

it's extremely unlikely that HTTP will be reinvented anytime soon;
it has such mindshare that any new protocol for the same design 
space as HTTP would have a very difficult time displacing HTTP, 
even if it were significantly faster, less complex, better adapted 
to wireless, whatever.  in other words, it would be difficult for 
something to replace HTTP even if it were better than HTTP for 
HTTP's design space.

Indeed, the big problem in applications these days is to keep people 
from using HTTP (and LDAP) for purposes for which they are not well 
suited.

OTOH, any protocols designed specifically for URN dereferencing would 
have vastly different requirements and therefore different characteristics 
than HTTP.  this is not reinventing HTTP, it's inventing a new protocol
for a new purpose.  this seems entirely appropriate if sufficient value 
can be added to make the extra complexity worthwhile.

> 3. The suggestion that URNs will solve the persistence problem is
>  causing serious archive maintainers to ignore their duty of persistence
>  while "waiting for URNs", and thus making the problem much worse.

well, URNs are here now, so they don't need to wait any longer.

true, there has been a somewhat-persistent (hopefully not long-term-
persistent) belief that URNs would prevent URLs from ever becoming 
stale, end nuclear war, cure world hunger, etc.  The fact that people
believed things like this only means that URNs were misunderstood.
While such understanding is unfortunate, it doesn't mean that URNs
are useless or that they cannot survive this misunderstanding. 
A great deal of misunderstanding results every time a new net protocol
is widely hyped in the trade press, but usually the hype fades after
a while and folks can use (or not use) those protocols according to 
whether it actually makes sense to do so - rather than based on
their buzzword compliance or lack thereof. 

> 4. The actual registry of URN names is in fact more or less at the
>   "/etc/hosts" stage, and has yet to reinvent the hierarchical structure of
>   DNS and the social structrue of ICANN

this is an interesting point.

as URNs aren't supposed to contain much that is human meaningful
(it interferes with their purpose) having hierarchy in URN NIDs
doesn't necessarily seem like a good idea.   The main purpose of the
NID is to separate URN space into multiple sub-spaces, both
for distributed control, and to allow grandfathering of existing
stable identifiers under URN space.  In this sense it is vaguely
analogous to TLD space in the DNS.  Some delegation of NID-space
(not in an explicitly hierarchical fashion) might be useful, but 
probably only if whoever was running the NID registry were 
exercising too much control.  However, given the purpose and anticipated 
uses of URNs it seems unlikely that control of URN NID space would 
be anywhere nearly as controversial or hotly contested as control 
of DNS TLD space, at least not in the near term.

At some point it might be appropriate to delegate control of 
URN NID space to ICANN rather than IANA,  precisely because of
the ICANN social structure. But it's difficult to see why this would 
be useful - or what difference it would make - in the near term.  
IIRC, there are already visibility, public review, and multiple levels 
of appeal for URN namespace review.  And ICANN currently has the 
contract to run IANA, so the same people would handle the assignment 
in either case.


> 5. In this "waiting", web sites fail to learn to use HTTP redirects properly
>  (eg http://www.isi.edu/div7/iana/)  and browsers fail to distinguish a
>   "found" lookup and a "moved" redirection, making understanding of the
> semantics  of HTTP in the community worse.

in general there seems to be a lag - often measured in many years -  
between the time that protocol features are developed and the time
that it takes for implementors to get the subtle features right,
and even more time before users to figure out how to use them properly.
(and the two happen in strict order - users won't utilize them
until vendors get their implementations right)

so if "serious archive maintainers" aren't providing persistence
for HTTP-accessible resources using HTTP-accessible features
(to the degree that this is feasible)  then my guess is that it's
because they don't yet understand how to use those HTTP features,
and that this is largely because the browsers don't implement them 
properly.  And I don't see how the presence or absence of URNs has much 
effect on whether browser vendors implement redirects correctly for HTTP.


Keith
Received on Wednesday, 1 December 1999 17:07:12 UTC