Re: The Path URN Specification

Michael Shapiro (mshapiro@ncsa.uiuc.edu)
Tue, 21 Mar 1995 11:38:15 -0600 (CST)


From: mshapiro@ncsa.uiuc.edu (Michael Shapiro)
Message-Id: <9503211738.AA02162@void.ncsa.uiuc.edu>
Subject: Re: The Path URN Specification
To: uri@bunyip.com
Date: Tue, 21 Mar 1995 11:38:15 -0600 (CST)
In-Reply-To: <9503202219.AA29164@mocha.bunyip.com> from "michael rabinovich" at Mar 20, 95 05:11:40 pm

michael rabinovich wrote:
|
|I also enjoyed reading LaLiberte & Shapiro's proposal. 
|
|However, I think, it has some shortcomings.
|
|(1) It will increase traffic due to DNS requests for partial
|name resolutions. This problem, however, could be avoided if 
|DNS server software could be changed.

When remote DNS servers perform recursive queries, this reduces traffic.
If they answer iteratively (i.e., make the local servers issue new
queries), the traffic is not reduced. Recursive service is optional
under the DNS RFCs, so this depends on how the various DNS name servers
are implemented and configured.

With luck, partial name resolution for paths would be answered from
nearby DNS caches, which would reduce traffic.

However, the use of TXT records would increase the traffic as well as
the size of the caches. For the top of the hierarchy (nodes above
servers) this may mean that we allow missing TXT records (if empty TXT
records require too much space).

michael rabinovich wrote:
|
|(2) More serious problem, I think, is that scalability (a purely
|performance issue) is now not transparent to the end-user. Thus,
|certain non-semantic reasons influence the way resource names are
|constructed. For instance, assume there is a URN/URL resolver with 
|/publishers/uk prefix. Then, as the scale grows beyond this resolver
|capability, we add another server, and give it prefix
|/publishers/uk/hamish-hamilton. What should we do with documents
|that used to be semantically under hamish-hamilton before? Change
|their names? This would violate name persistence.
|
|Also, as a resolution server becomes overloaded and a new server is added,
|it would be natural to split the load in half, rather than to 
|have the parent resolver work at capacity while the newly added
|resolver stay almost idle until enough new names are registered.
|
|The root problem is that this scheme makes the server hierarchy
|part of a resource name. This is semantically confusing; it also
|makes names unstable, as server hierarchies tend to change.
|
|
|What we are doing as part of an on-going project here at the Bell Labs
|is use a URL syntax as a URN, and http as the protocol for talking
|with URN/URL resolution server. For instance, URN from the 
|original draft: <URN:dns:path.net:mitra1234> looks as
|
|http://path.net/resolver/mitra1234.
|
|Resolver is a CGI script that actually does the resolution. (It is 
|just a technicality to remove it from the name).
|
|Then, just as LaLibrter&Shapiro's proposal, the resolver can either
|return a redirect with the actual URL of the document (in fact,
|we use dynamic replication to deal with information server overloading,
|so there are several URLs to choose from).
|
|This scheme also makes it simple to register new names, change
|URN->URL mappings when a document moves, etc. Our prototype server
|has scripts that do that. Moreover, when registering a new document,
|the user can ask for a specific URN, which will be assigned if it
|does not exist already. 
|
|Also, no change to current Mosaic browsers is required.
|
|We deal with scalability issues internally, so that the user is not
|affected. We do allow hierarchical namespace, but the hierarchy
|is determined entirely by the semantics, not the server hierarchy.
|In fact, I anticipate that the flat namespace will be used most often
|(just like we use flat namespace for telephone numbers).
|
|We should have a server  outside the firewall pretty soon, and I will then
|ask people to try it out. In the meantime, does anyone see anything
|immediately wrong with our approach?
|
|Michael Rabinovich.
|


You're right - you can't change the names. But note that the DNS lookup
is only part of the resolution process. The server also has a hierarchy
under it, and it can return forwarding information. (HTTPD currently
has a mechanism for doing this: documents that have been moved are given
new locations in the server configuration files, and the server returns
"Document moved" headers pointing to the new location. This mechanism
is weak in that all the forwarding lives in the configuration file and
the server needs to be informed when changes are made.)  So, with
server forwarding, documents would not have to be renamed (though they
do get new names as well).
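
As a rough sketch of the server-side forwarding described above (not
the actual HTTPD mechanism), a server could keep a table of moved path
prefixes and answer with a redirect instead of the document. The table
contents, host name, and function name below are assumptions made up
for illustration; only the /publishers/uk/hamish-hamilton prefix comes
from the example under discussion.

```python
# Hypothetical forwarding table: moved path prefix -> base URL of new server.
# In a real server this would live in the configuration files.
FORWARDS = {
    "/publishers/uk/hamish-hamilton": "http://hh.example.org",
}

# Hypothetical documents still served locally.
LOCAL_DOCS = {
    "/publishers/uk/penguin/doc1": "local content",
}


def resolve(path):
    """Return (status, value): 200 + content, 302 + new URL, or 404 + None."""
    # Pick the longest forwarded prefix that matches on a path boundary.
    best = None
    for prefix in FORWARDS:
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is not None:
        # The original name is kept; only the serving location changes.
        return 302, FORWARDS[best] + path
    if path in LOCAL_DOCS:
        return 200, LOCAL_DOCS[path]
    return 404, None
```

Note that the original server still has to answer every request for a
moved name, which is exactly the weakness raised next.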

But you point out a real problem. With the path scheme, you can move
part of the hierarchy to a new server and put the info into DNS;
otherwise you have to contact the original server that used to serve
the document, and it tells you to contact another (i.e., returns
forwarding info).  Plus, if the ability of a server to serve even the
forwarding info degrades, we're stuck.

I think this may be a fundamental flaw with hierarchical names.
However, flat namespaces have their problems.

Let's simplify the discussion and assume that a URN should resolve into
a URL (rather than the document itself). If URNs form a flat namespace
(no hierarchy), then how do we find the server that knows how to
translate the URN into a URL?

One such system is the Handle System from CNRI
  <http://www.cnri.reston.va.us/home/cstr/handle-intro.html>.
This system maps a flat namespace to URLs by having a suite of servers,
each of which knows part of the namespace. The URNs are
distributed among the servers to attempt to reduce the load on any one
server.  (The URNs are assigned, perhaps by some hashing mechanism).
If the namespace grows beyond the capacity of the suite of servers, new
servers are added and the namespace redistributed among the new suite.

Clients figure out which server to contact by forming a hash on the URN
and looking up a server in a list of servers (that it downloads from
somewhere - a detail). The client then contacts the server and either
gets the URL, or it is told that it has contacted the wrong server but
is given the name of the server to contact. If it is the wrong server,
the client probably downloads a new server hash list so it has a
correct list. This allows more servers to be added and clients to
adjust to the new number of servers.
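
The lookup just described can be sketched roughly as follows. This is
only an illustration of the idea, not the handle system's actual
protocol: the SHA-256 hash, the modulo placement, and every class and
function name here are assumptions.

```python
import hashlib


def server_index(urn, n_servers):
    """Map a URN to a server slot by hashing (the hash choice is an assumption)."""
    digest = hashlib.sha256(urn.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


class Server:
    def __init__(self, name, directory):
        self.name = name
        self.directory = directory  # shared, authoritative list of servers
        self.table = {}             # URN -> URL for this server's share

    def lookup(self, urn):
        owner = self.directory[server_index(urn, len(self.directory))]
        if owner is not self:
            # Wrong server: name the server the client should contact.
            return ("wrong_server", owner.name)
        return ("url", self.table.get(urn))


def register(directory, urn, url):
    directory[server_index(urn, len(directory))].table[urn] = url


def add_server(directory, name):
    """Add a server and redistribute the namespace over the new suite."""
    entries = {}
    for server in directory:
        entries.update(server.table)
        server.table = {}
    directory.append(Server(name, directory))
    for urn, url in entries.items():
        register(directory, urn, url)


class Client:
    def __init__(self, server_list):
        self.servers = list(server_list)  # local copy, may go stale

    def resolve(self, urn, current_directory):
        server = self.servers[server_index(urn, len(self.servers))]
        kind, value = server.lookup(urn)
        if kind == "wrong_server":
            # Download a fresh server list and retry once.
            self.servers = list(current_directory)
            server = self.servers[server_index(urn, len(self.servers))]
            kind, value = server.lookup(urn)
        return value
```

A client holding the old two-server list still resolves names after a
third server is added; at worst it pays one extra round trip and a list
download. With simple modulo placement most names move when the suite
grows, which is the redistribution cost mentioned above.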

However, this means that clients may have to contact servers that are
very remote, or that caches (a hierarchy of them?) replicating the
namespace would have to be introduced.

I would point out that phone numbers are not flat if you consider
global phone numbers. When you add area codes and country codes, you
not only get large namespaces but also a hierarchy that is used to find
the service that knows about the phone number. Also, within an
organization you have extensions, which also look to me like part of a
hierarchy.

-- 
Michael Shapiro                   mshapiro@ncsa.uiuc.edu
NCSA                              (217) 244-6642
605 E Springfield Ave. RM 152CAB
Champaign, IL 61820