The Path URN Specification from Daniel LaLiberte on 1995-03-17 (uri@w3.org from March 1995)

From: Daniel LaLiberte <liberte@ncsa.uiuc.edu>
Date: Fri, 17 Mar 1995 16:58:25 -0600
To: uri@bunyip.com
Message-Id: <199503172258.QAA20216@void.ncsa.uiuc.edu>

Below is the draft of our path scheme specification.  This same
version (modulo format, name, and date) will be submitted as an
internet draft.

For now, you can also get an HTML version at:

    <URL: http://union.ncsa.uiuc.edu/~liberte/www/path.html>

Daniel LaLiberte (liberte@ncsa.uiuc.edu)
National Center for Supercomputing Applications
http://union.ncsa.uiuc.edu/~liberte/

#####################################

The Path URN Specification
**************************

<name to be assigned by IANA> 
Expires ??month day, year?? 

Daniel LaLiberte <liberte@ncsa.uiuc.edu>
Michael Shapiro <mshapiro@ncsa.uiuc.edu> 

This document is also available in HTML at:

  <URL: http://union.ncsa.uiuc.edu/~liberte/www/path.html>

Status of this memo
===================

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts. 

Internet-Drafts are draft documents valid for a maximum of six months and
may be updated, replaced, or obsoleted by other documents at any time. It
is inappropriate to use Internet-Drafts as reference material or to cite
them other than as "work in progress." 

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au
(Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West
Coast). 

This Internet Draft expires ??month day, year?? 

Last modified: Fri Mar 17 16:43:14 1995 

Abstract
========

A new "path" URN scheme is proposed that defines a uniformly
hierarchical name space. The resolution of a path URN is a two-step
process: locating the resolution server and locating the resource within
the server. Existing DNS capabilities are used to locate the resolution
server and HTTP is used as the protocol for locating a resource within the
server. 

Introduction
============

Conceptually, the path scheme defines a uniformly hierarchical name
space. A path is a sequence of components and an optional opaque
string. An example path is: 

   path:/A/B/C/doc.html 

Names are assigned by naming authorities that are responsible for a
subtree of the name space, and naming authorities may delegate
responsibility to sub-authorities. Each naming authority corresponds to a
name resolution service, which may be shared by several naming
authorities. 

In this document, we first describe the name resolution process
conceptually. This is followed by a detailed description of our (planned)
implementation, the encoding rules, and a discussion of URN
requirements.

The Name Resolution Process
===========================

This section describes the resolution process conceptually but not
completely. See the implementation section for the details.

The name resolution process involves two steps: First we traverse the
path left to right until we find a most-specific server, then we interact with
that server to resolve the remainder of the path name. The server has the
option of returning a redirection to a URL.

The resolution process starts at the path name root located at some fixed,
globally known network address. The root corresponds to a name
resolution service which resolves the first component of a path into the
address of another node. Generally, each node in the hierarchy resolves a
path component into another node at the next lower level. This process
repeats until no more-specific resolver is found.

The name resolver for each node must tell clients whether there is a
more-specific resolver for the given path. This information will be used by
clients to avoid requesting resolution for components of the path that do
not have a more-specific resolver. If there is a more-specific resolver,
then the client proceeds with the process of requesting subsequent
components of the path. If there is not a more-specific resolver, then this
first phase of the resolution process is completed.

Clients are expected to make use of caches to retain information about
recently visited name resolvers so that resolution of a path can start from
the most-specific known resolver instead of at the root. 
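
As an illustration only, such a cache could be a map from path prefixes
to resolver addresses, consulted for the longest known prefix. The
names most_specific_cached and ROOT_RESOLVER and all addresses below
are hypothetical; this draft does not define a cache format.

```python
# Hypothetical root resolver address; the draft only says that the root
# is located at some fixed, globally known network address.
ROOT_RESOLVER = "root-resolver.example"


def most_specific_cached(components, cache):
    """Return (n, resolver) for the longest cached prefix components[:n].

    Falls back to (0, ROOT_RESOLVER) when no prefix of the path is cached,
    so that resolution starts at the root.
    """
    for n in range(len(components), 0, -1):
        resolver = cache.get(tuple(components[:n]))
        if resolver is not None:
            return n, resolver
    return 0, ROOT_RESOLVER


# Example cache holding two previously visited resolvers.
cache = {
    ("a",): "resolver-a.example",
    ("a", "b2"): "resolver-b2.example",
}
print(most_specific_cached(["a", "b2", "c", "d"], cache))
print(most_specific_cached(["x", "y"], cache))
```

Resolution of the remaining components would then continue from the
returned resolver rather than from the root.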

Once the most-specific resolver is found for a particular path, it returns
the address of a separate terminal resolver to the client. The client then
sends the full path to this terminal resolver. The path scheme defines the
protocol for interacting with the terminal resolver as HTTP.

The result of the terminal resolution may be any document, identified by
Content-type, or it may be a redirection to a URL. The URL may be, for
example, an http URL or another path URN.

Implementation of Resolution
============================

The implementation of the resolution process follows the abstract two-step
process. The first step resolves the name into an IP address and a port
number. The second step involves contacting a server at the IP address
and port number returned by the first step and, using the HTTP protocol,
issuing a GET of the entire URN.

Resolving the name into a server and port number 
+++++++++++++++++++++++++++++++++++++++++++++++++

The resolution of a name into a server and port number is done using
existing DNS capabilities. As an aid for the discussion that follows, the
following partial document tree is used:

                                /
                                |
                                A
                                |
                    --------------------------
                    |                        |
                    B1*                      B2*
                    |                        |
                ----------                   |
                |        |                   |
                C1       C2*                 C
                                             |
                                             D*

The nodes marked with * are server nodes. They have one or more
(IP-address, port) pairs associated with them. 

   /A/B1 serves all documents under /A/B1 except /A/B1/C2 
   /A/B2 serves all documents under /A/B2 except /A/B2/C/D 

The resolution process proceeds as follows. 

 1. The entire URN, except the scheme and the final component, is
   converted to a DNS name: the remaining components are written in
   reverse order, joined with ".", and ".path.urn" is appended. For
   example, 

      path:/A/B2/C1/doc.html is converted to 
      c1.b2.a.path.urn 

 2. Partial-names are built starting with the last three components of
   the DNS name and iteratively adding components. All DNS records
   associated with this partial-name are requested using DNS
   resolvers. 

    o If the TXT record is missing, then the URN does not resolve
      into a server and the URN is assumed to be invalid. 

    o If there is an A record, then this is a server node. The TXT
      record lists sub-nodes not handled by this server. 

       o If none of the sub-nodes listed in the TXT record
         match, then this is the server.

       o Else this implies that there is a DNS entry for the
         sub-node. The matching component is added to the
         partial-name to form a new partial-name and this
         step is repeated. 

    o If there is no A record

       o If no A record has been encountered up to this point,
         the next component of the URN is added to the
         partial-name to form a new partial-name and this
         step repeated. 

       o If at least one A record has been encountered up to
         this point

          o If none of the sub-nodes listed in the TXT
            record match the remaining components of the
            path, then the most recent partial-name that
            had an A record is the server for this name. 

          o Else this implies that there is a DNS entry for
            the sub-node. The matching component is
            added to the partial-name to form a new
            partial-name and this step is repeated. 

   Once the server DNS entry is located, the IP-address(es) are
   extracted from the A record and the associated port number(s)
   extracted from the TXT record. 
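
The name conversion in step 1 can be sketched as follows. This is a
minimal illustration, not a reference implementation: the function
names are hypothetical, and lowercasing the components is an
assumption based on the example above and on DNS case-insensitivity.
The second function lists the candidate partial-names from shortest to
longest; the actual lookup order also depends on the TXT records.

```python
SUFFIX = "path.urn"


def urn_to_dns_name(urn):
    """Convert a path URN to the DNS name described in step 1."""
    scheme, sep, name = urn.partition(":")
    if scheme != "path" or not sep or not name.startswith("/"):
        raise ValueError(f"not a path URN: {urn!r}")
    # Drop the empty string before the first "/" and the final component.
    components = name.split("/")[1:-1]
    if not components:
        raise ValueError(f"no components to convert: {urn!r}")
    # Reverse the components so the most specific one comes first.
    labels = [c.lower() for c in reversed(components)]
    return ".".join(labels + [SUFFIX])


def partial_names(dns_name):
    """List candidate partial-names, shortest (three components) first."""
    labels = dns_name[: -(len(SUFFIX) + 1)].split(".")
    return [".".join(labels[i:] + [SUFFIX])
            for i in range(len(labels) - 1, -1, -1)]


print(urn_to_dns_name("path:/A/B2/C1/doc.html"))
print(partial_names("c1.b2.a.path.urn"))
```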

To clarify the above algorithm, some examples are presented. The
examples use the partial document tree specified previously. The DNS
entries for this partial tree are: 

                              TXT           A
             a.path.urn     -empty-       -none-
          b1.a.path.urn    c2, port=n    ip-address
       c2.b1.a.path.urn        port=n    ip-address
          b2.a.path.urn   d.c, port=n    ip-address
      d.c.b2.a.path.urn        port=n    ip-address

Example lookups 

   /A/B1/C1/doc.ps

           a.path.urn     no A record
                          repeat with b1.a.path.urn
        b1.a.path.urn     has A record, TXT doesn't have c1
                          this is the server

   /A/B2/C/D/doc.ps

           a.path.urn     no A record
                          repeat with b2.a.path.urn
        b2.a.path.urn     has A record, TXT has d.c
                          repeat with d.c.b2.a.path.urn
    d.c.b2.a.path.urn     has A record
                          this is the server

Alternatively, there could be an entry for c.b2.a.path.urn instead of it
being subsumed in b2.a.path.urn: 

                              TXT           A
             a.path.urn     -empty-       -none-
          b2.a.path.urn    c, port=n    ip-address
        c.b2.a.path.urn    d              -none-
      d.c.b2.a.path.urn       port=n    ip-address

The lookups proceed as 

   /A/B2/C/D/doc.ps

           a.path.urn     no A record
                          repeat with b2.a.path.urn
        b2.a.path.urn     has A record, TXT has c
                          repeat with c.b2.a.path.urn
      c.b2.a.path.urn     no A record, TXT has d
                          repeat with d.c.b2.a.path.urn
    d.c.b2.a.path.urn     has A record
                          this is the server

   /A/B2/C/E/doc.ps

           a.path.urn     no A record
                          repeat with b2.a.path.urn
        b2.a.path.urn     has A record, TXT has c
                          repeat with c.b2.a.path.urn
      c.b2.a.path.urn     no A record, TXT does not have e
                          server at b2.a.path.urn

Locating the Resource
+++++++++++++++++++++

The full path URN is passed to the server using the HTTP protocol as a
GET request. The server must either return a full response (with HTTP
header and response), or a URI-header in HTTP message types 301
(moved permanently) or 302 (moved temporarily). For the redirect
messages, the client should process the URLs normally. 

If the HTTP server returns a full response, the object returned could be
the named object itself, or it might be metadata for the object. In either
case, it would be identified by the Content-type header line. If and when
URC standards are defined, clients that are capable of handling URCs can
indicate that in the Accept header line. For clients that cannot handle
URCs, the server could automatically process the URC to instead return a
URL for the object, or it could return the object itself.
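
The client side of this exchange might look like the sketch below. It
only builds the request and classifies a response that has already
been parsed; no network access is performed. Several details are
assumptions not fixed by this draft: the HTTP/1.0 request line with the
URN sent verbatim, the Accept value, and the choice of "Location" or
"URI" as the header carrying the redirect target. The function names
are hypothetical.

```python
def build_request(urn):
    """Build a GET request whose target is the full path URN."""
    if not urn.startswith("path:/"):
        raise ValueError(f"not a path URN: {urn!r}")
    # Assumption: HTTP/1.0 request line with the URN sent verbatim.
    return f"GET {urn} HTTP/1.0\r\nAccept: */*\r\n\r\n".encode("ascii")


def classify_response(status, headers):
    """Return ("redirect", url) for 301/302, or ("document", content_type)."""
    # Header names are compared case-insensitively.
    fields = {k.lower(): v.strip() for k, v in headers.items()}
    if status in (301, 302):
        # Assumption: the redirect target is in a Location or URI header.
        target = fields.get("location") or fields.get("uri")
        if not target:
            raise ValueError("redirect response without a target URL")
        return "redirect", target
    if status == 200:
        # The Content-type header identifies the object or its metadata.
        return "document", fields.get("content-type", "application/octet-stream")
    raise ValueError(f"unexpected status {status}")


print(build_request("path:/A/B/C/doc.html"))
print(classify_response(302, {"Location": "http://example.org/doc.html"}))
```

A client receiving ("redirect", url) would then process the URL
normally, which may itself be another path URN.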

Encoding Syntax
===============

    <path-urn>    ::= "path:" <name>
    <name>        ::= <path> "/" [ <final-part> ]
    <path>        ::= "" | "/" <label> [ <path> ]

    <final-part>  ::= one or more ASCII characters other than "/"

    <label>       ::= <letter> [ [ <ldh-str> ] <let-dig> ]
    <ldh-str>     ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
    <let-dig-hyp> ::= <let-dig> | "-"
    <let-dig>     ::= <letter> | <digit>
    <letter>      ::= A..Z | a..z
    <digit>       ::= 0..9


Note that <label> is defined using the same rules as the domain name
<label>. RFC 1035 [3] specifies that 

   "... while upper and lower case letters are allowed in domain names,
   no significance is attached to the case. That is, two names with the
   same spelling but different case are to be treated as if identical." 

   "The labels must follow the rules for ARPANET host names. They
   must start with a letter, end with a letter or digit, and have as interior
   characters only letters, digits, and hyphens. There are also some
   restrictions on the length. Labels must be 63 characters or less." 

This document specifies that <label> have the same rules as the <label>
in RFC 1035. 
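
As a rough check of the grammar above, the following sketch parses a
path URN with a regular expression and enforces the 63-character label
limit. The names LABEL, PATH_URN, and parse_path_urn are hypothetical,
and the sketch assumes the scheme is written in lowercase.

```python
import re

# <label>: a letter, optionally followed by letters, digits, or hyphens,
# and ending in a letter or digit.
LABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# "path:" <path> "/" [ <final-part> ]; the final part is ASCII without "/".
PATH_URN = re.compile(rf"path:((?:/{LABEL})*)/([\x00-\x2e\x30-\x7f]*)")


def parse_path_urn(urn):
    """Split a path URN into (labels, final_part) or raise ValueError."""
    match = PATH_URN.fullmatch(urn)
    if match is None:
        raise ValueError(f"not a valid path URN: {urn!r}")
    # An empty <path> yields an empty list of labels.
    labels = match.group(1).split("/")[1:]
    too_long = [label for label in labels if len(label) > 63]
    if too_long:
        raise ValueError(f"label longer than 63 characters: {too_long[0]!r}")
    return labels, match.group(2)


print(parse_path_urn("path:/A/B/C/doc.html"))
print(parse_path_urn("path:/A/B/"))
```

Because labels are case-insensitive under the RFC 1035 rules, a caller
comparing two parsed names would need to compare the labels without
regard to case.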

Naming Collections
++++++++++++++++++

A prefix of a name may be declared by the corresponding naming authority
as the name of a collection. Such a prefix must end with a final "/". The
behavior of resolving the name of a collection is undefined at this point.

URN Requirements
================

The path scheme meets most of the requirements for Universal Resource
Names, as described in [2]. For each functional requirement, we discuss
how the path scheme is in conformance with it or why it should not be a
consideration. We also discuss conformance to the encoding
requirements.

[These comments regarding the URN requirements themselves should
perhaps be in another document, or in a revision of the URN Requirements
document.] 

Functional Requirements
+++++++++++++++++++++++

 o Global scope: The root of the path name space will be known to all
   clients, and for each node in the hierarchical name space, the
   corresponding resolution service will know all its subnodes. This
   guarantees that any particular path URN will have the same
   meaning for each client.

 o Global uniqueness: Each node in the hierarchical name space
   corresponds to a naming authority that is responsible for
   guaranteeing uniqueness within that portion of the name space, or
   for delegating that responsibility to a sub-authority.

 o Persistence: To help guarantee that path URNs remain useful as
   long as they are needed, the path scheme allows any subtree of the
   name space to be served at any net location, and this location may
   be changed without having to change names. But there will always
   exist names that no one wants to continue to support indefinitely.

 o Scalability: Assignment of path names is scalable for an arbitrarily
   large number of documents because the assignment process is
   distributed across an arbitrarily large number of naming authorities.
   The name resolution process is also scalable for any number of
   documents and clients, as discussed below under "Resolution".
   Each naming authority and resolution service need know about only
   a small number of neighboring authorities and services.

 o Legacy support: The path URN scheme does not itself support
   existing legacy naming schemes, but it permits them to be
   supported outside of the path scheme via the extensible, generic
   URL scheme.

 o Extensibility: New URN schemes may be supported outside of the
   path scheme via the extensible, generic URL scheme.

 o Independence: Every path naming authority is constrained by the
   requirements of the path scheme (e.g. components of the path must
   follow the encoding rules), but control of whether a naming authority
   issues a conforming name in its name space is up to that authority
   alone. 

 o Resolution: The path scheme facilitates efficient resolution of path
   URNs. The hierarchical nature of the name space allows clients to
   use caches of remote resolution server locations, so clients rarely
   need to query servers near the top of the hierarchy. For additional
   scalability, a server may delegate resolution of parts of its name
   space to other servers, and clients would then bypass contacting
   the original server.

There is an implied assumption in the URN requirements document that
names resolve into locations as opposed to the documents themselves.
This assumption is predicated on the need for independence from static
location, which we agree with. However, a path name is actually a dynamic
location since the resolution process always finds the current location of
the resolvers along the path. So there is no need to impose the additional
indirection of a map from names to locations solely for the purpose of
finding the current location. There are other advantages of indirection,
however. 

Instead, the path scheme permits different types of documents to be
returned from the resolution process, identified by Content-types as
defined by the HTTP protocol, or locations may be returned via Redirect
commands. 

Encoding Requirements
+++++++++++++++++++++

The encoding syntax for path URNs conforms to the requirements for
generic URLs. Since we intend paths to be used as URNs, the encoding
syntax must also conform to the encoding requirements of URNs. 

The encoding requirements for URNs are met by the path scheme except
potentially for the simple comparison requirement. The path scheme may
be used in such a way that a single resource has only one path name, and
this constraint would be consistent with the simple comparison
requirement. But this requirement does not specify the intended meaning of
a comparison. The intention might be that if two URNs are compared,
inequality implies that the two resources named by the URNs must
necessarily be different. On the other hand, the comparison might be
intended only to find out if the names themselves are supposed to be
equivalent, modulo variation in character sets and whitespace. 

In general, we must allow that a single resource may have multiple names
by different naming schemes. So the simple comparison requirement
cannot be met across multiple naming schemes. Is there sufficient
advantage for the constraint that a resource have only one name per
naming scheme? Tools (such as browsers and caches) should be made
to work with the knowledge that resources do not necessarily have a
single name, by perhaps remembering the canonical name for a resource
in addition to its alternative names. 

References
==========

 1. Berners-Lee, T., Masinter, L., McCahill, M. (editors), "Uniform
   Resource Locators (URL)", RFC 1738, December 1994.
   ftp://ds.internic.net/rfc/rfc1738.txt 

 2. Sollins, K., Masinter, L. "Functional Requirements for Uniform
   Resource Names", RFC 1737, December 1994.
   ftp://ds.internic.net/rfc/rfc1737.txt 

 3. Mockapetris, P., "Domain Names - Implementation and
   Specification", RFC 1035, November 1987.
   ftp://ds.internic.net/rfc/rfc1035.txt 

 4. Fielding, R., HTTP 

Author Contact Information
==========================

Daniel LaLiberte
National Center for Supercomputing Applications
152 Computing Applications Building
605 East Springfield Avenue
Champaign, IL 61820
Tel: (217) 244-0013
liberte@ncsa.uiuc.edu  

Michael Shapiro
National Center for Supercomputing Applications
152 Computing Applications Building
605 East Springfield Avenue
Champaign, IL 61820
Tel: (217) 244-6642
mshapiro@ncsa.uiuc.edu
Received on Friday, 17 March 1995 18:02:06 UTC