- From: Paul Hoffman <ietf-lists@proper.com>
- Date: Tue, 21 Mar 1995 08:32:49 -0700
- To: uri@bunyip.com
- Cc: internet-drafts@cnri.reston.va.us
IETF URI Working Group
Internet-Draft
draft-ietf-uri-yaurn-00.txt
Expires September 21, 1995

Uniform Resource Names (URNs)

Status of this memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use
Internet-Drafts as reference material or to cite them other than as
a "working draft" or "work in progress."

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net, nic.nordu.net, ftp.isi.edu or
munnari.oz.au.

Abstract

This document defines a syntax for Uniform Resource Names (URNs),
describes the protocols by which they are resolved, and gives
operational rules for their assignment and use. The proposal meets
all of the requirements for URNs previously proposed in the URI
working group.

1. Introduction

A URN (Uniform Resource Name) is the name of a resource within the
context of a larger Internet information architecture known as
Uniform Resource Identification. URNs are simple text strings. Using
a URN resolution service, an Internet user or program can retrieve
information about the named resource. Resolving a URN returns
information about that named resource, such as a description of the
resource, the location or locations of the resource, and
bibliographic information about the resource.

URNs differ from URLs (Uniform Resource Locators) described in RFC
1738 [URL] in that URNs allow resources on the Internet to be
specified by name instead of by location. URNs return information
about the resource, not the resource itself.
Using URNs has many advantages for describing resources, including:

- The information in a resource may move in the future. For example,
  as some Internet services get too popular for their original
  hosts, they move to different systems which have different URLs.
  Also, a service might move from host system to host system with
  the person who maintains it.

- Users can easily access metainformation about a resource without
  accessing the resource itself. A user might want to see the
  bibliographic information about a resource without getting the
  resource, particularly if it costs money to get the resource.
  Other useful metainformation that a user might want to see before
  accessing a resource includes the name of the maintainer of the
  resource, the language in which the resource is written, the price
  of the resource, any requirements the user must meet before
  reading the resource, and so on.

- A resource may exist in many locations on the Internet. By
  resolving a single URN, the user can get a list of URLs from which
  to access the resource and more complete bibliographic information
  from a URC [URC]. This list can include valuable metainformation
  such as suggestions about the best location for the user to
  retrieve the resource from based on cost, speed, or other factors.

It is important to understand that URNs are neither a superset nor a
subset of URLs: they are a different way of describing resources.
The difference between URNs and URLs is similar to that between the
title of a book and a locator for it on the shelf.

Many of the ideas for the system proposed in this paper are adapted
from the Internet-Draft "draft-ietf-uri-resource-names-03" [URNO].
However, our system has significant implementation differences from
its predecessor.

2. Syntax

The URN consists of four fields, each separated by a colon (:).
The parts are:

- The text string "urn"
- The type of naming authority of the URN, called the SchemeID
- The name of the authority for the URN, called the AuthorityID
- The name of the element for the URN, called the ElementID

For instance, a typical URN might look like:

   <urn:dns:library.bigstate.edu:aj17-mcc>

The first field and the SchemeID field are case-insensitive. The
case sensitivity and character set of the AuthorityID field depend
on the value of the SchemeID field. The case sensitivity and
character set of the ElementID field depend on the value of the
AuthorityID field.

White space (the characters Space, Tab, CR, and LF) is allowed but
is not significant within a URN. Freestanding URNs that include any
white space must be enclosed in "<" and ">" characters.

URNs are often displayed and transmitted in the same media as URLs.
In order to minimize problems with interoperability, the encoding of
characters in URNs follows the same rules used by URLs in [URL].

2.1 SchemeIDs

The SchemeID field describes what kind of naming authority is used.
This is the authority by which the AuthorityIDs are defined. The
SchemeIDs currently defined are:

   SchemeID   Description
   dns        Domain Name System (RFC 952 and RFC 1123) [DNS]

Although there is only one SchemeID currently defined, it is
expected that others will be added in the future. This allows URNs
to be used for such naming schemes as ISBNs (International Standard
Book Numbering system), ISSNs (International Standard Serial
Numbering system), and other naming schemes that are not currently
on the Internet. It also allows for naming schemes that can be
resolved in ways that are distributed and not location-specific.

The Internet Assigned Numbers Authority (IANA) will maintain a
registry of SchemeID names. SchemeID names starting with the
characters "x-" are reserved for experimental purposes. It is
strongly suggested that any new SchemeIDs be first proposed as
Internet-Drafts in the IETF Uniform Resource Identifier (URI)
Working Group.
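As an illustration only, the four-field syntax of section 2 might be
split apart as in the following Python sketch. The names Urn and
parse_urn are hypothetical and are not defined by this draft. The
sketch assumes that any colon after the third one belongs to the
ElementID, which the draft does not state, and it does not decode
URL-style character escapes.

```python
from typing import NamedTuple


class Urn(NamedTuple):
    """Hypothetical container for the three variable fields of a URN."""
    scheme_id: str     # case-insensitive, stored in lowercase
    authority_id: str  # case rules depend on the SchemeID; kept as given
    element_id: str    # case rules depend on the AuthorityID; kept as given


def parse_urn(text: str) -> Urn:
    """Split urn:SchemeID:AuthorityID:ElementID into its fields."""
    # White space (Space, Tab, CR, LF) is allowed but not significant.
    compact = "".join(ch for ch in text if ch not in " \t\r\n")
    # Freestanding URNs may be enclosed in "<" and ">".
    if compact.startswith("<") and compact.endswith(">"):
        compact = compact[1:-1]
    # Assumption: colons beyond the third are part of the ElementID.
    parts = compact.split(":", 3)
    if len(parts) != 4 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {text!r}")
    scheme_id, authority_id, element_id = parts[1:]
    if not (scheme_id and authority_id and element_id):
        raise ValueError(f"empty field in URN: {text!r}")
    return Urn(scheme_id.lower(), authority_id, element_id)
```

With this sketch, parse_urn("<urn:dns:library.bigstate.edu:aj17-mcc>")
yields the fields "dns", "library.bigstate.edu", and "aj17-mcc". The
AuthorityID and ElementID are left untouched because their case
handling is decided by the scheme and the authority respectively.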
2.2 AuthorityIDs

The AuthorityID is the name of an individual, group, or system
within the SchemeID that is allowed to create ElementIDs. The
meaning of the AuthorityID depends on the system used by the
SchemeID to create names for URNs.

2.2.1 AuthorityIDs for the "dns" SchemeID

Within the "dns" SchemeID, the AuthorityID is the fully-qualified
domain name (FQDN) of the host system(s) that can create and resolve
the ElementID. The owner or maintainer of each domain name has the
exclusive right to create, and the exclusive right to resolve or
cause to be resolved, ElementIDs specific to that domain name. The
case of the characters in the AuthorityIDs is not significant.

The owner or maintainer of an FQDN automatically has the right to
use that FQDN as an AuthorityID, provided the rules in this section
are followed.

- Any host may resolve URNs without prior registration or
  authorization.

- AuthorityIDs cannot be reused if that FQDN has previously been
  used as an AuthorityID, unless the new owner or maintainer of that
  FQDN agrees to maintain all the previously-assigned ElementIDs.

- A conforming resolution service must be made available at the IP
  address returned by resolving the FQDN. If resolving the FQDN
  results in more than one IP address, all the IP addresses must
  resolve the same set of URNs, and each URN should be resolved
  equivalently at each IP address when at all possible.

- The AuthorityID is to be treated as an opaque string. Inferring a
  structure based on the domain names within the FQDN is unwise.

2.3 ElementIDs

The ElementID is the element that will be resolved. It is important
to note that this is not the "name" part of the URN: the combination
of the four fields in a URN is the only definition of the resource's
name. For example, the URN <urn:dns:physics.bigstate.edu:thesis12>
and the URN <urn:dns:chris.lwr-ltd.co.uk:thesis12> might be names of
completely different Internet resources, similar resources, or even
the same resource.
However, even though the ElementIDs are the same in the two URNs,
the two URNs are different.

The special cases for ElementIDs are:

- ElementIDs that begin with exactly "urn+" are reserved for special
  resolution, as described in section 3.5.

- The characters "#" (hex 23) and "?" (hex 3F) are reserved for
  future use.

2.3.1 ElementIDs for the "dns" SchemeID

Within the "dns" SchemeID, the same ElementIDs on different domain
names are explicitly unrelated. The case of the characters in
ElementIDs can be significant; however, current practice dictates
that they should not be case-sensitive. ElementIDs are opaque,
meaning that attempting to infer structure from the name is unwise.

3. Resolving URNs

A URN is resolved by communicating with a URN resolution service.
All URN resolution services are stateless, single-step systems: a
single URN is specified, and a single response is returned. The
model for URN resolution is the standard Internet connectionless
client-server model.

URNs are resolved through the HTTP version 1.0 protocol [HTTP]. HTTP
is chosen as the resolution protocol because of its current wide use
on the Internet, and because it allows for resolution requests and
resolution results that are compatible with those required for URNs.
Further, the content-negotiation portion of the HTTP protocol
provides a mechanism for gracefully handling future capabilities of
URN resolvers. URN resolution may keep up with improvements to the
HTTP specification, such as future versions, security enhancements,
and so on.

3.1 Mechanics of URN Resolution

URN resolution can be described in two parts: the resolution request
and the resolution result. A resolution request describes the URN to
be resolved to the resolution server. A resolution result is the
text that is returned from the resolution server; it gives the
requested information about the resource named in the URN.

The current query-response system for resolving URNs is quite
simple.
Future extensions to this system will accommodate more capable query
languages.

3.1.1 Resolution Requests

Each resolution request must give enough information to the
resolution server that it can completely and unambiguously identify
which URN is being specified.

The resolution request specifies the information desired in the
response. This might be a simple list of URLs, or a more informative
reply with a great deal of additional information. HTTP's Accept:
header is used to indicate the desired format for the response, such
as plain text, HTML, or some application-specific binary encoding
suitable for cryptographic operations.

3.1.2 Resolution Results

The resolution server returns the results of the resolution request
in a single message. If the resolution request specifies desired
formats for the response, the resolution server should attempt to
return only those types of results. However, it is acknowledged that
this may be impractical or impossible for some HTTP servers, and the
client should be able to handle (or at least ignore) resolution
results that are not of a requested type.

3.2 Resolving URNs

Note that this section uses language defined in the HTTP version 1.0
draft.

Currently, the URN-resolving HTTP server must be located at TCP port
4500. After registration with the IANA, the port number required
will change to an IANA-reserved port number.

Because of the requirements in section 5.4 of the current HTTP 1.0
specification, the resolution request under HTTP is in the form of a
URL. It is expected that this URL will later be defined within
[URL]. The anticipated format for the client-side representation of
the URL is the same as that of the URN itself. If the HTTP version
1.0 specification changes to allow non-URI forms for requests, no
URL for URNs is required.

3.3 Resolution Requests

When passed to the HTTP URN resolver, the first three fields of the
URN are stripped off, leaving only the ElementID. The Method is
always "GET".
Thus, a typical request might look like:

   GET current-version-of-price-list HTTP/1.0

URN HTTP clients can send URN resolution requests as HTTP
Simple-Requests or Full-Requests. It is strongly urged that clients
use the Full-Request format so that General-Headers and
Request-Headers can be passed.

Although not required, a good resolution request creator should be
able to include the Accept Request-Header. The Accept header is used
to specify the format of information desired by the user, if any. If
the URN resolution client has no preference, no Accept header should
be used. For example, a request where the user only wanted to see
results in "urc-0" format might look like:

   GET alb002 HTTP/1.0
   Accept: text/urc-0

It is urged that URN resolvers also be able to interpret and act on
all standard HTTP Request-Headers.

3.4 Resolution Results

Formats for resolution results are described as Internet Media Types
[IMT], previously called MIME Content-Types. The following formats
for resolution results are currently defined and are described
below:

   text/urc-0
   text/plain
   text/html

It is anticipated that other formats of URCs that include structured
bibliographic information will be defined in the future. Other
formats can be defined by using existing, or creating new, Internet
Media Types. Formats can use any media type, although it is expected
that most will use type "text" or "application".

3.4.1 HTTP Status-Line

The HTTP Status-Line is significant in URN resolution. The HTTP
Status-Codes carry corresponding meanings in URN resolution, such as
"200" for "OK", "301" for "Moved permanently", and so on. When at
all possible, the URN resolver should include relevant
Reason-Phrases in the Status-Lines, particularly for Status-Codes
301 and 302 (redirection).

3.4.2 Format for text/urc-0 Resolution Results

text/urc-0 is a structured format that can be easily parsed by
programs, and can also be visually parsed by humans.
Thus, resolution servers should prefer this format when possible.

The format for urc-0 resolution results is:

   *[<header><CRLF><some-URL><CRLF>[<metainformation>]]

Each part starts with a header that has the following form:

   =====[<charset>[/<language>]]

<charset> is the character set used in the metainformation. The
value for the field is one of "US-ASCII" or "ISO-8859-x", where "x"
is a digit in the range "1" through "9". If not specified, the
default is "US-ASCII".

<language> is the language used in the metainformation. The value
for the field is a language identification tag described in [LANG].
If not specified, the default is "x-unspecified".

The returned URL must conform to [URL]. If the URL is more than one
line long, it must begin with the characters "<URL:" and end with a
">" character, as described in [URL].

The optional metainformation may be of any format and contain any
text. The only restriction is that no line of the metainformation
may begin with the characters "=====".

3.4.2.1 Examples of urc-0 Resolution Results

A resolution result that has a single URL and no metainformation
might look like:

   =====
   ftp://elm.wnln.edu/pub/mirrors/phone-list.txt

A resolution result that has a URL on multiple lines might look
like:

   =====
   <URL:ftp://elm.wnln.edu/pub/mirrors
   /phone-list.txt>

A resolution result that has multiple URLs might look like:

   =====
   ftp://elm.wnln.edu/pub/mirrors/phone-list.txt
   =====
   ftp://gagu.bigstate.edu/admin/phones.html

A resolution result that has multiple URLs with metainformation
might look like:

   =====US-ASCII
   ftp://elm.wnln.edu/pub/mirrors/phone-list.txt
   This is the most up-to-date version of the WNLN-Bigstate phone
   list. It is maintained by Cheryl O'Donnell.
   =====US-ASCII
   ftp://gagu.bigstate.edu/admin/phones.html
   This is the mirror of the first URL at Bigstate.

3.4.3 Format for text/html Resolution Results

text/html can be used when the resolution result is meant to be read
only by users with HTML clients.
The structure is text formatted with HTML tags, as described in
[HTML]. text/html allows the URLs to be displayed as active links,
but it is anticipated that HTML clients in the future will parse
text/urc-0 and automatically display it as HTML with active links.

3.4.4 Format for text/plain Resolution Results

text/plain can be used when the resolution result is meant to only
be read by humans. There is no structure implied in the format.
Because it is not easily parsed by client programs, it should only
be used when it is impossible to use other formats.

Both text/plain and text/html are intended to provide URN resolution
capabilities to current software. This backward compatibility should
ease the transition to a URN-based web.

3.5 Reserved Requests

As mentioned before, ElementIDs beginning with "urn+" are reserved.
The following requests, and their responses, are defined. The
responses to the reserved requests may be in any of the known
formats (currently text/plain, text/html, and text/urc-0).

3.5.1 urn+m: Meta-information About the Resolver (required)

Returns meta-information about the resolver, such as who to contact
with questions, the software it is running, and so on. There is no
structure to the metainformation, but it is considered authoritative
for each resolving host; it is likely that this will be a mailto URL
of an administrative contact for the host system. This ElementID
must be served by all conforming resolvers. For example:

   =====US-ASCII/en
   mailto:url-admin@flixco.com
   The URNs on this server mostly point to movies created by FlixCo.
   Other URNs are pointers to affiliated libraries of classic films.
   We are currently resolving URNs with CERN httpd 3.4a.

3.5.2 urn+a: List of All ElementIDs (optional)

Returns a pointer to a list of all ElementIDs that can be resolved
by this resolver. This ElementID is optional, but may be of great
value to resolution clients.

The response includes a URL, where the URL points to the list.
That list consists of lines, each line an ElementID followed by
<CRLF>. The response may also be augmented with metainformation on
the number of elements and so on. For example, a text/urc-0 response
might be:

   =====US-ASCII/en
   ftp://ftp.flixco.com/pub/all-urns
   This file is a list of all URNs resolvable at urn.flixco.com.
   The file is about 50K, and is sorted with the most-recently
   added URNs at the end.

3.5.3 urn+c: List of Child Naming Authorities (required)

Returns a list of the naming authorities authorized by this one.
Each name in the list is followed by <CRLF>. This local hierarchy
information will make it possible, in principle, to make complete
traversals of the web of URN resolvers for some SchemeIDs.

3.5.4 urn+p: Name of Parent Naming Authority (required)

Returns the name of the naming authority that authorized the naming
authority on this resolver. If a naming authority has more than one
parent, a list of names is returned. This is a local hierarchy
operation that makes it possible, in principle, to perform complete
traversals of the web of URN resolvers for some SchemeIDs.

4. URN Resolution Clients, Proxies, and Gateways

URN clients can interpret the resolution result and present the user
with a better interface than just the text that is returned. For
example, an intelligent Gopher-based URN client going through a
Gopher-to-HTTP gateway can select the URLs with the "gopher:" scheme
and create a Gopher menu of them. Similarly, an intelligent
HTML-based URN client can reformat a resolution result that has a
format of text/urc-0 as HTML text with the URLs as links that can be
selected.

In order to make URN resolution available to as many Internet users
as possible, it is assumed that resolution may take place through
HTTP proxies and gateways. Proxies and gateways allow Internet users
who do not have URN or HTTP clients to resolve URNs.
Current HTTP proxies and gateways should work well for this purpose,
but they should be enhanced and made more widely available. Some
sites will choose to have local proxies and gateways, while other
sites will allow people from outside their site to use their proxies
and gateways freely.

URN resolution clients, proxies, and gateways should intelligently
follow redirection as described in this specification. For example,
if a URN resolver returns a Status-Line with Status-Code 301 or 302,
and that line contains a URN, the client, proxy, or gateway should
attempt to resolve the original URN by substituting the new URN.

5. Caching

URN resolution clients, proxies, and gateways may choose to cache
resolution results in order to speed resolution and to reduce
Internet traffic. Caching should only be performed using the
If-Modified-Since mechanism in HTTP so that URNs whose response
content changes rapidly are not accidentally cached.

6. Meeting Requirements for URNs

This proposal meets all the requirements stated in the working draft
for URN requirements. The basic requirements from that document are:

6A. Function capabilities

Global scope: Each SchemeID will define a set of AuthorityIDs that
are global in scope.

Global uniqueness: The combination of SchemeID, AuthorityID, and
ElementID will always be unique.

Persistence: It is easy for publishers to have their URNs last
forever by allowing them to hand over the resolution to one or more
hosts in the future. Even if the resolution of a particular URN no
longer makes any sense, it is easy to fully resolve the URN to
something that is readable by the user, and to do this forever.

Scalability: For the "dns" scheme, if a site gets too busy, mirror
sites can be specified using standard DNS procedures.

Legacy support: Current naming systems can be easily incorporated by
giving them their own SchemeIDs. Experimental SchemeIDs can be
created with the "x-" prefix.
Extensibility: All parts of the specification are designed to be
extensible. Additional resolution requests and resolution results
can be defined, and so on.

Independence: All SchemeIDs will be able to specify their own names,
restricted only by the encoding rules of [URL].

Resolution: The resolver uses a simple HTTP exchange, supported by
dozens of browsers and servers today. Further, there are already
email-to-HTTP gateways, allowing Internet users with only email
access to resolve URNs immediately.

6B. URN encoding

Single encoding: URNs have an encoding that is independent of the
resolution protocol. If additional resolution protocols are added,
the encoding of the URNs does not change; instead, the resolution
clients, proxies, and gateways change the request to fit the
protocols.

Simple comparison: All URNs are unique and therefore easily compared
(after appropriate decoding and case translation).

Human transcribability: URNs can be transcribed as easily as URLs.

Transport friendliness: Like URLs, all characters that could affect
Internet transport are encoded.

Machine consumption: URNs and their results are easily parsed.

Text recognition: URNs will stand out in free text due to the
internal colons and the "urn" prefix.

6C. Implications

Uniqueness: Name assignment is delegated to naming authorities, who
may then assign names.

Scalability: The DNS naming authorities are scalable without any
notification to, or approval from, a central authority. Additional
SchemeIDs may or may not be scalable, depending on the wishes of the
central authorities of the scheme. However, given the ease of
becoming the owner or maintainer of an FQDN, almost anyone should be
able to become a URN publisher with less difficulty than they have
setting up most other Internet services.

Mapping to URLs: URLs can be returned in any of the resolution
result formats.

Transcriptability: The character set for URNs is simple and small,
the same as URLs.

7. Security Implications

Attempting to resolve a URN can cause unencoded messages to be sent
between two systems on the Internet, and thus can introduce many
security concerns. Resolution requests and responses can be logged
at the originating site, the recipient site, and intermediary sites
along the delivery path. Resolution requests and responses can also
be read at any of those sites.

The security implications of HTTP probably apply equally to URN
resolution. Security features that are added to HTTP will probably
increase the security of resolving URNs.

8. References

[DNS]  RFC 952, "DOD Internet Host Table Specification" and RFC
       1123, "Requirements for Internet Hosts -- Application and
       Support".

[HTML] Internet-Draft, "HyperText Markup Language Specification -
       2.0". The name of the draft at the time of this writing is
       "draft-ietf-html-spec-01.txt".

[HTTP] Internet-Draft, "Hypertext Transfer Protocol -- HTTP/1.0".
       The name of the draft at the time of this writing is
       "draft-ietf-http-v10-spec-00.txt".

[IMT]  RFC 1590, "Media Type Registration Procedure".

[LANG] Internet-Draft, "Tags for the identification of languages".
       The name of the draft at the time of this writing is
       "draft-mailext-language-tag-03.txt".

[URC]  Internet-Draft, "URC Scenarios and Requirements". The name of
       the draft at the time of this writing is
       "draft-ietf-uri-urc-req-00.txt".

[URL]  RFC 1738, "Uniform Resource Locators (URL)".

[URNO] Internet-Draft, "Uniform Resource Names (URN)" by Mitra,
       Chris Weider, and Mike Mealling. The name of the draft at the
       time of this writing is "draft-ietf-uri-resource-names-03".

9. Author Contact Information

   Paul E. Hoffman
   Proper Publishing
   127 Segre Place
   Santa Cruz, CA, USA 95060
   voice: (408) 426-6222
   phoffman@proper.com

   Ron Daniel Jr.
   MS B287
   Los Alamos National Laboratory
   Los Alamos, NM, USA 87545
   voice: (505) 665-0139
   fax: (505) 665-4939
   rdaniel@lanl.gov
Received on Tuesday, 21 March 1995 11:31:31 UTC