Re: draft-ietf-url-process-00

Tim Berners-Lee (timbl@w3.org)
Tue, 21 Jan 1997 16:53:24 -0500


Message-Id: <3.0.32.19970121165321.007c74b0@hq.lcs.mit.edu>
Date: Tue, 21 Jan 1997 16:53:24 -0500
To: Larry Masinter <masinter@parc.xerox.com>, internet-drafts@ietf.org
From: Tim Berners-Lee <timbl@w3.org>
Subject: Re: draft-ietf-url-process-00
Cc: uri@bunyip.com

I'm not sure whether this went out when I first wrote it...

At 11:41 am 04-01-97 PST, Larry Masinter wrote:
>[apologies if this is a repeat, I couldn't tell from the bounce
>message whether the original went through.]
>
>This is intended as the initial submission for the newly forming URL
>working group; if you need to, call it
>    draft-masinter-url-process-00
>     
>================================================================
>   
>INTERNET-DRAFT                                    Larry Masinter
><draft-ietf-url-process-00>                          Dan Zigmond
>January 4, 1997                             Harald T. Alvestrand
>expires June 4, 1997
>
>               Guidelines and Process for new URL Schemes
>
>Status of this Memo
>
>   This document is an Internet-Draft.  Internet-Drafts are working
>   documents of the Internet Engineering Task Force (IETF), its areas,
>   and its working groups.  Note that other groups may also distribute
>   working documents as Internet-Drafts.
>
>   Internet-Drafts are draft documents valid for a maximum of six
>   months and may be updated, replaced, or obsoleted by other
>   documents at any time.  It is inappropriate to use Internet-Drafts
>   as reference material or to cite them other than as ``work in
>   progress.''
>
>   To learn the current status of any Internet-Draft, please check the
>   ``1id-abstracts.txt'' listing contained in the Internet-Drafts
>   Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
>   (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
>   Coast), or ftp.isi.edu (US West Coast).
>
>   Issues:
>	Registration process isn't really there.
>
>Abstract
>
>   A Uniform Resource Locator (URL) is a compact string representation
>   of the location for a resource that is available via the Internet.
>   [RFC URL-SYNTAX] defines the general syntax and semantics of URLs.
>   This document provides guidelines for the definition of new URL
>   schemes and describes the process by which they are registered.
>
>1. Introduction
>
>   In addition to specifying the general syntax for Uniform Resource
>   Locators, RFC 1738 defined a number of generally useful URL schemes
>   and promised that a mechanism for registering new schemes would be
>   established.  Several new URLs have been proposed since that time,
>   but the procedure for standardizing these schemes has never been
>   fully defined.  This document describes the current practice and
>   offers some guidance for authors of new schemes.
>
>   One process for defining URL schemes is via the Internet standards
>   process: new URL schemes should be described in standards-track
>   RFCs.  The Internet Assigned Numbers Authority (IANA) maintains a
>   registry of all URL schemes defined in this way.
>
>2. Guidelines for new URL schemes
>
>   Because new URL schemes potentially complicate client software, new
>   schemes must have demonstrable utility and operability, as well as
>   compatibility with existing URL schemes. This section elaborates
>   these criteria.
>
>2.1 Syntactic compatibility
>
>   New URL schemes should follow the same syntactic conventions of
>   existing schemes when appropriate. 
>
>2.1.1 Use of initial "//" for Internet host addresses
>
>   Many proposed new URL schemes seem to use "://" as a kind of
>   indicator that what follows is a URL. However, the use of the top
>   level "//" is indicative of an Internet host address, and not a top
>   level marker.

Absolutely *disagree*.  The // is a device of the relative addressing which
originally I intended to be quite general.  The URN scheme should be mapped
onto

  urn1://authority/nameissuedbyauthority

so that (a) relative parsing works and (b) people know that the delimiters
indicate hierarchy, and (c) people are used to it.

A fundamental problem with the URL draft as it stands is that it suggests
that URL schemes must start with FQDNs for relative parsing to work.  Wrong.
It is a purely syntactic algorithm.
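
As an illustration of relative parsing being purely syntactic, here is a
minimal sketch in Python.  It is not the algorithm from any RFC: the names
split_ref and resolve are hypothetical, and query and fragment handling are
left out.  It looks only at the delimiters ":", "//" and "/", never at the
scheme name or at what the authority part means.

```python
def split_ref(ref):
    """Split a reference into (scheme, authority, path) by delimiters only."""
    scheme = None
    head, sep, rest = ref.partition(":")
    # Treat the text before ':' as a scheme only if no '/' precedes the ':'.
    if sep and head and "/" not in head:
        scheme, ref = head, rest
    authority = None
    if ref.startswith("//"):
        authority, slash, tail = ref[2:].partition("/")
        ref = slash + tail
    return scheme, authority, ref


def resolve(base, rel):
    """Resolve rel against base without consulting the scheme name."""
    bscheme, bauth, bpath = split_ref(base)
    scheme, auth, path = split_ref(rel)
    if scheme is not None:
        return rel                      # already absolute
    if auth is not None:
        return f"{bscheme}://{auth}{path}"
    if path.startswith("/"):
        merged = path                   # replaces the whole base path
    else:
        merged = bpath.rsplit("/", 1)[0] + "/" + path
    segments = []
    for seg in merged.split("/")[1:]:
        if seg == "..":
            if segments:
                segments.pop()          # step up one level of hierarchy
        elif seg != ".":
            segments.append(seg)
    prefix = f"{bscheme}:" + (f"//{bauth}" if bauth is not None else "")
    return prefix + "/" + "/".join(segments)


print(resolve("urn1://authority/a/b", "../c"))
print(resolve("http://host/a/b", "../c"))
```

Both calls take the same code path: the urn1 base resolves to
urn1://authority/c and the http base to http://host/c.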


>2.1.2 Compatibility with relative URLs
>
>   URL schemes should use the generic-URL syntax if they are intended
>   to be used with relative URLs.  A description of the allowed
>   relative forms should be included in the scheme's definition.
>   Many applications use relative URLs extensively.
>
>   o Can it be parsed according to RFC URL-SYNTAX - that is, if the tokens
>     "//", "/", ";", "?" and "#" are used, do they have the meaning
>     given in RFC URL-SYNTAX?

	// and / and ; should be subject to relative URL parsing
	which *has* to be independent of scheme.

	# is applied after the retrieval of an object. For example,
	whatever URI (URL or URN or whatever) xxx you use for
	an HTML document, xxx#foo will always refer to the
	SGML element with ID foo.  That is a property of the
	text/html MIME type (and ought to be in the MIME type
	registry).  No UR* scheme can mess with it or with #.
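
A small Python sketch of that separation, under the assumption that a client
strips the fragment before any scheme-specific code runs.  The function name
split_fragment and the example identifiers are hypothetical.

```python
def split_fragment(uri):
    """Return (identifier, fragment); fragment is None when no '#' is present."""
    identifier, sep, fragment = uri.partition("#")
    return identifier, (fragment if sep else None)


for uri in ("http://host/doc.html#foo", "urn1://authority/doc#foo"):
    identifier, fragment = split_fragment(uri)
    # Only `identifier` would be passed to scheme-specific retrieval code.
    # `fragment` is interpreted afterwards, according to the media type of
    # whatever was retrieved (for text/html, the element with that ID).
    print(identifier, fragment)
```

The split is identical for both identifiers; nothing in it depends on the
scheme.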

>   o Does it make sense to use it in relative URLs like those RFC
>     URL-SYNTAX specifies?


>   o If something is designed to be broken into pieces, does it
>     document what those pieces are, why it should be broken in this
>     way, and why the breaks aren't where URL-SYNTAX says that they
>     usually should be?

If there are hierarchical breaks and they aren't represented by / and //
then the identifier scheme shall not be registered.

>   o If it has a hierarchy, does it go left-to-right and with slash
>     separators like RFC URL-SYNTAX? If not, why not?

Why not is not enough. I would say it has to.  You can't retrospectively
update a whole load of parsers.  You can add gateways to existing
systems.


>2.1.3 Does it start with "ur"?
>
>   Any scheme starting with the letters "U" and "R", in particular if
>   it attaches any of the meanings "uniform", "universal" or
>   "unifying" to the first letter, is going to cause intense debate,
>   and generate much heat (but maybe little light).

Therefore, the scheme name itself should not start with UR.
In particular, this will prevent the first group from trying
to corner the "only" URN scheme by calling it URN:.

>2.2 Is the scheme well defined?
>
>   It is important that the semantics of the "resource" that a URL
>   "locates" be well defined. This might mean different things
>   depending on the nature of the URL scheme.

Let's say that the issues of identity, responsibility for reliability,
reuse, etc (see RFC1630) which make schemes different should
be addressed in the spec.

>2.2.1 Clear mapping from other name spaces
>
>   In many cases, new URL schemes are defined as ways to translate
>   other protocols and name spaces into the general framework of
>   URLs. The "ftp" URL scheme translates from the FTP protocol, while
>   the "mid" URL scheme translates from the Message-ID field of
>   messages.
>
>   In either case, the description of the mapping must be complete,
>   must describe how character sets get encoded or not in URLs, must
>   describe exactly how all legal values of the base standard can be
>   represented using the URL scheme, and exactly which modifiers,
>   alternate forms and other artifacts from the base standards are
>   included or not included.
>
>   In all cases, encoding rules must be made clear: What octets are
>   put into the URL, and if other octets need to be represented, what
>   convention is used to represent them?  Any departure from the %xx
>   convention needs special justification.
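
For reference, a minimal Python sketch of the %xx convention applied to raw
octets.  It uses the standard library's urllib.parse; the sample octets are
made up, and the set of characters that library leaves unescaped is its own
choice, not something this draft specifies.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Hypothetical octets from a base protocol: a space and a non-ASCII octet.
raw = b"report 1\xff.txt"

# Each octet outside the library's unreserved set becomes %xx (two hex digits).
encoded = quote_from_bytes(raw, safe="")
print(encoded)

# Decoding restores the original octets exactly.
decoded = unquote_to_bytes(encoded)
print(decoded == raw)
```

The mapping is reversible at the octet level, which is why a scheme
definition only has to say which octets go in, not how to escape them.
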
>
>2.2.2 URL schemes associated with network protocols
>
>   Most new URL schemes are associated with network resources that
>   have one or several network protocols that can access them. The
>   'ftp', 'news', and 'http' schemes are of this nature. 

Except that "news" was not called  "nntp" because there are many ways of
getting news.

> For such
>   schemes, the specification should completely describe how URLs are
>   translated into protocol actions in sufficient detail to make the
>   access of the network resource unambiguous.  If an implementation
>   of the URL scheme requires some configuration, the configuration
>   elements must be clearly identified. (For example, the 'news'
>   scheme, if implemented using NNTP, requires configuration of the
>   NNTP server.)
>
>2.2.3 Definition of non-protocol URL schemes
>   
>   In some cases, URL schemes do not have particular network protocols
>   associated with them, because their use is limited to contexts
>   where the access method is understood. This is the case, for
>   example, with the "cid" and "mid" URL schemes. For these URL
>   schemes, the specification should describe the notation of the
>   scheme and a complete mapping of the locator from its source.
>   
>2.2.4 Definition of URL schemes not associated with data resources
>
>   Most URL schemes locate Internet resources that correspond
>   to data objects that can be retrieved or modified. This is the
>   case with "ftp" and "http", for example. However, some URL schemes
>   do not; for example, the "mailto" URL scheme corresponds to an
>   Internet mail address.
>   
>   If a new URL scheme does not locate resources that are data
>   objects, the properties of names in the new space must be clearly
>   defined.
>
>2.2.5 Definition of operations
>
>   In some contexts (for example, HTML forms) it is possible to
>   specify any one of a list of operations to be performed on a
>   specific URL. (Outside forms, it is generally assumed to be
>   something you GET.)
>
>   The URL scheme definition should describe all well-defined
>   operations on the URL identifier, and what they are supposed to
>   do.
>        
>   Some URL schemes (for example, "telnet") provide location
>   information for hooking onto bidirectional data streams, and don't
>   fit the "infoaccess" paradigm of most URLs very well; this should
>   be documented.
>
>   NOTE: It is perfectly valid to say that "no operation apart from
>   GET is defined for this URL". It is also valid to say that "there's
>   only one operation defined for this URL, and it's not very
>   GET-like". The important point is that what is defined on this type
>   is described.
>
>2.3 Demonstrated utility
>
>   URL schemes should have demonstrated utility.  New URL schemes are
>   expensive things to support. Often they require special code in
>   browsers, proxies, and/or servers.  Having a lot of ways to say the
>   same thing needlessly complicates these programs without adding value
>   to the Internet.
>
>   The kinds of things that are useful include:
>
>      o Things that cannot be referred to in any other way.
>
>      o Things where it is much easier to get at them using this scheme than
>        (for instance) a proxy gateway.
>
>
>2.3.1 Proxy into HTTP/HTML
>
>   One way to provide a demonstration of utility is via a gateway
>   which provides objects in the new scheme for clients using an
>   existing protocol. It is much easier to deploy gateways to a new
>   service than it is to deploy browsers that understand the new URL
>   object.
>
>   Things to look for when thinking about a proxy are:
>
>   o Is there a single global resolution mechanism whereby any proxy can
>     find the referenced object?
>   o If not, is there a way in which the user can find any object of this
>     type, and "run his own proxy"?
>   o Are the operations mappable one-to-one (or possibly using
>     modifiers) to HTTP operations?
>   o Is the type of returned objects well defined?
>      * as MIME content-types?
>      * as something that can be translated to HTML?
>   o Is there running code for a proxy?
>
>2.4 Are there security considerations?
>
>   Above and beyond the security considerations of the base mechanism
>   a scheme builds upon, one must think of things that can happen in
>   the normal course of URL usage.
>
>   In particular:
>
>   o Does the user need to be warned that such a thing is happening
>     without an explicit request (GET for the source of an IMG tag,
>     for instance)?  This has implications for the design of a proxy
>     gateway, of course.
>
>   o Is it possible to fake URLs of this type that point to different
>     things in a dangerous way?
>
>   o Are there mechanisms for identifying the requester that can be
>     used or need to be used with this mechanism (the From: field in a
>     mailto: URL, or the Kerberos login required for AFS access in the
>     AFS: url, for instance)?
>
>   o Does the mechanism contain passwords or other security
>     information that are passed inside the referring document in the
>     clear (as in the "ftp" URL, for instance)?
>
>2.7 Does it start with UR?
>
>Any scheme starting with the letters "U" and "R", in particular if it
>attaches any of the meanings "uniform", "universal" or "unifying" to the
>first letter, is going to cause intense debate, and generate much heat (but
>maybe little light).
>
>Any such proposal should either make sure that there is a large consensus
>behind it that it will be the only scheme of its type, or pick another
>name.
>
>2.5 Non-considerations
>
>   Some issues that are often raised but are not relevant to new URL
>   schemes include the following.
>
>2.5.1 Is it an URL, an URN or something else?
>
>   This classification has proved interesting in theory, but not
>   terribly useful when evaluating schemes.
>
>2.5.2 Are all objects accessible?
>
>   Can all objects in the world that are validly identified by a
>   scheme be accessed by any UA implementing it?
>
>   Sometimes the answer will be yes and sometimes no; often it will
>   depend on factors (like firewalls or client configuration) not
>   directly related to the scheme itself.
>
>3. Revision process
>
>NOTE: THIS SECTION IS ENTIRELY TBD. REVIEW COMMITTEE? PRIVATE URLS?
>   
>   URL schemes will have either a standards track RFC, or else they
>   will be a registration at IANA. where include the whole draft.  URL
>   schemes will have a review panel, appointed by IETF AD, who may not
>   reject a URL scheme but who may provide a 2 sentence recommendation
>   about the use of the URL scheme.  Conflicting registrations are
>   possible for non-standard URL schemes, and the order in the IANA
>   list of conflicting registrations will be determined by a random
>   number generator. 
>
>4. Security considerations
>
>   New URL schemes are required to address all security considerations
>   in their definitions.
>
>5. IANA considerations
>
>   This document requires IANA to register URL schemes according to
>   the process outlined in section 3.
>
>6. References
>
> [URL-SYNTAX]
>    Berners-Lee, Fielding, Masinter, "Uniform Resource Locators",
>    <draft-fielding-url-syntax-03>, will be RFC.
>
>7. Authors' Addresses
>
>   Larry Masinter
>   Xerox Corporation
>   Palo Alto Research Center
>   3333 Coyote Hill Road
>   Palo Alto, CA 94304
>   Fax: +1-415-812-4333
>   EMail: masinter@parc.xerox.com
>
>   Dan Zigmond
>   Wink Communications
>   1001 Marina Village Parkway
>   Alameda CA 94610
>   Fax: +1-510-337-2960
>   Phone: +1-510-337-6359
>   Email: dan.zigmond@wink.com
>
>   Harald T. Alvestrand
>   UNINETT A/S
>   Postboks 6683 Elgeseter 7002
>   Trondheim, Norway
>   Tel: +47 73 59 70 94
>   EMail: Harald.T.Alvestrand@uninett.no
>
>