
RE: comments on draft-hansen-2717bis-2718bis-uri-guidelines-02

From: Larry Masinter <LMM@acm.org>
Date: Sun, 23 Jan 2005 18:36:52 -0800
To: "'Roy T. Fielding'" <fielding@gbiv.com>
Cc: "'uri'" <uri@w3.org>
Message-id: <0IAS00CJHV9GPU@mailsj-v1.corp.adobe.com>

I realize I hadn't replied to Roy's comments, even though I accept
many of his edits:

> The abstract should simply state what the document is and should spell
> out the acronym, as in
> 
>     This document provides guidelines, recommendations, and a mechanism
>     for the definition and registration of Uniform Resource Identifier
>     (URI) schemes.


Accept.

> and the second sentence ["The registration requirements have been
> simplified by providing for provisional registrations that need no
> technical review and may share names with existing scheme names."]
> should be deleted because it makes no sense outside current discussion.

Perhaps we should have an appendix that describes the difference between
the process in this document and previous processes, and move all of the
discussion about RFC 2717 and RFC 2718 there. But I agree this sentence could
be removed from this context.


> 1.  Introduction
> 
>     A Uniform Resource Identifier (URI) is a compact string
>     representation for identifying resources.  RFC XXXX [6] defines the
>     general syntax of URIs.
> 
> Whoa, we don't want to spend another three years arguing over the precise
> wording of the definition of URIs, do we?  So, don't try to rephrase the
> definition (incorrectly) here.  Start with
> 
>     The Uniform Resource Identifier (URI) protocol element and generic
>     syntax is defined by RFC XXXX [6].  Each URI begins with a scheme
>     name, as defined in Section 3.1 of RFC XXXX, that refers to a
>     specification for assigning identifiers within that scheme.  As
>     such, the URI syntax is a federated and extensible naming system
>     wherein each scheme's specification may further restrict the syntax
>     and semantics of identifiers using that scheme.

>     This document provides guidelines for the definition of new URI
>     schemes, for consideration by those who are defining, registering,
>     or evaluating those definitions, as well as a process and mechanism
>     for registering URI schemes within the IANA URI scheme registry
>     [ref].  This document obsoletes both RFCs 2717 [2] and 2718 [3].

Agree with the edit.

> ======
> 
>     The original terminology for the URI protocol element attempted to 
> ...
> 
> Er, that terminology was added two years after the original URI.  Try
> 
>     RFCs 2717 and 2718 draw a distinction between 'locators' --
>     identifiers used for accessing resources available on the Internet,
>     and 'names' -- identifiers used for naming possibly abstract
>     resources, independent of any mechanism for accessing them.  The
>     intent was to use the designation "URL" (Uniform Resource Locator)
>     for those identifiers that were locators, and "URN" (Uniform
>     Resource Name) for those identifiers that were names.  In practice,
>     the line between 'locator' and 'name' has been difficult to draw:
>     locators can be used as names, and names can be used as locators.

Agree with the edit.

> ====
> 
>     As a result, recent documents have used the term "URI" for all
>     resource identifiers, avoiding the term "URL", and reserving the term
>     "URN" explicitly for those URIs using the "urn" scheme name (RFC 2141
>     [1]).  URNs remain a distinct class of URIs because of the
>     requirements set out in RFC  3406 [8]; this document's procedures do
>     not update or supersede the procedures set out in RFC 3406.
> 
> Stick to the facts, please -- the only thing distinct about URNs is
> the scheme name and that scheme will be in this registry.  How about
> 
>     As a result, recent documents have used the term "URI" for all
>     resource identifiers, avoiding the term "URL", and reserving the term
>     "URN" explicitly for those URIs using the "urn" scheme name (RFC 2141
>     [1]).  URN "namespaces" (RFC 3406 [8]) are specific to the "urn"
>     scheme and outside the scope of this document.


Agree with the edit.

> 
> ====
> 
>     RFC 2717 defined a set of registration trees in which URI schemes
>     could be registered, one of which was called the IETF Tree, to be
>     managed by IANA.  RFC 2717 proposed that additional registration
>     trees might be approved by the IESG, however, no such registration
>     trees have been approved.
> 
> should be "... the IESG.  However, ...".
> 
>     This document eliminates RFC 2717's distinction between different
>     'trees' for URI schemes; instead there is a single namespace for
>     registered values.  Within that namespace, there are values that are
>     approved as meeting a set of criteria for URI schemes.  Other scheme
>     names may also be registered provisionally or without necessarily
>     passing any review process or criteria.
> 
> should be "... provisionally, without necessarily ..."

Perhaps this should be moved to the appendix outlining the differences
with RFC 2717 / 2718?

> ====
> 
> 2.2  Syntactic compatibility
> 
>     RFC XXXX [6] defines a generic syntax for URI schemes with
>     hierarchical components and a naming authority.  New URI schemes
>     should follow this syntax.
> 
> Not strong enough.  It should say
> 
>     RFC XXXX [6] defines the generic syntax for all URI schemes,
>     along with the syntax of common URI components that are used by many
>     URI schemes to define hierarchical identifiers.  All URI scheme
>     specifications must define their own syntax such that all strings
>     matching their scheme-specific syntax will also match the
>     <absolute-URI> grammar described in Section 4.3 of RFC XXXX.
> 
>     New URI schemes should reuse the common URI components of RFC XXXX
>     for the definition of hierarchical naming schemes.  However, if there
>     is a strong reason for a URI scheme to not use the hierarchical
>     syntax, then the new scheme definition should at least follow the
>     syntax of previously registered schemes, if possible.
> 
>     URI schemes that are not intended for use with relative URIs should
>     avoid use of the forward slash "/" character, which is used for
>     hierarchical delimiters, and the complete path segments "." and ".."
>     (dot-segments).
> 
>     Avoid improper use of "//".  The use of double slashes in the first
>     part of a URI is not an artistic indicator that what follows
>     is a URI: Double slashes are used ONLY when the syntax of the URI's
>     <scheme-specific-part> contains a hierarchical structure as described
>     in RFC XXXX.  In URIs from such schemes, the use of double slashes
>     indicates that what follows is the top hierarchical element for a
>     naming authority.  (See section ???? of RFC XXXX for more details.)
>     URI schemes that do not contain a conformant hierarchical structure
>     in their <scheme-specific-part> should not use double slashes
>     following the "<scheme>:" string.

I don't disagree with anything that you've written here, and I suppose
it's useful to elaborate what it means to "follow" RFC 2396bis. Is there
any problem with duplicating this information?
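
To make the "//" rule concrete, here is a minimal sketch in Python. It is
an illustration only, not text proposed for the draft: the function name
uses_authority_syntax is hypothetical, and the input is assumed to already
be an absolute URI, so the first ":" ends the scheme name.

```python
def uses_authority_syntax(uri: str) -> bool:
    """Return True if the scheme-specific part begins with "//".

    Assumes `uri` is an absolute URI, so everything before the first
    ":" is the scheme name.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError("not an absolute URI: missing '<scheme>:' prefix")
    # "//" introduces the naming authority of a hierarchical URI.
    return rest.startswith("//")


if __name__ == "__main__":
    for example in ("http://example.com/a/b", "mailto:user@example.com"):
        print(example, uses_authority_syntax(example))
```

A scheme whose identifiers have no naming authority (the "mailto" example
above) would be expected to return False for every valid identifier.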

> ====
> 
> 2.3  Well-Defined
> 
> ...
> 
>     In many cases, new URI schemes are defined as ways to translate other
>     protocols and name spaces into the general framework of URIs.  For
>     example, the "ftp" URI scheme translates from the FTP protocol, while
>     the "mid" URI scheme translates from the Message-ID field of
>     messages.  For such schemes, the description of the mapping must be
>     complete, must describe how characters get encoded or not in URIs,
>     must describe exactly how all legal values of the base standard can
>     be represented using the URI scheme, and exactly which modifiers,
>     alternate forms and other artifacts from the base standards are
>     included or not included.  These requirements are elaborated below.
> 
> While that description is appealing, it is also wrong.  In fact, the "ftp"
> URI scheme does not "translate" from the FTP protocol -- what it does
> is map identifiers to the specific interface of an FTP/TCP/IP server.
> FTP (the base standard) is far more capable than the limited set of
> resources that can be identified via the "ftp" URI.  In fact, the
> paragraph following it is more accurate for all locator schemes:
> 
>     In some cases, URI schemes do not have particular network protocols
>     associated with them, because their use as a locator is limited to
>     contexts where the access method is understood.  This is the case,
>     for example, with the "cid" and "mid" URI schemes.  For these URI
>     schemes, the specification should describe the notation of the
>     scheme, the contexts of use, and a complete mapping of the locator
>     from its source.
> 
> In other words, the mapping is always from locator to source.

I think the point of the original paragraph was really about translation:
it is more common than not for URI schemes to be invented out of
a desire to 'translate' some identification or location method into
the URI space, even though the result isn't a complete mapping.
The result, of course, is an inverse mapping. But people want to
go both ways ("How do I say this sequence of FTP commands as a URI?" =>
"Well, you can't" or "Here you go... ftp://blah" ) as well as
("What set of FTP commands might arise from using ftp://blah...").

> ====
> 
> 2.4  Definition of operations
> 
>     In addition to the definition of how a URI identifies a resource, a
>     URI scheme definition should also define, if applicable, the set of
>     operations that may be performed on a resource using the URI as its
>     identifier.  The basis for this model is HTTP; a HTTP resource can be
>     operated on by GET, POST, PUT and a number of other operations
>     available through the HTTP protocol.  The URI scheme definition
>     should describe all well-defined operations on the URI identifier,
>     and what they are supposed to do.
> 
> I think the middle sentence should be a "For example, ..." type -- there
> is no need to frame this as the HTTP model.  It is true of all IR 
> protocols.

I think there are many IR protocols that don't have an explicit model
of "resource" and "operations" and that HTTP was, in fact, the basis
for this model. (Try to find a 'resource' in Z39.50.) However, I'm
happy to change "The basis for this model is HTTP" to "A basis for this
model was HTTP".


>     Some URI schemes (for example, "telnet") provide location information
>     for hooking onto bi-directional data streams, and don't fit the
>     "infoaccess" paradigm of most URIs very well; this should be
>     documented.
> 
> There is way too much context hidden here.  It should just provide an
> alternative example, specifically that of telnet, in which the only
> operation defined is to initiate the connection and login.  Likewise,
> I suggest providing an example of a scheme that has no defined 
> operations.

How about rewriting what's there as

  Some URI schemes don't fit into the "information access" paradigm of URIs.
  For example, "telnet" provides location information for initiating
  a bi-directional data stream to a remote host; the only operation defined
  is to initiate the connection.  In any case, the operations appropriate
  for a URI scheme should be documented.

I don't think 'telnet:' treats 'initiate' and 'login' as separate
operations. If you'd like to suggest an example of a scheme that has
no defined operations, please do so.


> =====
> 
> 2.5  Character encoding
> 
>     When describing URI schemes in which (some of) the elements of the
>     URI are actually representations of sequences of characters, care ...
> 
> should be "actually representations of human-readable text, care ..."
> 
>     should be taken not to introduce unnecessary variety in the ways in
>     which characters are encoded into octets and then into URI
>     characters.  Unless there is some compelling reason for a  particular
>     scheme to do otherwise, translating character sequences into UTF-8
>     (RFC 2279 [4]) and then subsequently using the %HH encoding for
>     unsafe octets is recommended.
> 
> unsafe octets is a leftover -- I suggest referring to section 2.5 of
> RFC XXXX instead.

OK.
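
The recommended two-step encoding (characters to UTF-8 octets, then octets
to %HH escapes) can be sketched as follows. This is an illustration only,
assuming Python's standard urllib.parse module; the helper names
encode_component and decode_component are hypothetical, and the sketch
escapes every octet outside the unreserved set, which is stricter than a
given scheme may require.

```python
from urllib.parse import quote, unquote


def encode_component(text: str) -> str:
    """Encode text as UTF-8, then %HH-escape each octet that is not an
    unreserved character (letters, digits, "-", ".", "_", "~")."""
    # safe="" means reserved characters such as "/" are escaped too.
    return quote(text, safe="", encoding="utf-8")


def decode_component(component: str) -> str:
    """Reverse the mapping: %HH escapes to octets, octets to text."""
    return unquote(component, encoding="utf-8")


if __name__ == "__main__":
    print(encode_component("café/menu"))  # caf%C3%A9%2Fmenu
```

A scheme that lets "/" act as a delimiter inside a component would pass it
in the safe argument instead of escaping it.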

> ====
> 
> 2.6  Clear security considerations
> 
> Add
> 
>     o  Carefully read and understand the security considerations
>        described in Section 7 of RFC XXXX and note any that apply to
>        the new scheme.

The list says that "Definitions of URI schemes should be accompanied by ..."

so a requirement that someone have "read and understood" section 7 doesn't
really fit into the list. I think it's fine to ask

        o Section 7 of RFC 2396bis describes general security considerations
           for URI schemes. Note which of those apply to the new scheme.

        
> ====
> 
> 2.7  Scheme Name considerations
> 
> Shouldn't this quote the ABNF definition in RFC XXXX and specifically
> note that schemes must be registered as lowercase?


Sure, good idea.
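
A minimal sketch of such a check, for illustration only. It assumes the
scheme ABNF as published in Section 3.1 of RFC 3986 (taken here to be the
final form of "RFC XXXX"); the names SCHEME_RE, is_valid_scheme and
registry_form are hypothetical.

```python
import re

# Assumed ABNF (RFC 3986, Section 3.1):
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name: str) -> bool:
    """Return True if the whole string matches the scheme grammar."""
    return SCHEME_RE.fullmatch(name) is not None


def registry_form(name: str) -> str:
    """Return the lowercase form under which a scheme would be registered."""
    if not is_valid_scheme(name):
        raise ValueError(f"not a syntactically valid scheme name: {name!r}")
    return name.lower()


if __name__ == "__main__":
    print(registry_form("HTTP"))    # http
    print(is_valid_scheme("9p"))    # False: must start with a letter
```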
 
> ====
> 
> 3.  URI Scheme Registration Procedure
> 
> 3.1  General
> 
> ...
> 
>     Provisional status is useful for registering legacy URI schemes that
>     have already been widely deployed without registration, and for which
>     review at this time would be inappropriate.  Provisional status may
>     also be useful for private or experimental use.
> 
>     Permanent status is intended for use by IETF standards-track
>     protocols.  The status requires a substantive review and approval
>     process.
> 
> I would reverse the order of these two paragraphs -- standards-track 
> should always go first.  I would think that permanent status would 
> apply to any specification approved by the IESG, not just
> standards-track, and indeed ...


Yes, although this part of the specification is under discussion. I'm fine
with describing the "Permanent" registrations first, if that can be done
without confusion.

> ...
> 
>     Permanent registration of a URI scheme requires IETF review and IESG
>     approval.  In many cases, permanent registration involves the
>     promotion of an existing provisional registration.  In general, the
>     creation of a new permanent URI scheme requires a Standards Track
>     RFC.  In some cases, a URI scheme registration in an Informational
>     RFC may be approved by the IESG for 'permanent' URI registration.
> 
> This is way too vague!  IANA provides in RFC 2434 the list of all
> levels of review that might be applied.  All we need to do is list the
> ones we need, using the same terminology as provided by IANA, along
> with the policies for allocation.  The entire registration policy
> (aside from the templates) can be defined in one paragraph.

I think it's a good idea for us to couch IANA registration rules in
terms of RFC 2434, and it's something we haven't done. We can do so once
we decide for certain how many levels we're going to have.


> In fact, skipping through the rest of the spec indicates that RFC 2434
> is completely absent (it should be a normative reference) and the
> remaining document needs to incorporate its terminology for policies --
> that should cut the length and make it much easier for IANA to review
> and apply.

Yes, I think this makes sense.

We did 2717bis-2718bis mainly by cut-and-paste from 2717 and 2718, alas,
and a careful reevaluation using RFC 2434 as a normative reference makes
a lot of sense to me.

Larry
-- 
http://larry.masinter.net
Received on Monday, 24 January 2005 02:36:59 UTC
