comments on draft-hansen-2717bis-2718bis-uri-guidelines-02 from Roy T. Fielding on 2005-01-05 (uri@w3.org from January 2005)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Tue, 4 Jan 2005 17:10:22 -0800
To: uri <uri@w3.org>
Message-Id: <92FE574B-5EB6-11D9-88BD-000D93324AD6@gbiv.com>

Here are my comments on draft-hansen-2717bis-2718bis-uri-guidelines-02:

The abstract should simply state what the document is and should spell
out the acronym, as in

    This document provides guidelines, recommendations, and a mechanism
    for the definition and registration of Uniform Resource Identifier
    (URI) schemes.

and the second sentence ["The registration requirements have been
simplified by providing for provisional registrations that need no
technical review and may share names with existing scheme names."]
should be deleted because it makes no sense outside the current
discussion.

1.  Introduction

    A Uniform Resource Identifier (URI) is a compact string
    representation for identifying resources.  RFC XXXX [6] defines the
    general syntax of URIs.

Whoa, we don't want to spend another three years arguing over the
precise wording of the definition of URIs, do we?  So, don't try to
rephrase the definition (incorrectly) here.  Start with

    The Uniform Resource Identifier (URI) protocol element and generic
    syntax is defined by RFC XXXX [6].  Each URI begins with a scheme
    name, as defined in Section 3.1 of RFC XXXX, that refers to a
    specification for assigning identifiers within that scheme.  As
    such, the URI syntax is a federated and extensible naming system
    wherein each scheme's specification may further restrict the syntax
    and semantics of identifiers using that scheme.

    This document provides guidelines for the definition of new URI
    schemes, for consideration by those who are defining, registering,
    or evaluating those definitions, as well as a process and mechanism
    for registering URI schemes within the IANA URI scheme registry
    [ref].  This document obsoletes both RFCs 2717 [2] and 2718 [3].

======

    The original terminology for the URI protocol element attempted to ...

Er, that terminology was added two years after the original URI.  Try

    RFCs 2717 and 2718 draw a distinction between 'locators' -- identifiers
    used for accessing resources available on the Internet, and 'names' --
    identifiers used for naming possibly abstract resources, independent
    of any mechanism for accessing them.  The intent was to use the
    designation "URL" (Uniform Resource Locator) for those identifiers
    that were locators, and "URN" (Uniform Resource Name) for those
    identifiers that were names.  In practice, the line between 'locator'
    and 'name' has been difficult to draw: locators can be used as names,
    and names can be used as locators.

====

    As a result, recent documents have used the term "URI" for all
    resource identifiers, avoiding the term "URL", and reserving the term
    "URN" explicitly for those URIs using the "urn" scheme name (RFC 2141
    [1]).  URNs remain a distinct class of URIs because of the
    requirements set out in RFC  3406 [8]; this document's procedures do
    not update or supersede the procedures set out in RFC 3406.

Stick to the facts, please -- the only thing distinct about URNs is
the scheme name and that scheme will be in this registry.  How about

    As a result, recent documents have used the term "URI" for all
    resource identifiers, avoiding the term "URL", and reserving the term
    "URN" explicitly for those URIs using the "urn" scheme name (RFC 2141
    [1]).  URN "namespaces" (RFC 3406 [8]) are specific to the "urn" scheme
    and outside the scope of this document.

====

    RFC 2717 defined a set of registration trees in which URI schemes
    could be registered, one of which was called the IETF Tree, to be
    managed by IANA.  RFC 2717 proposed that additional registration
    trees might be approved by the IESG, however, no such registration
    trees have been approved.

should be "... the IESG.  However, ...".

    This document eliminates RFC 2717's distinction between different
    'trees' for URI schemes; instead there is a single namespace for
    registered values.  Within that namespace, there are values that are
    approved as meeting a set of criteria for URI schemes.  Other scheme
    names may also be registered provisionally or without necessarily
    passing any review process or criteria.

should be "... provisionally, without necessarily ..."

====

2.2  Syntactic compatibility

    RFC XXXX [6] defines a generic syntax for URI schemes with
    hierarchical components and a naming authority.  New URI schemes
    should follow this syntax.

Not strong enough.  It should say

    RFC XXXX [6] defines the generic syntax for all URI schemes,
    along with the syntax of common URI components that are used by many
    URI schemes to define hierarchical identifiers.  All URI scheme
    specifications must define their own syntax such that all strings
    matching their scheme-specific syntax will also match the <absolute-URI>
    grammar described in Section 4.3 of RFC XXXX.

    New URI schemes should reuse the common URI components of RFC XXXX
    for the definition of hierarchical naming schemes.  However, if there
    is a strong reason for a URI scheme to not use the hierarchical syntax,
    then the new scheme definition should at least follow the syntax of
    previously registered schemes, if possible.

    URI schemes that are not intended for use with relative URIs should
    avoid use of the forward slash "/" character, which is used for
    hierarchical delimiters, and the complete path segments "." and ".."
    (dot-segments).

    Avoid improper use of "//".  The use of double slashes in the first
    part of a URI is not an artistic indicator that what follows
    is a URI: Double slashes are used ONLY when the syntax of the URI's
    <scheme-specific-part> contains a hierarchical structure as described
    in RFC XXXX.  In URIs from such schemes, the use of double slashes
    indicates that what follows is the top hierarchical element for a
    naming authority.  (See section ???? of RFC XXXX for more details.)
    URI schemes that do not contain a conformant hierarchical structure
    in their <scheme-specific-part> should not use double slashes
    following the "<scheme>:" string.
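
As an illustration of the "//" rule in the proposed text above, here is a
minimal sketch using Python's standard urllib.parse module. The helper
name has_authority and the sample URIs are assumptions made for this
example only; they do not come from the draft or from RFC XXXX.

```python
from urllib.parse import urlsplit


def has_authority(uri: str) -> bool:
    """Return True when the text after "<scheme>:" begins with "//".

    Hypothetical helper for illustration: it only looks at the
    delimiter and does not validate the rest of the URI.
    """
    scheme, colon, rest = uri.partition(":")
    return bool(colon) and rest.startswith("//")


# Sample URIs chosen for this sketch: one hierarchical scheme with a
# naming authority, and two schemes that do not use "//".
for uri in (
    "http://example.com/a/b",
    "mailto:user@example.com",
    "urn:isbn:0451450523",
):
    parts = urlsplit(uri)
    # netloc is non-empty only when "//" introduced an authority.
    print(uri, has_authority(uri), repr(parts.netloc), repr(parts.path))
```

Only the first URI reports an authority component; for the other two, the
parser places everything after the colon in the path.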

====

2.3  Well-Defined

...

    In many cases, new URI schemes are defined as ways to translate other
    protocols and name spaces into the general framework of URIs.  For
    example, the "ftp" URI scheme translates from the FTP protocol, while
    the "mid" URI scheme translates from the Message-ID field of
    messages.  For such schemes, the description of the mapping must be
    complete, must describe how characters get encoded or not in URIs,
    must describe exactly how all legal values of the base standard can
    be represented using the URI scheme, and exactly which modifiers,
    alternate forms and other artifacts from the base standards are
    included or not included.  These requirements are elaborated below.

While that description is appealing, it is also wrong.  In fact, the "ftp"
URI scheme does not "translate" from the FTP protocol -- what it does
is map identifiers to the specific interface of an FTP/TCP/IP server.
FTP (the base standard) is far more capable than the limited set of
resources that can be identified via the "ftp" URI.  In fact, the
paragraph following it is more accurate for all locator schemes:

    In some cases, URI schemes do not have particular network protocols
    associated with them, because their use as a locator is limited to
    contexts where the access method is understood.  This is the case,
    for example, with the "cid" and "mid" URI schemes.  For these URI
    schemes, the specification should describe the notation of the
    scheme, the contexts of use, and a complete mapping of the locator
    from its source.

In other words, the mapping is always from locator to source.

====

2.4  Definition of operations

    In addition to the definition of how a URI identifies a resource, a
    URI scheme definition should also define, if applicable, the set of
    operations that may be performed on a resource using the URI as its
    identifier.  The basis for this model is HTTP; a HTTP resource can be
    operated on by GET, POST, PUT and a number of other operations
    available through the HTTP protocol.  The URI scheme definition
    should describe all well-defined operations on the URI identifier,
    and what they are supposed to do.

I think the middle sentence should be a "For example, ..." type -- there
is no need to frame this as the HTTP model.  It is true of all
information retrieval (IR) protocols.

    Some URI schemes (for example, "telnet") provide location information
    for hooking onto bi-directional data streams, and don't fit the
    "infoaccess" paradigm of most URIs very well; this should be
    documented.

There is way too much context hidden here.  It should just provide an
alternative example, specifically that of telnet, in which the only
operation defined is to initiate the connection and login.  Likewise,
I suggest providing an example of a scheme that has no defined operations.

=====

2.5  Character encoding

    When describing URI schemes in which (some of) the elements of the
    URI are actually representations of sequences of characters, care ...

should be "actually representations of human-readable text, care ..."

    should be taken not to introduce unnecessary variety in the ways in
    which characters are encoded into octets and then into URI
    characters.  Unless there is some compelling reason for a particular
    scheme to do otherwise, translating character sequences into UTF-8
    (RFC 2279 [4]) and then subsequently using the %HH encoding for
    unsafe octets is recommended.

"unsafe octets" is a leftover -- I suggest referring to Section 2.5 of
RFC XXXX instead.
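
For illustration, the recommended two-step encoding (characters to UTF-8
octets, then octets to %HH) can be sketched with Python's standard
urllib.parse.quote. The helper name encode_component and the sample
string are assumptions for this example, and passing safe="" is a choice
made here so that "/" is escaped as data rather than kept as a delimiter.

```python
from urllib.parse import quote, unquote


def encode_component(text: str) -> str:
    """Encode text as UTF-8, then percent-encode the resulting octets.

    Hypothetical helper: quote() leaves letters, digits, "-", ".",
    "_" and "~" as they are and escapes every other octet as %HH.
    """
    return quote(text, safe="", encoding="utf-8")


original = "r\u00e9sum\u00e9/2005"   # "résumé/2005", sample input
encoded = encode_component(original)
print(encoded)                       # r%C3%A9sum%C3%A9%2F2005

# Decoding reverses both steps and recovers the original text.
assert unquote(encoded, encoding="utf-8") == original
```

Each "é" becomes the two UTF-8 octets C3 A9, so it appears as two %HH
triplets rather than one.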

====

2.6  Clear security considerations

Add

    o  Carefully read and understand the security considerations described
       in Section 7 of RFC XXXX and note any that apply to the new scheme.

====

2.7  Scheme Name considerations

Shouldn't this quote the ABNF definition in RFC XXXX and specifically
note that schemes must be registered as lowercase?
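
As a sketch of what such a check might look like, assuming the scheme
production in Section 3.1 of RFC XXXX reads
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), the following Python
tests a candidate name against that grammar and against the lowercase
convention. The function names and sample names are hypothetical.

```python
import re

# Assumed ABNF: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name: str) -> bool:
    """True when the whole name matches the assumed scheme grammar."""
    return SCHEME_RE.fullmatch(name) is not None


def is_registrable(name: str) -> bool:
    """True when the name is syntactically valid and all lowercase."""
    return is_valid_scheme(name) and name == name.lower()


# Sample names picked for this sketch.
for candidate in ("http", "HTTP", "svn+ssh", "1http", "my_scheme"):
    print(candidate, is_valid_scheme(candidate), is_registrable(candidate))
```

Under these assumptions "HTTP" is syntactically valid but would not be
accepted as a registered name, while "1http" and "my_scheme" fail the
grammar itself.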

====

3.  URI Scheme Registration Procedure

3.1  General

...

    Provisional status is useful for registering legacy URI schemes that
    have already been widely deployed without registration, and for which
    review at this time would be inappropriate.  Provisional status may
    also be useful for private or experimental use.

    Permanent status is intended for use by IETF standards-track
    protocols.  The status requires a substantive review and approval
    process.

I would reverse the order of these two paragraphs -- standards-track
should always go first.  I would think that permanent status would apply
to any specification approved by the IESG, not just standards-track, and
indeed ...

...

    Permanent registration of a URI scheme requires IETF review and IESG
    approval.  In many cases, permanent registration involves the
    promotion of an existing provisional registration.  In general, the
    creation of a new permanent URI scheme requires a Standards Track
    RFC.  In some cases, a URI scheme registration in an Informational
    RFC may be approved by the IESG for 'permanent' URI registration.

This is way too vague!  IANA provides in RFC 2434 the list of all
levels of review that might be applied.  All we need to do is list the
ones we need, using the same terminology as provided by IANA, along
with the policies for allocation.  The entire registration policy
(aside from the templates) can be defined in one paragraph.

In fact, skipping through the rest of the spec indicates that RFC 2434
is completely absent (it should be a normative reference) and the
remaining document needs to incorporate its terminology for policies --
that should cut the length and make it much easier for IANA to review
and apply.

Cheers,

Roy T. Fielding                            <http://roy.gbiv.com/>
Chief Scientist, Day Software              <http://www.day.com/>
Received on Wednesday, 5 January 2005 01:10:29 UTC