Re: draft-fielding-uri-rfc2396bis-03, section 1 from Martin Duerst on 2003-06-18 (uri@w3.org from June 2003)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 18 Jun 2003 14:53:51 -0400
To: "Roy T. Fielding" <fielding@apache.org>, uri@w3.org
Message-Id: <4.2.0.58.J.20030618140512.0710aae8@localhost>
These are my comments on section 1:

1.1.1 Generic Syntax: I think this section comes very close to
     being explicit on the following point:
       Any scheme that wants to use any of the reserved characters
       has to use them in the sense defined in this specification.
     Nailing this down once and for all, as clearly as possible,
     would help avoid questions.

1.1.2: I agree that gopher: can go.

1.2.2: Please add examples for "resolution", "dereference", and
        "retrieval".

1.2.3 Hierarchical Identifiers: This title appears in a strange
       location. I thought the xmlrfc script was better. But I
       guess this will be taken care of by the RFC Editor.

1.2.3: The URI syntax is organized ... decreasing order from left to right.
     Please mention the exception of the components of 'hostname',
     whose labels are ordered from least to most significant.
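
     To illustrate the exception, here is a minimal Python sketch. The
     URI is a made-up example and the variable names are mine, not
     anything from the draft. It only shows that path segments read
     most-significant-first, while hostname labels have to be reversed
     to get the same order.

```python
from urllib.parse import urlsplit

# Hypothetical example URI, chosen only for illustration.
uri = "http://www.example.org/a/b/c"
parts = urlsplit(uri)

# Path segments are already ordered from most to least significant.
path_segments = [segment for segment in parts.path.split("/") if segment]

# Hostname labels run the other way: the rightmost label is the most
# significant, so reverse them to match the rest of the URI.
host_labels = parts.hostname.split(".")
host_by_significance = list(reversed(host_labels))

print(path_segments)         # path, left to right
print(host_by_significance)  # hostname, most significant label first
```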

1.3 Syntax Notation:

    "Although the ABNF
    defines syntax in terms of the US-ASCII character encoding [ASCII],
    the URI syntax should be interpreted in terms of the character that
    the ASCII-encoded octet represents, rather than the octet encoding
    itself."

    This is confusing, and maybe not exactly correct. RFC 2234 says:

    >>>>
    2.3  Terminal Values

    Rules resolve into a string of terminal values, sometimes called
    characters.  In ABNF a character is merely a non-negative integer.
    In certain contexts a specific mapping (encoding) of values into a
    character set (such as ASCII) will be specified.
    >>>>

    and

    >>>>
    Literal text strings are interpreted as a concatenated set of
    printable characters.

         NOTE:     ABNF strings are case-insensitive and
                   the character set for these strings is us-ascii.
    >>>>

    So when we write "%" in
       escaped = "%" HEXDIG HEXDIG
    what happens is:
    1) the "%" per definition of RFC 2234 gets interpreted as character
       number 37, or %d37 or %x25 in the ABNF's own notation.
       (this is currently wrong)
    2) we have to define that we interpret these numbers according to ASCII
       (this is currently missing; we do that for %-escaping, but not for ABNF)
    3) we can repeat that we are only concerned with the resulting characters,
       not with their encoding.
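
    The three steps above can be sketched in Python as follows. This is
    only an illustration under my own assumptions: the names PERCENT,
    ESCAPED and is_escaped are hypothetical, and the regular expression
    is my rendering of the "escaped" rule, not text from the draft.

```python
import re

# Step 1: in ABNF the literal "%" is just a number (%d37, i.e. %x25).
PERCENT = 0x25

# Step 2: interpret that number according to US-ASCII to get a character.
percent_char = bytes([PERCENT]).decode("ascii")

# Step 3: match on characters, independent of how they are encoded.
#   escaped = "%" HEXDIG HEXDIG
# ABNF literal strings are case-insensitive, so a-f is accepted as well.
ESCAPED = re.compile(re.escape(percent_char) + r"[0-9A-Fa-f]{2}")


def is_escaped(text: str) -> bool:
    """Return True if text is exactly one %-escape triplet."""
    return ESCAPED.fullmatch(text) is not None


print(percent_char, is_escaped("%7E"), is_escaped("%G0"))
```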

    So what I would write is:

    "Note that ABNF defines characters to be just non-negative integers.
    It also uses literal text strings to denote characters, based on the
    US-ASCII encoding. We in turn use ABNF with the US-ASCII encoding to
    map from numbers back to actual characters, because URIs are defined
    as strings of characters independent of any particular encoding."


1.3 Syntax Notation

    "How a URI is represented in terms of bits and bytes on the
    wire is dependent upon the character encoding of the protocol used to
    transport it, or the charset of the document that contains it."

    This mistakenly gives the impression that protocols use "character
    encodings", and documents use "charsets", and that these two are
    different. Please streamline to "character encodings".
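
    A small Python sketch of the underlying point, that the same URI
    characters become different octets under different character
    encodings. The URI and the choice of the three encodings are my
    own, picked only for illustration.

```python
# Hypothetical example URI; any string of URI characters would do.
uri = "http://example.org/%7Euser"

# The same sequence of characters under three character encodings.
ascii_bytes = uri.encode("ascii")
utf16_bytes = uri.encode("utf-16-be")
ebcdic_bytes = uri.encode("cp037")  # an EBCDIC code page

# The octets differ, but each decodes back to the same characters.
for name, data in [("ascii", ascii_bytes),
                   ("utf-16-be", utf16_bytes),
                   ("cp037", ebcdic_bytes)]:
    print(name, data.hex(), data.decode(name) == uri)
```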


1.3 Syntax Notation

    "ALPHA, CR, CTL, DIGIT, DQUOTE, HEXDIG, LF, OCTET, and SP"

    To help the reader, I suggest the following:

    "ALPHA (letters), CR (carriage return), CTL (control characters),
     DIGIT (digits), DQUOTE (double quote), HEXDIG (hexadecimal digits),
     LF (line feed), OCTET (octets), and SP (space)"

     Maybe the most obvious ones, such as digits and octets, can be left out.
     I'm also not sure whether we need OCTET, because it is not
     part of the final syntax, it does not get interpreted according
     to ASCII, it occurs only once ("strings of data (1*OCTET)") and
     can easily be replaced there.
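
     For readers who prefer to see the rules spelled out, here is a
     rough Python sketch of the RFC 2234 core rules listed above as sets
     of characters. The CORE_RULES name is mine. OCTET is deliberately
     left out, since it denotes raw 8-bit values rather than characters
     interpreted according to ASCII.

```python
import string

# RFC 2234 core rules used by the draft, as sets of characters.
CORE_RULES = {
    "ALPHA": set(string.ascii_letters),                      # %x41-5A / %x61-7A
    "CR": {"\r"},                                            # %x0D
    "CTL": {chr(c) for c in range(0x00, 0x20)} | {"\x7f"},   # %x00-1F / %x7F
    "DIGIT": set(string.digits),                             # %x30-39
    "DQUOTE": {'"'},                                         # %x22
    # HEXDIG = DIGIT / "A" ... "F"; ABNF literal strings are
    # case-insensitive, so lowercase a-f match as well.
    "HEXDIG": set(string.digits) | set("ABCDEFabcdef"),
    "LF": {"\n"},                                            # %x0A
    "SP": {" "},                                             # %x20
}

for name, chars in CORE_RULES.items():
    print(name, len(chars))
```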


1.3 Syntax Notation

     Why are we not using MUST/SHOULD, and why don't we reference
     their official definition (RFC 2119)? Are we using different
     language? Do we think we can avoid lots of useless discussion?


Regards,    Martin.



At 19:07 03/06/06 -0700, Roy T. Fielding wrote:

>I have just submitted draft 03, which can also be obtained via
>the issues list at
>
>    http://www.apache.org/~fielding/uri/rev-2002/issues.html
>
>Please note that all issues have been fixed or closed.  If you'd
>like to raise a new issue or reopen an old one, please do so
>within the next two weeks.
>
>
>Cheers,
>
>Roy T. Fielding, Chief Scientist, Day Software
>                  2 Corporate Plaza, Suite 150
>                  Newport Beach, CA 92660-7929   fax:+1.949.644.5064
>                  (roy.fielding@day.com) <http://www.day.com/>
>
>                  Co-founder, The Apache Software Foundation
>                  (fielding@apache.org)  <http://www.apache.org/>
Received on Wednesday, 18 June 2003 14:54:11 UTC