Re: Section 6 Normalization and Comparison from Martin Duerst on 2003-04-16 (uri@w3.org from April 2003)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 16 Apr 2003 13:19:31 -0400
To: "Williams, Stuart" <skw@hplb.hpl.hp.com>, "Tim Bray (E-mail)" <tbray@textuality.com>
Cc: "'uri@w3.org'" <uri@w3.org>
Message-Id: <4.2.0.58.J.20030416131401.03102000@localhost>

Just some very small comments:

At 16:14 03/04/14 +0100, Williams, Stuart wrote:

>Tim,
>
>I said earlier [1] that I had a couple more comments to make on the URI
>comparison section of RFC2396bis [2]. It turns out that I really only have
>two substantive comments, both on section 6.2.2.2 below.
>
>Best regards
>
>Stuart
>--
>
>6.2.2 Syntax-based normalisation and
>6.2.2.3 Path Segment Normalisation
>-------------------------------------
>[already noted in [1]]
>
>These sections and section 4 "URI References" differ with respect to the
>interpretation of "." and ".." in absolute forms of URI.
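Here is a minimal sketch of what "interpreting '.' and '..'" means in practice, for readers who have not seen dot-segment removal. It is modelled on the remove_dot_segments algorithm later published in RFC 3986, section 5.2.4, not on the 2003 draft text under discussion, whose wording may differ. The Python function name is illustrative only.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a URI path (RFC 3986-style sketch)."""
    inp = path
    out = []  # output segments, each carrying its leading "/" if it had one
    while inp:
        if inp.startswith("../"):        # A: drop a leading "../"
            inp = inp[3:]
        elif inp.startswith("./"):       # A: drop a leading "./"
            inp = inp[2:]
        elif inp.startswith("/./"):      # B: "/./" becomes "/"
            inp = "/" + inp[3:]
        elif inp == "/.":                # B: trailing "/." becomes "/"
            inp = "/"
        elif inp.startswith("/../"):     # C: "/../" becomes "/" and pops a segment
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp == "/..":               # C: trailing "/.." likewise
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):         # D: a lone "." or ".." is dropped
            inp = ""
        else:                            # E: move the first segment to the output
            start = 1 if inp.startswith("/") else 0
            idx = inp.find("/", start)
            if idx == -1:
                idx = len(inp)
            out.append(inp[:idx])
            inp = inp[idx:]
    return "".join(out)


print(remove_dot_segments("/a/b/c/./../../g"))    # /a/g
print(remove_dot_segments("mid/content=5/../6"))  # mid/6
```

Whether an absolute URI's path should be run through this step during comparison is exactly where, per Stuart, sections 4 and 6 of the draft disagree.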
>
>6.2.2.2 Escape Normalisation
>----------------------------
>
>States: "One cause is the choice of upper-case or lower-case letters for the
>hexadecimal digits within the escape sequence (e.g., "%3a" versus "%3A").
>Such sequences are always equivalent; for the sake of uniformity, URI
>generators and normalizers are strongly encouraged to use upper-case letters
>for the hex digits A-F."
>
>"... Such sequences are always equivalent; ..." This seems to ignore the
>purpose of the comparison, e.g. are such sequences equivalent for the
>purpose of naming a namespace?

Good point.
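As a minimal sketch of Stuart's point, the hypothetical helpers below upper-case the hex digits of every escape, as the draft recommends, but only when the caller's purpose treats "%3a" and "%3A" as equivalent. A strict, character-by-character comparison skips that step. The function names and the `normalize_case` flag are illustrative assumptions, not part of any specification.

```python
import re

# Matches one escape: "%" followed by exactly two hex digits.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def uppercase_escapes(uri: str) -> str:
    """Rewrite every %xx escape in `uri` to use upper-case hex digits."""
    return _ESCAPE.sub(lambda m: m.group(0).upper(), uri)


def equivalent(a: str, b: str, normalize_case: bool = True) -> bool:
    """Compare two URI strings.

    With normalize_case=False this is a plain character-by-character match,
    which a purpose such as naming a namespace may call for.
    """
    if normalize_case:
        a, b = uppercase_escapes(a), uppercase_escapes(b)
    return a == b


print(equivalent("http://example.com/a%3ab", "http://example.com/a%3Ab"))  # True
print(equivalent("http://example.com/a%3ab", "http://example.com/a%3Ab",
                 normalize_case=False))                                    # False
```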


>Also states: "Only characters that are excluded from or reserved within the
>URI syntax must be escaped when used as data. However, some URI generators
>go beyond that and escape characters that do not require escaping, resulting
>in URIs that are equivalent to their unescaped counterparts. Such URIs can
>be normalized by unescaping sequences that represent the unreserved
>characters, as described in Section 2.3."
>
>I think that the reserved use of some characters is scoped by scheme and URI
>syntax component (scheme, authority, path, query, fragment), i.e. their
>reserved purpose is only applicable in certain fields, and so escaping should
>only be applied to a reserved character when its reserved purpose is in
>scope.

Yes, indeed. This is relatively easy to do when a URI is generated,
but difficult to do in general, because it would require scheme-specific
and component-specific knowledge.
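Lacking that knowledge, a generic normalizer can still safely decode escapes whose octet is an unreserved character. It leaves reserved characters such as "/" (%2F) escaped, because whether their reserved purpose is in scope depends on the scheme and component. The sketch below assumes the unreserved set later fixed in RFC 3986, section 2.3 (ALPHA, DIGIT, "-", ".", "_", "~"). The 2003 draft's set may differ, and the function name is hypothetical.

```python
import re
import string

# Assumption: unreserved set as published in RFC 3986, section 2.3.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

# Matches one escape and captures its two hex digits.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_unreserved(uri: str) -> str:
    """Decode only those %xx escapes that stand for unreserved characters."""
    def fix(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        # Reserved or other characters keep their escape untouched, since a
        # generic normalizer cannot tell whether their reserved role applies.
        return ch if ch in UNRESERVED else m.group(0)
    return _ESCAPE.sub(fix, uri)


print(unescape_unreserved("/%7Euser/a%2Fb"))  # /~user/a%2Fb
```

The tilde is decoded, while the slash stays escaped, so the path's segment structure is not changed.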


>Also, it is not clear to me that it is legitimate to unescape an escape
>sequence, because in general one doesn't know the character set of the
>escaped character - only the authority that minted the URI knows that;
>looking at a URI, you only get to know what octet was escaped. [I think].

It is true that you don't know the original character.
But it's also true that 'A' and '%41' represent the same octet, and
so changing from one to the other, in contexts where these are
equivalent, can be done without knowledge of the original character.
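To make the octet-versus-character distinction concrete, the sketch below uses Python's standard `urllib.parse.unquote_to_bytes`, which maps each escape to the octet it denotes rather than to a character. "%41" and "A" yield the same octet, while "%C3%A9" yields two octets whose meaning as a character depends on a charset the URI does not record. The helper name `escape_octets` is hypothetical.

```python
from urllib.parse import unquote_to_bytes


def escape_octets(s: str) -> bytes:
    """Return the octets denoted by a (possibly escaped) URI string.

    Escapes are turned into raw octets; no charset is guessed for them.
    """
    return unquote_to_bytes(s)


# 'A' and '%41' denote the same single octet, 0x41.
assert escape_octets("A") == escape_octets("%41") == b"\x41"

# '%C3%A9' denotes two octets; which character they stand for depends on a
# charset that only the authority that minted the URI knows.
octets = escape_octets("%C3%A9")
assert octets == b"\xc3\xa9"
assert octets.decode("utf-8") == "\u00e9"          # one character, e with acute
assert octets.decode("latin-1") == "\u00c3\u00a9"  # two unrelated characters
```

Swapping "%41" for "A" is therefore safe at the octet level; only the further step of turning octets into characters needs the unknown charset.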


Regards,    Martin.



>[1] http://lists.w3.org/Archives/Public/www-tag/2003Mar/0070.html
>[2] http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html

Received on Wednesday, 16 April 2003 13:23:40 UTC