Re: Section 6 Normalization and Comparison

From: Martin Duerst <duerst@w3.org>
Date: Wed, 16 Apr 2003 13:19:31 -0400
To: "Williams, Stuart" <skw@hplb.hpl.hp.com>, "Tim Bray (E-mail)" <tbray@textuality.com>
Cc: "'uri@w3.org'" <uri@w3.org>

Just some very small comments:

At 16:14 03/04/14 +0100, Williams, Stuart wrote:

>I said earlier [1] that I had a couple more comments to make on the URI
>comparison section of RFC2396bis [2]. It turns out that I really only have
>two substantive comments, both on the section below.
>Best regards
>6.2.2 Syntax-based normalisation and
> Path Segment Normalisation
>[already noted in [1]]
>These sections and section 4 "URI References" differ with respect to the
>interpretation of "." and ".." in absolute forms of URI.
> Escape Normalisation
>States: "One cause is the choice of upper-case or lower-case letters for the
>hexadecimal digits within the escape sequence (e.g., "%3a" versus "%3A").
>Such sequences are always equivalent; for the sake of uniformity, URI
>generators and normalizers are strongly encouraged to use upper-case letters
>for the hex digits A-F."
>"... Such sequences are always equivalent;..." this seems to ignore the
>aspect of the purpose of the comparison - e.g., are such sequences equivalent
>for the purpose of naming a namespace?

Good point.

>Also states: "Only characters that are excluded from or reserved within the
>URI syntax must be escaped when used as data. However, some URI generators
>go beyond that and escape characters that do not require escaping, resulting
>in URIs that are equivalent to their unescaped counterparts. Such URIs can
>be normalized by unescaping sequences that represent the unreserved
>characters, as described in Section 2.3."
>I think that the reserved use of some characters is scoped by scheme and URI
>syntax component (scheme, authority, path, query, fragment), i.e. their
>reserved purpose is only applicable in certain fields and so escaping should
>only be applied to a reserved character when its reserved purpose is in

Yes, indeed. This is relatively easy to do when a URI is generated,
but difficult to do in general, because it would require
scheme,...-specific knowledge.
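
A minimal sketch of the generation-side case, where the generator knows which component the data goes into. It uses Python's `urllib.parse.quote`. The helper name `escape_for` and the per-component "safe" sets are assumptions chosen for illustration; the real sets depend on the scheme and on the URI specification in force.

```python
from urllib.parse import quote

# Hypothetical per-component sets of characters left unescaped.
# These are illustrative assumptions, not taken from any specification.
SAFE_BY_COMPONENT = {
    # Inside a single path segment, "/" and "?" would act as delimiters,
    # so they are escaped when they appear as data.
    "path_segment": "",
    # In this sketch, "/" and "?" are assumed to have no delimiting role
    # inside a query, so they are left as-is there.
    "query": "/?",
}

def escape_for(component: str, data: str) -> str:
    """Escape `data` for the named component, leaving that component's safe set alone."""
    return quote(data, safe=SAFE_BY_COMPONENT[component])

print(escape_for("path_segment", "a/b?c"))
print(escape_for("query", "a/b?c"))
```

The same data is escaped differently depending on where it is placed. A generic normalizer that sees only the finished URI cannot make this choice without the same component- and scheme-level knowledge.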

>Also, in general it is not clear to me that it is legitimate to unescape the
>escape sequence, because in general one doesn't know the character set of
>the escaped character - only the authority that minted the URI knows that -
>looking at a URI you only get to know what octet was escaped. [I think].

It is true that you don't know the original character.
But it's also true that 'A' and '%41' represent the same octet, and
so changing from one to the other, in contexts where these are
equivalent, can be done without knowledge of the original character.
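
As a rough sketch of that octet-level view, the function below rewrites escape sequences without ever interpreting non-ASCII octets as characters. It upper-cases the hex digits of every escape and unescapes only octets that map to unreserved characters. The name `normalize_escapes` and the unreserved set used here (ASCII letters, digits, and "-", ".", "_", "~") are assumptions for illustration; the set was deliberately kept small, and the actual set is whatever the specification in force defines.

```python
import re

# Assumed unreserved set for this sketch: ASCII letters, digits, and "-._~".
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def normalize_escapes(uri: str) -> str:
    """Upper-case hex digits in escapes; unescape octets for unreserved characters."""
    def fix(match: re.Match) -> str:
        octet = int(match.group(1), 16)
        ch = chr(octet)
        if ch in UNRESERVED:
            # The escape and the literal character denote the same octet.
            return ch
        # Octets outside the unreserved set (including all octets >= 0x80)
        # stay escaped; only the hex digits are upper-cased.
        return "%" + match.group(1).upper()
    return _ESCAPE.sub(fix, uri)

print(normalize_escapes("http://example.com/%41%3a%7e%e9"))
```

Whether the normalized and original forms count as equivalent still depends on the context of the comparison, as noted above; the sketch only shows that the rewrite itself needs no knowledge of the original character set.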

Regards,    Martin.

>[1] http://lists.w3.org/Archives/Public/www-tag/2003Mar/0070.html
>[2] http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html
Received on Wednesday, 16 April 2003 13:23:40 UTC
