- From: Williams, Stuart <skw@hplb.hpl.hp.com>
- Date: Mon, 14 Apr 2003 16:14:26 +0100
- To: "Tim Bray (E-mail)" <tbray@textuality.com>
- Cc: "'uri@w3.org'" <uri@w3.org>
Tim,

I said earlier [1] that I had a couple more comments to make on the URI comparison section of RFC2396bis [2]. It turns out that I really have only two substantive comments, both on section 6.2.2.2 below.

Best regards

Stuart
--

6.2.2 Syntax-based normalisation and 6.2.2.3 Path Segment Normalisation
------------------------------------------------------------------------

[already noted in [1]]

These sections and section 4, "URI References", differ with respect to the interpretation of "." and ".." in absolute forms of URI.

6.2.2.2 Escape Normalisation
----------------------------

States:

  "One cause is the choice of upper-case or lower-case letters for the hexadecimal digits within the escape sequence (e.g., "%3a" versus "%3A"). Such sequences are always equivalent; for the sake of uniformity, URI generators and normalizers are strongly encouraged to use upper-case letters for the hex digits A-F."

"... Such sequences are always equivalent; ..." This seems to ignore the purpose of the comparison, e.g. are such sequences equivalent for the purpose of naming a namespace?

Also states:

  "Only characters that are excluded from or reserved within the URI syntax must be escaped when used as data. However, some URI generators go beyond that and escape characters that do not require escaping, resulting in URIs that are equivalent to their unescaped counterparts. Such URIs can be normalized by unescaping sequences that represent the unreserved characters, as described in Section 2.3."

I think that the reserved use of some characters is scoped by scheme and by URI syntax component (scheme, authority, path, query, fragment), i.e. their reserved purpose applies only in certain components, so a reserved character should be escaped only when its reserved purpose is in scope.
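A minimal sketch of the two normalisations quoted above, for illustration only. It assumes the RFC 3986 unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~"); the draft under discussion may define that set differently, and the function name normalise_escapes is hypothetical. It upper-cases the hex digits of every escape and unescapes only octets that encode unreserved ASCII characters, so it treats escapes purely as octets and does not answer the character-set question.

```python
import re

# Assumption: RFC 3986's unreserved set. RFC 2396 additionally lists
# ! * ' ( ) as unreserved "mark" characters; the draft may differ again.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# One escape sequence: "%" followed by two hex digits of either case.
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalise_escapes(uri: str) -> str:
    """Hypothetical helper: unescape octets that encode unreserved ASCII
    characters and upper-case the hex digits of all remaining escapes."""

    def fix(match: re.Match) -> str:
        octet = int(match.group(1), 16)
        char = chr(octet)
        # UNRESERVED is pure ASCII, so octets >= 0x80 never match here
        # and are left escaped rather than decoded under some charset.
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return ESCAPE.sub(fix, uri)


print(normalise_escapes("http://example.com/a%3ab%7e"))  # http://example.com/a%3Ab~
```

Note that "%3a" stays escaped (as "%3A") because ":" is reserved; whether that is the right outcome in every component is exactly the scoping question raised above.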
Also, in general it is not clear to me that it is legitimate to unescape an escape sequence, because in general one doesn't know the character set of the escaped character. Only the authority that minted the URI knows that; looking at a URI, you only get to know which octet was escaped. [I think.]

[1] http://lists.w3.org/Archives/Public/www-tag/2003Mar/0070.html
[2] http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html
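The octet-versus-character point about unescaping can be illustrated with a short Python sketch. The helper decode_escaped and the sample segment are hypothetical: the same escape sequences yield different characters depending on which character set the reader assumes, and nothing in the URI records which one the minting authority used.

```python
from urllib.parse import unquote_to_bytes


def decode_escaped(uri_part: str, charset: str) -> str:
    """Hypothetical helper: unescape to raw octets, then decode them
    with a character set the caller must choose explicitly."""
    return unquote_to_bytes(uri_part).decode(charset)


segment = "caf%C3%A9"  # the URI itself only says: octets 0xC3 0xA9

print(decode_escaped(segment, "utf-8"))    # café
print(decode_escaped(segment, "latin-1"))  # cafÃ©
```

Both decodings succeed without error, so a normaliser has no way to tell from the URI alone which reading is intended.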
Received on Monday, 14 April 2003 11:14:49 UTC