- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 16 Apr 2003 13:19:31 -0400
- To: "Williams, Stuart" <skw@hplb.hpl.hp.com>, "Tim Bray (E-mail)" <tbray@textuality.com>
- Cc: "'uri@w3.org'" <uri@w3.org>
Just some very small comments:

At 16:14 03/04/14 +0100, Williams, Stuart wrote:
>Tim,
>
>I said earlier [1] that I had a couple more comments to make on the URI
>comparison section of RFC2396bis [2]. It turns out that I really only have
>two substantive comments, both on section 6.2.2.2 below.
>
>Best regards
>
>Stuart
>--
>
>6.2.2 Syntax-based normalisation and
>6.2.2.3 Path Segment Normalisation
>-------------------------------------
>[already noted in [1]]
>
>These sections and section 4 "URI References" differ with respect to the
>interpretation of "." and ".." in absolute forms of URI.
>
>6.2.2.2 Escape Normalisation
>----------------------------
>
>States: "One cause is the choice of upper-case or lower-case letters for the
>hexadecimal digits within the escape sequence (e.g., "%3a" versus "%3A").
>Such sequences are always equivalent; for the sake of uniformity, URI
>generators and normalizers are strongly encouraged to use upper-case letters
>for the hex digits A-F."
>
>"... Such sequences are always equivalent;..." this seems to ignore the
>aspect of the purpose of the comparison - eg. are such sequences equivalent
>for the purpose of naming a namespace?

Good point.

>Also states: "Only characters that are excluded from or reserved within the
>URI syntax must be escaped when used as data. However, some URI generators
>go beyond that and escape characters that do not require escaping, resulting
>in URIs that are equivalent to their unescaped counterparts. Such URIs can
>be normalized by unescaping sequences that represent the unreserved
>characters, as described in Section 2.3."
>
>I think that the reserved use of some characters is scoped by scheme and URI
>syntax component (scheme, authority, path, query, fragment)ie. their
>reserved purpose is only applicable in certain fields and so escaping should
>only be applied to a reserved character when it's reserved purpose is in
>scope.

Yes, indeed. This is relatively easy to do when an URI is generated,
but difficult to do in general, because it would require scheme,...-specific
knowledge.

>Also, in general it is not clear to me that it is legitimate to unescape the
>escape sequence, because in general one doesn't know the character set of
>the escaped character - only authority that minted the URI knows that -
>looking at a URI you only get to know what octet was escaped. [I think].

It is true that you don't know the original character. But it's also
true that 'A' and '%41' represent the same octet, and so changing from
one to the other, in contexts where these are equivalent, can be done
without knowledge of the original character.

Regards, Martin.

>[1] http://lists.w3.org/Archives/Public/www-tag/2003Mar/0070.html
>[2] http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html
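
For illustration only, the octet-level point about 'A' and '%41' might be sketched as below. This is a minimal sketch and not taken from the draft under discussion: the function name `normalize_escapes` is hypothetical, and the unreserved set is assumed to be the one later published in RFC 3986 (letters, digits, "-", ".", "_", "~"), which may differ from the set in the 2003 draft. It uppercases the hex digits of every escape and unescapes only those octets that map to unreserved characters, so no knowledge of the original character set is needed.

```python
import re
import string

# Assumption: the unreserved set as published in RFC 3986; the draft
# discussed in this message may have defined a different set.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

# Matches one well-formed escape: "%" followed by two hex digits.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(component: str) -> str:
    """Normalize the escapes in a URI component (hypothetical helper).

    Escapes whose octet is an unreserved ASCII character are replaced
    by that character; all other escapes are kept, with their hex
    digits uppercased. Works purely on octets, so the character
    encoding chosen by the URI's minting authority is never consulted.
    """

    def fix(match: "re.Match[str]") -> str:
        hex_digits = match.group(1)
        char = chr(int(hex_digits, 16))
        if char in UNRESERVED:
            return char  # e.g. "%41" becomes "A"
        return "%" + hex_digits.upper()  # e.g. "%3a" becomes "%3A"

    return _ESCAPE.sub(fix, component)
```

Escapes of reserved characters such as "%2f" are deliberately left escaped, since whether they are equivalent to the bare character depends on the scheme and the URI component, as noted in the exchange above.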
Received on Wednesday, 16 April 2003 13:23:40 UTC