- From: Williams, Stuart <skw@hplb.hpl.hp.com>
- Date: Mon, 28 Apr 2003 10:35:21 +0100
- To: "'Roy T. Fielding'" <fielding@apache.org>
- Cc: uri@w3.org
> > 6.2.2.2 Escape Normalisation > > ---------------------------- > > > > States: "One cause is the choice of upper-case or lower-case letters for the > > hexadecimal digits within the escape sequence (e.g., "%3a" versus "%3A"). > > Such sequences are always equivalent; for the sake of uniformity, URI > > generators and normalizers are strongly encouraged to use upper-case > > letters for the hex digits A-F." > > > > "... Such sequences are always equivalent;..." this seems to ignore > > the aspect of the purpose of the comparison - eg. are such sequences > > equivalent for the purpose of naming a namespace? > > Yes, they are always equivalent. They won't necessarily be > the same for comparison, but they are equivalent (which means > applications can replace one with the other if they so desire). Oh...! The Namespaces 1.1 CR [1] gives the following example (well yes, expressed in IRI rather than URI terms): "The IRI references below are also all different for the purposes of identifying namespaces: ... http://www.example.org/~wilbur http://www.example.org/%7ewilbur http://www.example.org/%7Ewilbur " Which I read as making these three identifiers *not* equivalent for the purpose of naming a namespace. [1] http://www.w3.org/TR/xml-names11/#IRIComparison <snip/> > > Unescaping those characters isn't worth the risk, and > considerably complicates the normalizer. Ok... I can see that tracking the scope of the reserved/restricted purpose of a character would complicate both escaping and unescaping, and that it give more scope for mistakes. > > Also, in general it is not clear to me that it is legitimate to > > unescape the escape sequence, because in general one doesn't know the character set > > of the escaped character - only authority that minted the URI knows that - > > looking at a URI you only get to know what octet was escaped. [I think]. > > That doesn't matter because the octet remains the same > whether it is escaped or not. The escaping merely prevents > characters from being misinterpreted as delimiters of > components or of the URI itself. I agree, it's of no consequence for octet based comparison (as in [2] URI Characters seq->octet seq->Original Character seq). *If* the document were to say very clearly that URI comparisons should be based on comparing octet sequences, at least for me, that would explain your response above - ~, %7e, %7E all contribute the same to an octet sequence. If it already does say so, then I apologise, because so far I have missed it. [2] http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html#rfc.section.2.1 > > ....Roy > Regards Stuart
Received on Monday, 28 April 2003 05:35:40 UTC