Re: Draft 2 of "How to Compare URIs" from Tim Bray on 2002-12-13 (www-tag@w3.org from December 2002)

From: Tim Bray <tbray@textuality.com>
Date: Fri, 13 Dec 2002 07:28:15 -0800
To: Stefan Eissing <stefan.eissing@greenbytes.de>
Cc: WWW-Tag <www-tag@w3.org>
Message-ID: <3DF9FC8F.9030608@textuality.com>

Stefan Eissing wrote:

> RFC 2396 Ch. 2.1
> 
> " In the simplest case, the original character sequence contains only 
> characters that are defined in US-ASCII, and the two levels of mapping 
> are simple and easily invertible: each 'original character' is 
> represented as the octet for the US-ASCII code for it, which is, in 
> turn, represented as either the US-ASCII character, or else the "%" 
> escape sequence for that octet."

You're saying you read this as "all characters in the ASCII range must 
use the ASCII codepoints for character->octet"?  I guess that's 
plausible, but I had read 2.1 to say "there are many character->octet 
mappings, one of the simplest being that for ASCII characters".  And 
even assuming you're right, it still seems like there's a window open 
here: if you're operating in a non-ASCII environment, then the 
char->octet mapping is left 100% undefined, so you can't know whether 
%xx == %xx for all %xx > 0x7f. -Tim
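
A minimal Python sketch of that window, assuming two environments that 
happen to use ISO-8859-1 and ISO-8859-5. Both charsets and the helper 
name decode_escaped are illustrative choices, not anything RFC 2396 
specifies. It shows the same escape above 0x7f yielding different 
characters, and the same character yielding different escapes, while 
an ASCII escape stays stable:

```python
from urllib.parse import quote, unquote_to_bytes


def decode_escaped(escaped: str, charset: str) -> str:
    """Hypothetical helper: undo %xx escaping, then apply an assumed charset."""
    octets = unquote_to_bytes(escaped)  # "%E9" -> b"\xe9"
    return octets.decode(charset)       # the octet->character step RFC 2396 leaves open


# Same escaped octet, two assumed environments, two different characters.
as_latin1 = decode_escaped("%E9", "iso-8859-1")    # U+00E9, e with acute
as_cyrillic = decode_escaped("%E9", "iso-8859-5")  # U+0449, Cyrillic shcha

# Same character, two assumed mappings, two different escape sequences.
escaped_latin1 = quote("\u00e9", encoding="iso-8859-1")  # "%E9"
escaped_utf8 = quote("\u00e9", encoding="utf-8")         # "%C3%A9"

# An escape in the ASCII range decodes identically under both charsets.
ascii_latin1 = decode_escaped("%41", "iso-8859-1")
ascii_cyrillic = decode_escaped("%41", "iso-8859-5")
```

So an octet-for-octet match of "%E9" against "%E9" says nothing about 
whether the two URIs were built from the same original character, unless 
both sides are known to share a charset.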

Received on Friday, 13 December 2002 10:28:17 UTC