Re: Posted draft of URI comparison finding from Martin Duerst on 2002-12-05 (www-tag@w3.org from December 2002)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 05 Dec 2002 23:04:17 +0900
To: "Roy T. Fielding" <fielding@apache.org>
Cc: Tim Bray <tbray@textuality.com>, WWW-Tag <www-tag@w3.org>
Message-Id: <4.2.0.58.J.20021205215616.05140778@localhost>

At 16:05 02/12/04 -0800, Roy T. Fielding wrote:
>>- "It would seem almost wilfully perverse to consider the characters
>>    represented respectively by %7A and %7a in the example above as
>>    different."
>>
>>   But that's not my point. The sentence assumes that %7A and %7a
>>   represent a character, where in actual fact in an URI (see again
>>   section 2.1 of http://www.ietf.org/rfc/rfc2396.txt) 'z', '%7A',
>>   and '%7a' are three different ways to represent the byte <7a>,
>>   which in turn in most cases (but not necessarily guaranteed)
>>   represents the character 'z'.
>
>That hardly matters.  Section 2.1 says that %7A and %7a both
>represent the same octet, and therefore are guaranteed to be the same
>character regardless of character encoding.  Whether or not that
>character is 'z' is a different issue.

Section 2.1 indeed says that %7a and %7A both represent the
same octet. However, that does not guarantee that they represent
the same character regardless of encoding, because there are
encodings in which a single byte represents only part of
a character, and/or can be used in representing different
characters.
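
The point can be checked with a small sketch. It is an illustration
only, not part of RFC 2396: the choice of Python, of Shift_JIS and
UTF-16 as example encodings, and all variable names are assumptions
made for this example.

```python
from urllib.parse import unquote_to_bytes

# Both spellings of the escape decode to the same octet (RFC 2396, section 2.1).
upper = unquote_to_bytes("%7A")
lower = unquote_to_bytes("%7a")
same_octet = upper == lower == b"\x7a"

# On its own, in an ASCII-compatible encoding, that octet reads as 'z'.
alone = lower.decode("ascii")

# After the lead byte 0x83 in Shift_JIS, the same octet is only the trail
# byte of a two-byte character, so it does not stand for 'z' at all.
pair = unquote_to_bytes("%83%7a")
combined = pair.decode("shift_jis")

# In big-endian UTF-16 the octet's meaning depends on its position
# within the two-byte code unit.
as_low_byte = b"\x00\x7a".decode("utf-16-be")
as_high_byte = b"\x7a\x00".decode("utf-16-be")

print(same_octet, repr(alone), len(combined), as_low_byte == as_high_byte)
```

The escapes always agree at the octet level; which character, or
which part of a character, that octet stands for depends on the
encoding and on the neighbouring bytes.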

Regards,    Martin.

Received on Thursday, 5 December 2002 10:59:38 UTC