Re: Posted draft of URI comparison finding from Martin Duerst on 2002-12-04 (www-tag@w3.org from December 2002)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 05 Dec 2002 08:08:54 +0900
To: Tim Bray <tbray@textuality.com>, Dan Connolly <connolly@w3.org>
Cc: Chris Lilley <chris@w3.org>, www-tag@w3.org
Message-Id: <4.2.0.58.J.20021205080150.048bdf00@localhost>

At 09:51 02/12/02 -0800, Tim Bray wrote:

>Dan Connolly wrote:
>
>>If you have two character sequences, I still think
>>it's proper to speak of comparing them
>>character-for-character. It's reasonably clear
>>that this gives the same result as mapping
>>the character sequence to a codepoint sequence
>>and then comparing codepoint-for-codepoint,
>>but to speak of comparing character sequences
>>codepoint-for-codepoint is a little sloppy, no?
>
>I believe the opposite.  A "character" is a complex bundle of visual and 
>linguistic semantics.  A codepoint is a number.  I know how to compare 
>numbers. -Tim

As the Character Model explains
(http://www.w3.org/TR/charmod/#sec-Perceptions),
the term 'character' is used in many, many different ways.
To take an example that will seem extreme to most present-day
computer-literate people: in many ways, what we write as 'f' and 'F'
are one and the same character, the character called 'eff' in English.
Using 'codepoint' makes clear that we mean characters as encoded,
and because 'f' and 'F' are encoded differently, we know that
they must compare as unequal.
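
To make the distinction concrete, here is a minimal sketch in Python.
It is only an illustration, not text from the draft finding; the helper
names codepoints and codepoint_equal are hypothetical. It maps each
string to its sequence of codepoint numbers and compares those numbers.

```python
def codepoints(s: str) -> list:
    """Map a string to its sequence of Unicode codepoints (integers)."""
    return [ord(ch) for ch in s]


def codepoint_equal(a: str, b: str) -> bool:
    """Compare two strings codepoint-for-codepoint (hypothetical helper)."""
    return codepoints(a) == codepoints(b)


# 'f' and 'F' are encoded as different numbers, so they compare as unequal.
print(codepoints("f"), codepoints("F"))   # [102] [70]
print(codepoint_equal("f", "F"))          # False

# A looser notion of "same character" (here, case folding) gives another answer.
print("f".casefold() == "F".casefold())   # True
```

The last line shows one of the other perceptions of 'character': under
case folding, 'f' and 'F' are treated as the same, which is exactly the
ambiguity that speaking of codepoints avoids.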

Regards,    Martin.

Received on Wednesday, 4 December 2002 18:50:34 UTC