
Re: Posted draft of URI comparison finding

From: Martin Duerst <duerst@w3.org>
Date: Thu, 05 Dec 2002 08:08:54 +0900
Message-Id: <4.2.0.58.J.20021205080150.048bdf00@localhost>
To: Tim Bray <tbray@textuality.com>, Dan Connolly <connolly@w3.org>
Cc: Chris Lilley <chris@w3.org>, www-tag@w3.org

At 09:51 02/12/02 -0800, Tim Bray wrote:

>Dan Connolly wrote:
>
>>If you have two character sequences, I still think
>>it's proper to speak of comparing them
>>character-for-character. It's reasonably clear
>>that this gives the same result as mapping
>>the character sequence to a codepoint sequence
>>and then comparing codepoint-for-codepoint,
>>but to speak of comparing character sequences
>>codepoint-for-codepoint is a little sloppy, no?
>
>I believe the opposite.  A "character" is a complex bundle of visual and 
>linguistic semantics.  A codepoint is a number.  I know how to compare 
>numbers. -Tim

As the character model explains 
(http://www.w3.org/TR/charmod/#sec-Perceptions),
the term 'character' is used in many different ways.
To take an example that will seem extreme to most present-day
computer-literate people: in many respects, what we write as 'f' and
'F' are one and the same character, the character called 'eff' in English.
Using 'codepoint' makes clear that we mean characters as encoded,
and because 'f' and 'F' are encoded differently, we know that
they must compare as not equal.
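
To make this concrete, here is a minimal sketch in Python of what
comparing codepoint-for-codepoint amounts to. The helper names
codepoints and same_codepoints and the example.org URIs are made up
for illustration; they do not come from the draft finding or from the
character model.

```python
def codepoints(s: str) -> list[int]:
    """Map a string to its sequence of Unicode codepoints (plain integers)."""
    return [ord(ch) for ch in s]


def same_codepoints(a: str, b: str) -> bool:
    """Compare two strings codepoint-for-codepoint.

    No case folding or normalization is applied (an assumption of this
    sketch), so the comparison is purely a comparison of numbers.
    """
    return codepoints(a) == codepoints(b)


# 'f' is U+0066 and 'F' is U+0046: one 'eff', two different numbers.
lower = codepoints("f")
upper = codepoints("F")

# Two hypothetical URIs that differ only in the case of that one letter.
uri_equal = same_codepoints("http://example.org/f", "http://example.org/F")
```

Here lower and upper hold different integers, so uri_equal is False,
even though a reader might call the two letters the same character.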

Regards,    Martin.
Received on Wednesday, 4 December 2002 18:50:34 GMT
