Re: Posted draft of URI comparison finding from Dan Connolly on 2002-12-02 (www-tag@w3.org from December 2002)

From: Dan Connolly <connolly@w3.org>
Date: 02 Dec 2002 11:44:03 -0600
To: Chris Lilley <chris@w3.org>
Cc: www-tag@w3.org, Tim Bray <tbray@textuality.com>
Message-Id: <1038851045.5180.11031.camel@dirk>

On Mon, 2002-12-02 at 09:14, Chris Lilley wrote:
> On Monday, December 2, 2002, 3:42:56 PM, Dan wrote:
> 
> 
> DC> |In Unicode terminology, this would be properly referred
> DC> | to as codepoint-for-codepoint comparison.
> 
> DC> Well, it's only codepoint-for-codepoint after you map
> DC> the characters to codepoints; character-for-character
> DC> is just as proper, no?
> 
> No.
> 
> Characters are defined as Unicode codepoints.

Really? Where? I understood codepoints to *correspond*
to characters, but not to *be* characters.

"Each character in the repertoire is then associated with a
(mathematical, abstract) non-negative integer, the code point (also
known as a character number or code position). The result, a mapping
from the repertoire to the set of non-negative integers, is called a
coded character set (CCS)."
 -- http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Digital

> What byte sequences
> these codepoints become in various encodings is orthogonal, but a
> given character has a unique Unicode codepoint.

I don't understand your point; I don't see how byte sequences
are relevant.

If you have two character sequences, I still think
it's proper to speak of comparing them
character-for-character. It's reasonably clear
that this gives the same result as mapping
the character sequence to a codepoint sequence
and then comparing codepoint-for-codepoint,
but to speak of comparing character sequences
codepoint-for-codepoint is a little sloppy, no?
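
A minimal sketch of the equivalence claimed above, in Python. The
function names are hypothetical, chosen only for illustration. One
caveat: a Python str is itself a sequence of code points, so the
"character" side of this sketch is only an approximation of an
abstract character sequence; ord() stands in for the mapping from
the repertoire to non-negative integers.

```python
def to_codepoints(text):
    """Map each character to its code point (a non-negative integer)."""
    return [ord(ch) for ch in text]


def equal_char_for_char(a, b):
    """Compare two character sequences one character at a time."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def equal_codepoint_for_codepoint(a, b):
    """Map both sequences to code point sequences, then compare those."""
    return to_codepoints(a) == to_codepoints(b)


def comparisons_agree(a, b):
    """True when both ways of comparing give the same answer."""
    return equal_char_for_char(a, b) == equal_codepoint_for_codepoint(a, b)
```

Neither comparison normalizes anything: a precomposed
"caf\u00e9" and a decomposed "cafe\u0301" compare as different
under both, so the two methods still agree with each other.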

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/

Received on Monday, 2 December 2002 12:43:53 UTC