Re: Posted draft of URI comparison finding

From: Chris Lilley <chris@w3.org>
Date: Mon, 2 Dec 2002 19:01:58 +0100
Message-ID: <16156219906.20021202190158@w3.org>
To: www-tag@w3.org, Dan Connolly <connolly@w3.org>
CC: Tim Bray <tbray@textuality.com>

On Monday, December 2, 2002, 6:44:03 PM, Dan wrote:


DC> On Mon, 2002-12-02 at 09:14, Chris Lilley wrote:
>> On Monday, December 2, 2002, 3:42:56 PM, Dan wrote:
>> 
>> 
>> DC> |In Unicode terminology, this would be properly referred
>> DC> | to as codepoint-for-codepoint comparison.
>> 
>> DC> Well, it's only codepoint-for-codepoint after you map
>> DC> the characters to codepoints; character-for-character
>> DC> is just as proper, no?
>> 
>> No.
>> 
>> Characters are defined as Unicode codepoints.

DC> Really? where? I understood codepoints to *correspond*
DC> to characters, but not to *be* characters.

DC> "Each character in the repertoire is then associated with a
DC> (mathematical, abstract) non-negative integer, the code point (also
DC> known as a character number or code position). The result, a mapping
DC> from the repertoire to the set of non-negative integers, is called a
DC> coded character set (CCS)."
DC>  -- http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Digital

And XML uses exactly one such CCS as the document character set. If
you have an example of another W3C spec that uses a different CCS I
would be happy to see such an example.

>> What byte sequences
>> these codepoints become in various encodings is orthogonal, but a
>> given character has a unique Unicode codepoint.

DC> I don't understand your point; I don't see how byte sequences
DC> are relevant.

I was just adding that for clarification and for the record; I know
that you already understand it.
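
To make the orthogonality concrete, here is a rough Python sketch (the
choice of character and of the three encodings is mine, purely for
illustration). It shows one character with one codepoint turning into
different byte sequences depending on the encoding:

```python
# One character, one codepoint, several possible byte sequences.
# The character and the encodings are illustrative assumptions.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

code_point = ord(ch)                    # the codepoint: 0xE9
utf8_bytes = ch.encode("utf-8")         # two bytes in UTF-8
utf16_bytes = ch.encode("utf-16-be")    # two different bytes in UTF-16 (big-endian)
latin1_bytes = ch.encode("iso-8859-1")  # a single byte in ISO-8859-1

print(hex(code_point), utf8_bytes, utf16_bytes, latin1_bytes)
```

The byte sequences differ, but decoding any of them with the matching
encoding gives back the same codepoint.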

DC> If you have two character sequences, I still think
DC> it's proper to speak of comparing them
DC> character-for-character.


But it's rather sloppy.

DC> It's reasonably clear
DC> that this gives the same result as mapping
DC> the character sequence to a codepoint sequence
DC> and then comparing codepoint-for-codepoint,
DC> but to speak of comparing character sequences
DC> codepoint-for-codepoint is a little sloppy, no?

No, it's rather more precise, in fact. Or would you imagine that
software would actually use the name of the character rather than its
codepoint, to do the comparison?

It seems clearer and more precise to speak of Unicode codepoint
comparisons.
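
As a minimal sketch of what I mean by a codepoint-for-codepoint
comparison (Python, with hypothetical helper names and example strings
of my own; this is not text from the draft finding):

```python
def code_points(s):
    """Map a character sequence to its sequence of Unicode codepoints."""
    return [ord(c) for c in s]


def same_code_points(a, b):
    """Compare two strings codepoint-for-codepoint: no case folding,
    no normalization, no other equivalence rules."""
    return code_points(a) == code_points(b)


# Example strings are illustrative assumptions.
identical = same_code_points("http://example.org/a", "http://example.org/a")
case_differs = same_code_points("http://example.org/a", "http://example.org/A")

# "é" as one precomposed codepoint versus "e" plus a combining acute accent:
# they look the same when rendered, but the codepoint sequences differ.
composed_vs_decomposed = same_code_points("\u00e9", "e\u0301")

print(identical, case_differs, composed_vs_decomposed)
```

The last case is where "character-for-character" gets ambiguous, since
a reader might take the two forms to be the same character; the
codepoint sequences leave no such room.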




-- 
 Chris                            mailto:chris@w3.org
Received on Monday, 2 December 2002 13:02:09 GMT
