- From: Chris Lilley <chris@w3.org>
- Date: Mon, 2 Dec 2002 19:01:58 +0100
- To: www-tag@w3.org, Dan Connolly <connolly@w3.org>
- CC: Tim Bray <tbray@textuality.com>
On Monday, December 2, 2002, 6:44:03 PM, Dan wrote:

DC> On Mon, 2002-12-02 at 09:14, Chris Lilley wrote:
>> On Monday, December 2, 2002, 3:42:56 PM, Dan wrote:
>>
>> DC> |In Unicode terminology, this would be properly referred
>> DC> | to as codepoint-for-codepoint comparison.
>>
>> DC> Well, it's only codepoint-for-codepoint after you map
>> DC> the characters to codepoints; character-for-character
>> DC> is just as proper, no?
>>
>> No.
>>
>> Characters are defined as Unicode codepoints.

DC> Really? where? I understood codepoints to *correspond*
DC> to characters, but not to *be* characters.

DC> "Each character in the repertoire is then associated with a
DC> (mathematical, abstract) non-negative integer, the code point (also
DC> known as a character number or code position). The result, a mapping
DC> from the repertoire to the set of non-negative integers, is called a
DC> coded character set (CCS)."
DC> -- http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Digital

And XML uses exactly one such CCS as the document character set. If
you have an example of another W3C spec that uses a different CCS, I
would be happy to see it.

>> What byte sequences
>> these codepoints become in various encodings is orthogonal, but a
>> given character has a unique Unicode codepoint.

DC> I don't understand your point; I don't see how byte sequences
DC> are relevant.

I was just adding that for clarification and for the record; I know
that you already understand it.

DC> If you have two character sequences, I still think
DC> it's proper to speak of comparing them
DC> character-for-character.

But it's rather sloppy.

DC> It's reasonably clear
DC> that this gives the same result as mapping
DC> the character sequence to a codepoint sequence
DC> and then comparing codepoint-for-codepoint,
DC> but to speak of comparing character sequences
DC> codepoint-for-codepoint is a little sloppy, no?

No, it's rather more precise, in fact. Or would you imagine that
software would actually use the name of the character, rather than its
codepoint, to do the comparison?

It seems clearer and more precise to speak of Unicode codepoint
comparisons.

--
 Chris                            mailto:chris@w3.org
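
A minimal sketch of what codepoint-for-codepoint comparison could look like, assuming Python 3, where a string is a sequence of Unicode code points. The helper names `codepoints` and `same_codepoints` are hypothetical and exist only for this illustration. The sketch also shows that the byte sequences differ from one encoding to another while the code points stay the same.

```python
from typing import List


def codepoints(s: str) -> List[int]:
    """Map each character of s to its Unicode code point (hypothetical helper)."""
    return [ord(ch) for ch in s]


def same_codepoints(a: str, b: str) -> bool:
    """Compare two strings codepoint-for-codepoint, with no normalization."""
    return codepoints(a) == codepoints(b)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# The byte sequences depend on the encoding chosen...
utf8_bytes = precomposed.encode("utf-8")
utf16_bytes = precomposed.encode("utf-16-be")
print(utf8_bytes != utf16_bytes)  # True

# ...but decoding either one yields the same single code point.
print(codepoints(utf8_bytes.decode("utf-8")))       # [233]
print(codepoints(utf16_bytes.decode("utf-16-be")))  # [233]

# Two sequences that render alike are still different code point sequences.
print(same_codepoints(precomposed, decomposed))     # False
```

The comparison itself is nothing more than integer equality over the two sequences, which is why speaking of codepoints pins down exactly what is being compared.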
Received on Monday, 2 December 2002 13:02:09 UTC