- From: Dan Connolly <connolly@w3.org>
- Date: 02 Dec 2002 11:44:03 -0600
- To: Chris Lilley <chris@w3.org>
- Cc: www-tag@w3.org, Tim Bray <tbray@textuality.com>
On Mon, 2002-12-02 at 09:14, Chris Lilley wrote:
> On Monday, December 2, 2002, 3:42:56 PM, Dan wrote:
>
> DC> |In Unicode terminology, this would be properly referred
> DC> | to as codepoint-for-codepoint comparison.
>
> DC> Well, it's only codepoint-for-codepoint after you map
> DC> the characters to codepoints; character-for-character
> DC> is just as proper, no?
>
> No.
>
> Characters are defined as unicode codepoints.

Really? Where? I understood codepoints to *correspond*
to characters, but not to *be* characters.

"Each character in the repertoire is then associated with a
(mathematical, abstract) non-negative integer, the code point
(also known as a character number or code position). The result,
a mapping from the repertoire to the set of non-negative integers,
is called a coded character set (CCS)."
 -- http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Digital

> What byte sequences
> these codepoints become in various encodings is orthogonal, but a
> given character has a unique unicode codepoint.

I don't understand your point; I don't see how byte sequences
are relevant.

If you have two character sequences, I still think it's
proper to speak of comparing them character-for-character.
It's reasonably clear that this gives the same result as
mapping the character sequence to a codepoint sequence and
then comparing codepoint-for-codepoint, but to speak of
comparing character sequences codepoint-for-codepoint is
a little sloppy, no?

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
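
A rough sketch of the equivalence claimed above, not anything taken from the charmod draft: comparing two character sequences element by element should give the same answer as first mapping each character to its code point and then comparing the integer sequences. The function names are hypothetical, and the sketch assumes Python 3, where the elements of a str are already identified with Unicode code points, so it illustrates the mapping step more than it settles the terminology.

```python
def to_codepoints(text):
    """Map each character of text to its code point (a non-negative integer)."""
    return [ord(ch) for ch in text]


def equal_char_for_char(a, b):
    """Compare two character sequences element by element."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def equal_codepoint_for_codepoint(a, b):
    """Map both sequences to code points, then compare the integer sequences."""
    return to_codepoints(a) == to_codepoints(b)


if __name__ == "__main__":
    # Hypothetical sample pairs; the two comparisons should always agree.
    pairs = [
        ("http://example.org/", "http://example.org/"),
        ("http://example.org/A", "http://example.org/a"),
        ("abc", "abcd"),
    ]
    for a, b in pairs:
        assert equal_char_for_char(a, b) == equal_codepoint_for_codepoint(a, b)
```

Neither function looks at bytes; the encoding never enters into either comparison.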
Received on Monday, 2 December 2002 12:43:53 UTC