- From: Dan Connolly <connolly@w3.org>
- Date: 02 Dec 2002 13:56:35 -0600
- To: Tim Bray <tbray@textuality.com>
- Cc: Chris Lilley <chris@w3.org>, www-tag@w3.org
On Mon, 2002-12-02 at 11:51, Tim Bray wrote:
> Dan Connolly wrote:
>
> > If you have two character sequences, I still think it's
> > proper to speak of comparing them
> > character-for-character. It's reasonably clear
> > that this gives the same result as mapping
> > the character sequence to a codepoint sequence
> > and then comparing codepoint-for-codepoint,
> > but to speak of comparing character sequences
> > codepoint-for-codepoint is a little sloppy, no?
>
> I believe the opposite. A "character" is a complex bundle of visual and
> linguistic semantics. A codepoint is a number. I know how to compare
> numbers.
> -Tim

OK, so compare numbers, and derive conclusions about the corresponding
characters. But don't speak of comparing character sequences "code point
for code point". Rather, speak of mapping character sequences to codepoint
sequences, then comparing the codepoint sequences, and then coming to
conclusions about the character sequences.

If you're going to be sloppy enough to speak of comparing character strings
"code point for code point", you might as well speak of comparing them byte
for byte; in both cases, there's a mapping to characters that is assumed.

I.e., given character strings s1 and s2, there's not much difference between
exploiting this correspondence

  toUnicodeCodePoints(s1) = toUnicodeCodePoints(s2) iff s1 = s2

and this one

  toUTF8(s1) = toUTF8(s2) iff s1 = s2.

Of course, it's not the case that

  toUTF8(s1) = to8859-1(s2) iff s1 = s2

but nor is it the case that

  toUnicodeCodePoints(s1) = toMyCodePoints(s2) iff s1 = s2

-- 
Dan Connolly, W3C
http://www.w3.org/People/Connolly/
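The correspondences in the message can be checked with a minimal Python sketch. It assumes Python's `str` as the model of a character string (a Python string is a sequence of Unicode code points). The helper names only mirror the functions named in the message and are hypothetical; `to_my_code_points` is an arbitrary, made-up alternative numbering standing in for "toMyCodePoints".

```python
# Minimal sketch, assuming Python str as the model of a character string.
# All helper names are hypothetical stand-ins for the functions in the message.

def to_unicode_code_points(s: str) -> list[int]:
    """Map a character string to its sequence of Unicode code points."""
    return [ord(c) for c in s]

def to_my_code_points(s: str) -> list[int]:
    """Hypothetical alternative numbering: every Unicode code point shifted by one."""
    return [ord(c) + 1 for c in s]

def to_utf8(s: str) -> bytes:
    """Map a character string to its UTF-8 byte sequence."""
    return s.encode("utf-8")

def to_8859_1(s: str) -> bytes:
    """Map a character string to ISO-8859-1 bytes (raises outside Latin-1)."""
    return s.encode("iso-8859-1")

pairs = [("caf\u00e9", "caf\u00e9"), ("caf\u00e9", "cafe"), ("abc", "abd")]

for s1, s2 in pairs:
    same_chars = s1 == s2
    # Applying the SAME injective mapping to both strings preserves equality,
    # whether the mapping yields code points or UTF-8 bytes.
    assert (to_unicode_code_points(s1) == to_unicode_code_points(s2)) == same_chars
    assert (to_utf8(s1) == to_utf8(s2)) == same_chars

# Mixing two different mappings breaks the "iff": here s1 = s2, yet the outputs differ.
s = "caf\u00e9"
print(to_utf8(s) == to_8859_1(s))                        # False
print(to_unicode_code_points(s) == to_my_code_points(s))  # False
```

Note that the mixed UTF-8/ISO-8859-1 comparison can still come out equal for some strings (for ASCII-only input such as "abc" the two byte sequences coincide), so what fails is the biconditional, not every individual comparison.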
Received on Monday, 2 December 2002 14:56:23 UTC