Re: Posted draft of URI comparison finding from Dan Connolly on 2002-12-02 (www-tag@w3.org from December 2002)

From: Dan Connolly <connolly@w3.org>
Date: 02 Dec 2002 13:56:35 -0600
To: Tim Bray <tbray@textuality.com>
Cc: Chris Lilley <chris@w3.org>, www-tag@w3.org
Message-Id: <1038858996.5320.11297.camel@dirk>

On Mon, 2002-12-02 at 11:51, Tim Bray wrote:
> Dan Connolly wrote:
> 
> > If you have two character sequences, I still think
> > it's proper to speak of comparing them
> > character-for-character. It's reasonably clear
> > that this gives the same result as mapping
> > the character sequence to a codepoint sequence
> > and then comparing codepoint-for-codepoint,
> > but to speak of comparing character sequences
> > codepoint-for-codepoint is a little sloppy, no?
> 
> I believe the opposite.  A "character" is a complex bundle of visual and 
> linguistic semantics.  A codepoint is a number.  I know how to compare 
> numbers. -Tim

OK, so compare numbers, and derive conclusions about
corresponding characters.

But don't speak of comparing character sequences
"code point for code point". Rather, speak of
mapping character sequences to codepoint sequences
and then comparing the codepoint sequences,
and then coming to conclusions about the
character sequences.

If you're going to be sloppy enough to speak
of comparing character strings "code point
for code point", you might as well speak
of comparing them byte for byte; in both
cases, there's a mapping to characters
that is assumed.

I.e., given character strings s1 and s2,
there's not much difference between exploiting
this correspondence
	toUnicodeCodePoints(s1) = toUnicodeCodePoints(s2) iff s1 = s2
and this one
	toUTF8(s1) = toUTF8(s2) iff s1 = s2.

Of course, it's not the case that
	toUTF8(s1) = to8859-1(s2) iff s1 = s2
but nor is it the case that
	toUnicodeCodePoints(s1) = toMyCodePoints(s2) iff s1 = s2.
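
For what it's worth, the four correspondences above can be tried out
in a few lines of Python. This is only a rough sketch: the function
names are my own stand-ins for the mappings named above, and
to_my_code_points is a made-up numbering (Unicode shifted by one),
assumed purely for illustration.

```python
# Sketch: a comparison of encoded forms says something about the
# character strings only when both sides use the same mapping.

def to_unicode_code_points(s: str) -> list[int]:
    """Map a character string to its Unicode code point sequence."""
    return [ord(ch) for ch in s]

def to_utf8(s: str) -> bytes:
    """Map a character string to its UTF-8 byte sequence."""
    return s.encode("utf-8")

def to_8859_1(s: str) -> bytes:
    """Map a character string to its ISO 8859-1 byte sequence."""
    return s.encode("iso-8859-1")

def to_my_code_points(s: str) -> list[int]:
    """Hypothetical private numbering: Unicode shifted by one."""
    return [ord(ch) + 1 for ch in s]

s1 = "caf\u00e9"   # "cafe" with an e-acute
s2 = "caf\u00e9"   # the same character string
s3 = "cafe"        # a different character string

# Same mapping on both sides: equality tracks string equality.
same_cp = to_unicode_code_points(s1) == to_unicode_code_points(s2)
same_utf8 = to_utf8(s1) == to_utf8(s2)
diff_cp = to_unicode_code_points(s1) == to_unicode_code_points(s3)
diff_utf8 = to_utf8(s1) == to_utf8(s3)

# Mixed mappings: s1 = s2, yet the encoded forms differ.
mixed_bytes = to_utf8(s1) == to_8859_1(s2)
mixed_cp = to_unicode_code_points(s1) == to_my_code_points(s2)

print(same_cp, same_utf8, diff_cp, diff_utf8, mixed_bytes, mixed_cp)
```

The first four comparisons agree with s1 = s2 and s1 != s3; the last
two come out unequal even though s1 = s2, because the two sides assume
different mappings.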

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/

Received on Monday, 2 December 2002 14:56:23 UTC