Re: Unicode Normalization

On Wed, Feb 4, 2009 at 1:54 PM, Brad Kemper <brad.kemper@gmail.com> wrote:

> On Feb 4, 2009, at 1:07 PM, Robert J Burns <rob@robburns.com> wrote:
>
>> However Unicode has a SHOULD requirement that two canonically equivalent
>> but codepoint differing strings match. Unicode's Chapter 3 (C6 norm) says:
>>
>>> A process shall not assume that the interpretations of two
>>> canonical-equivalent character sequences are distinct.
>>
> Your interpretation adds something that your quoted text does not include.
> The quoted text does not include "but code point differing". It seems quite
> clear (at least when read in isolation from the rest of the spec) that it's
> simply saying that two canonical-equivalent character sequences MAY not be
> distinct. If they are not code point differing then they wouldn't be
> distinct. Otherwise they would be.
>

Brad:

The whole point in Unicode of "canonical-equivalent" instead of "identical"
is that the code points may differ (in order and/or values). I don't think
that anyone seriously involved with Unicode could agree with your
interpretation.

(Tangentially, what would be the point of even having the sentence if your
interpretation were correct?)
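
To make the distinction concrete, here is a minimal sketch in Python using the standard library's unicodedata module. The helper name canonically_equivalent is my own, purely for illustration, and comparing NFC forms is just one way to test canonical equivalence (NFD would serve equally well). It shows strings that differ in code point values, and in code point order, yet are canonically equivalent:

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Different code point values: precomposed U+00E9 vs. "e" + combining acute U+0301.
precomposed = "\u00e9"
decomposed = "e\u0301"

# Different code point order: dot above (U+0307, combining class 230) and
# dot below (U+0323, combining class 220) attached to the same base letter.
order_a = "q\u0307\u0323"
order_b = "q\u0323\u0307"

# A naive code point comparison treats each pair as distinct...
print(precomposed == decomposed)   # False
print(order_a == order_b)          # False

# ...while a normalizing comparison treats each pair as the same text.
print(canonically_equivalent(precomposed, decomposed))  # True
print(canonically_equivalent(order_a, order_b))         # True
```

A process that relies on the naive comparison is exactly the kind that assumes the two sequences are distinct.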

As somebody who has worked with text, fonts and Unicode for a long time now,
I'm inclined to agree with the general sentiment that there is a problem
here, and it ought to be solved. I'm agnostic as to where in the workflow
the solution should live, as I think that decision involves expertise I
lack.

Cheers,

T

Received on Wednesday, 4 February 2009 22:07:40 UTC