Re: Unicode issue on normalization from Mark Davis on 2006-01-25 (public-i18n-core@w3.org from January to March 2006)

From: Mark Davis <mark.davis@icu-project.org>
Date: Wed, 25 Jan 2006 09:08:19 -0800
To: Felix Sasaki <fsasaki@w3.org>
CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <43D7B083.2080104@icu-project.org>

There is a misunderstanding. The change to D2 was already made in
Unicode 4.1, released about a year ago (see
http://www.unicode.org/reports/tr15/#D2). Only the material marked in
yellow is new for this version.

Mark

Felix Sasaki wrote:

>
> There is an issue with the Unicode normalization forms; see
> http://www.unicode.org/review/pr-29.html.
>
> The definitions of NFC and NFKC as they stand contain a
> contradiction. There are some cases where applying the transformation
> twice, toNFC(toNFC(x)), gives a different result from toNFC(x).
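The property at stake is idempotency: normalizing a string that is already in NFC should change nothing. Below is a minimal Python sketch of that check. It assumes Python's standard `unicodedata` module, whose `normalize` function follows the current, corrected definition, so it would be expected to report `True` for every input. The helper name `is_nfc_idempotent` is hypothetical.

```python
import unicodedata


def is_nfc_idempotent(s: str) -> bool:
    """Return True if toNFC(toNFC(s)) == toNFC(s).

    is_nfc_idempotent is a hypothetical helper; it relies only on the
    standard-library unicodedata.normalize function.
    """
    once = unicodedata.normalize("NFC", s)      # toNFC(s)
    twice = unicodedata.normalize("NFC", once)  # toNFC(toNFC(s))
    return once == twice


if __name__ == "__main__":
    # Arbitrary inputs: decomposed e + acute, precomposed e-acute,
    # a + ogonek + acute, and Oriya E + combining grave + Oriya AA.
    for sample in ["e\u0301", "\u00e9", "a\u0328\u0301", "\u0b47\u0300\u0b3e"]:
        print(repr(sample), is_nfc_idempotent(sample))
```

An implementation built on the contradictory definition could fail this check for some inputs. The sketch only tests the property and does not locate the offending sequences.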
>
> To fix these cases, it is proposed to change
>
> D2. In any character sequence beginning with a starter S, a character
> C is blocked from S if and only if there is some character B between
> S and C, and either B is a starter or it has the same combining class
> as C.
>
> to
>
> D2'. In any character sequence beginning with a starter S, a character
> C is blocked from S if and only if there is some character B between
> S and C, and either B is a starter or it has the same or higher
> combining class as C.
>
> This definition is only to be applied to strings that are already
> canonically decomposed.
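The two wordings differ only in how an intervening non-starter B is compared with C. Below is a minimal Python sketch of both rules. It assumes the input is already canonically decomposed and begins with a starter, and it uses `unicodedata.combining` to look up canonical combining classes. `is_blocked` and its `rule` parameter are hypothetical names. The example sequence U+0B47 U+0300 U+0B3E (ORIYA VOWEL SIGN E, COMBINING GRAVE ACCENT, ORIYA VOWEL SIGN AA) is an illustration chosen here. U+0B3E has combining class 0 and U+0300 has class 230, so the grave accent blocks U+0B3E under D2' but not under D2.

```python
import unicodedata


def is_blocked(seq: str, c_index: int, rule: str = "D2'") -> bool:
    """Is seq[c_index] blocked from the starter seq[0]?

    Hypothetical helper. Assumes seq is canonically decomposed and that
    seq[0] is a starter (canonical combining class 0).
      rule="D2":  B blocks C if B is a starter or ccc(B) == ccc(C)
      rule="D2'": B blocks C if B is a starter or ccc(B) >= ccc(C)
    """
    if rule not in ("D2", "D2'"):
        raise ValueError(f"unknown rule: {rule!r}")
    if not seq or unicodedata.combining(seq[0]) != 0:
        raise ValueError("sequence must begin with a starter")
    if not 0 < c_index < len(seq):
        raise IndexError("c_index must point past the starter")

    ccc_c = unicodedata.combining(seq[c_index])
    # Examine every character B strictly between S and C.
    for b in seq[1:c_index]:
        ccc_b = unicodedata.combining(b)
        if ccc_b == 0:
            return True  # B is a starter: it blocks under both rules
        if rule == "D2" and ccc_b == ccc_c:
            return True
        if rule == "D2'" and ccc_b >= ccc_c:
            return True
    return False


if __name__ == "__main__":
    oriya = "\u0b47\u0300\u0b3e"  # ORIYA E, COMBINING GRAVE, ORIYA AA
    print(is_blocked(oriya, 2, "D2"))   # False: 230 != 0
    print(is_blocked(oriya, 2, "D2'"))  # True: 230 >= 0
```

Under D2, U+0B3E would be free to combine with U+0B47 into U+0B4B (ORIYA VOWEL SIGN O), which would move it across the accent. We believe CPython's NFC follows D2' and leaves this sequence unchanged.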
>
> When B blocks C, changing the order of B and C would result in a
> character sequence that is not canonically equivalent to the
> original. See Section 3.11, Canonical Ordering Behavior, in The
> Unicode Standard, Version 4.0.
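Canonical ordering may only exchange adjacent non-starters whose combining classes differ. It never moves a mark past a starter or past a mark of the same class, and that is why reordering a blocked pair yields a different text. Below is a minimal Python sketch that assumes already-decomposed input. `canonical_order` and `canonically_equivalent` are hypothetical helpers, and the equivalence test simply compares the NFD forms produced by `unicodedata.normalize`.

```python
import unicodedata


def canonical_order(s: str) -> str:
    """Stably sort each run of non-starters by canonical combining class.

    Hypothetical helper; assumes s is already canonically decomposed.
    Starters (class 0) stay in place and delimit the runs.
    """
    chars = list(s)
    i = 0
    while i < len(chars):
        if unicodedata.combining(chars[i]) == 0:
            i += 1
            continue
        j = i
        while j < len(chars) and unicodedata.combining(chars[j]) != 0:
            j += 1
        # sorted() is stable, so marks of equal class keep their order.
        chars[i:j] = sorted(chars[i:j], key=unicodedata.combining)
        i = j
    return "".join(chars)


def canonically_equivalent(x: str, y: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", x) == unicodedata.normalize("NFD", y)


if __name__ == "__main__":
    # Ogonek (202) and acute (230) differ in class, so order is irrelevant.
    print(canonically_equivalent("a\u0301\u0328", "a\u0328\u0301"))
    # Acute and grave share class 230, so swapping them changes the text.
    print(canonically_equivalent("a\u0301\u0300", "a\u0300\u0301"))
```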
>
> The report says that this will not have an impact on real data found
> in practice (with the possible exception of test cases for the
> algorithm itself), because the affected sequences do not constitute
> well-formed text in any language.
>
> If you have any comments on this, please send them by 30 January
> via http://www.unicode.org/reporting.html.
>
> Regards, Felix

Received on Wednesday, 25 January 2006 17:15:09 UTC