- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 25 Jan 2006 14:36:24 +0900
- To: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
There is an issue with the Unicode normalization forms; see http://www.unicode.org/review/pr-29.html . The definitions of NFC and NFKC as they stand contain a contradiction: in some cases toNFC(toNFC(x)) yields a different result from toNFC(x).

To fix these cases, the proposal is to change

  D2. In any character sequence beginning with a starter S, a character C is blocked from S if and only if there is some character B between S and C, and either B is a starter or it has the same combining class as C.

to

  D2'. In any character sequence beginning with a starter S, a character C is blocked from S if and only if there is some character B between S and C, and either B is a starter or it has the same or higher combining class as C.

This definition is only to be applied to strings that are already canonically decomposed. When B blocks C, changing the order of B and C would result in a character sequence that is not canonically equivalent to the original. See Section 3.11, Canonical Ordering Behavior, in the Unicode Standard, 4.0.

The report says that this will not have an impact on real data found in practice (with the possible exception of test cases for the algorithm itself), because the affected sequences do not constitute well-formed text in any language.

If you have any comments on this, please send them by 30 January via http://www.unicode.org/reporting.html .

Regards, Felix
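
To make the difference between D2 and D2' concrete, here is a minimal Python sketch of the two blocking tests. It is only an illustration under stated assumptions: the function name is_blocked and its revised flag are hypothetical, the input is assumed to be already canonically decomposed, and the sample sequences were chosen for illustration rather than taken from the report. Combining classes come from Python's standard unicodedata module.

```python
import unicodedata


def is_blocked(seq: str, c_index: int, revised: bool = False) -> bool:
    """Return True if seq[c_index] (C) is blocked from the starter seq[0] (S).

    revised=False applies the wording of D2; revised=True applies D2'.
    Hypothetical helper: assumes seq is canonically decomposed and that
    seq[0] is a starter (combining class 0).
    """
    ccc_c = unicodedata.combining(seq[c_index])
    for b in seq[1:c_index]:  # every character B strictly between S and C
        ccc_b = unicodedata.combining(b)
        if ccc_b == 0:  # B is a starter
            return True
        if revised and ccc_b >= ccc_c:  # D2': same or higher class
            return True
        if not revised and ccc_b == ccc_c:  # D2: same class only
            return True
    return False


# S = U+1100 (Hangul choseong, class 0), B = U+0300 (class 230),
# C = U+1161 (Hangul jungseong, class 0).
hangul = "\u1100\u0300\u1161"
print(is_blocked(hangul, 2))                # False under D2
print(is_blocked(hangul, 2, revised=True))  # True under D2'
```

The two definitions disagree only when B has a strictly higher combining class than C. In a canonically ordered string that can happen only when C is itself a starter, as in the sample above; for ordinary runs of combining marks both definitions give the same answer.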
Received on Wednesday, 25 January 2006 05:36:31 UTC