- From: Mark Davis <mark.davis@icu-project.org>
- Date: Wed, 25 Jan 2006 09:08:19 -0800
- To: Felix Sasaki <fsasaki@w3.org>
- CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
There is a misunderstanding. The change to D2 was already made in Unicode 4.1, released about a year ago (see http://www.unicode.org/reports/tr15/#D2). Only the material marked in yellow is new for this version.

Mark

Felix Sasaki wrote:
>
> There is an issue with the Unicode normalization forms, see
> http://www.unicode.org/review/pr-29.html .
>
> The definitions of NFC and NFKC as they stand contain a
> contradiction. There are some cases where a transformation
> toNFC(toNFC(x)) has a different result than toNFC(x).
>
> To fix these cases, there is the proposal to change
>
> D2. In any character sequence beginning with a starter S, a character
> C is blocked from S if and only if there is some character B between
> S and C, and either B is a starter or it has the same combining class
> as C.
>
> to
>
> D2'. In any character sequence beginning with a starter S, a character
> C is blocked from S if and only if there is some character B between
> S and C, and either B is a starter or it has the same or higher
> combining class as C.
>
> This definition is only to be applied to strings that are already
> canonically decomposed.
>
> When B blocks C, changing the order of B and C would result in a
> character sequence that is not canonically equivalent to the
> original. See Section 3.11, Canonical Ordering Behavior in the
> Unicode Standard, 4.0.
>
> The report says that this will not have an impact on real data found
> in practice (with the possible exception of test cases for the
> algorithm itself), because the affected sequences do not constitute
> well-formed text in any language.
>
> If you have any comments on this, please send them in until 30 January
> at http://www.unicode.org/reporting.html .
>
> Regards, Felix
>
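
For readers following the thread, the difference between D2 and D2' as quoted above can be sketched in a few lines of Python. This is only an illustration of the two wordings, not the normalization algorithm itself: the function name `is_blocked`, its `revised` flag, and the `SAMPLE` constant are hypothetical names chosen here, and combining classes are read from Python's standard `unicodedata.combining`. The sample sequence U+0B47 U+0300 U+0B3E is assumed to be representative of the kind of input discussed in PR-29.

```python
import unicodedata


def is_blocked(seq: str, c_index: int, revised: bool = True) -> bool:
    """Return True if seq[c_index] is blocked from the starter seq[0].

    revised=False follows the older D2 wording (B is a starter, or B has
    the same combining class as C). revised=True follows D2' (B is a
    starter, or B has the same or higher combining class as C).
    The sequence is assumed to be canonically decomposed already.
    """
    if not seq or unicodedata.combining(seq[0]) != 0:
        raise ValueError("sequence must begin with a starter")
    if not 0 < c_index < len(seq):
        raise ValueError("c_index must point at a character after the starter")

    ccc_c = unicodedata.combining(seq[c_index])
    # Examine every character B strictly between S and C.
    for b in seq[1:c_index]:
        ccc_b = unicodedata.combining(b)
        if ccc_b == 0:
            return True  # B is a starter (both wordings)
        if ccc_b == ccc_c:
            return True  # same combining class (both wordings)
        if revised and ccc_b > ccc_c:
            return True  # higher combining class (D2' only)
    return False


# ORIYA VOWEL SIGN E, COMBINING GRAVE ACCENT (class 230),
# ORIYA VOWEL SIGN AA (class 0).
SAMPLE = "\u0B47\u0300\u0B3E"

print(is_blocked(SAMPLE, 2, revised=False))  # old D2 wording
print(is_blocked(SAMPLE, 2, revised=True))   # D2' wording
```

Under the old wording the trailing vowel sign is not treated as blocked, because the intervening accent is neither a starter nor of the same class; under D2' the accent's higher combining class blocks it.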
Received on Wednesday, 25 January 2006 17:15:09 UTC