
Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization

From: Mark Davis <mark.davis@icu-project.org>
Date: Mon, 2 Feb 2009 10:00:20 -0800
Message-ID: <30b660a20902021000h1c53869u88bcebd904dc3402@mail.gmail.com>
To: "Phillips, Addison" <addison@amazon.com>
Cc: Boris Zbarsky <bzbarsky@mit.edu>, Anne van Kesteren <annevk@opera.com>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "www-style@w3.org" <www-style@w3.org>

Adding a few facts about NFC, for those not fully acquainted with it. Out of
the 100K Unicode characters:

   - Other than CJK compatibility ideographs, there are (currently) 118
   characters that are always transformed by NFC into other characters.
      - The CJK COMPATIBILITY IDEOGRAPHs are a larger set, and will grow
      over time.
      - There are a further 102 characters that may or may not be
      transformed, depending on context:
      - http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:nfcqc=maybe
   - Such a transformation may combine the character with a previous
   character, or may involve reordering. That is, NFC puts non-spacing
   characters like *combining acute* and *combining ring below* into a
   canonical order.
   - While theoretically NFD and NFC are equally appropriate, in practice
   NFD is only used internally (the one significant exception I know of is the
   Apple file system) -- NFC is the form recommended for interchange.
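
The points above can be checked with a short sketch. It assumes Python's
standard unicodedata module; the helper name nfc and the sample strings are
illustrative choices, not anything taken from the CSS drafts under discussion.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return text in Normalization Form C (helper name is illustrative)."""
    return unicodedata.normalize("NFC", text)


# A character that NFC always transforms: ANGSTROM SIGN (U+212B)
# becomes LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5).
angstrom = nfc("\u212b")

# Combining with a previous character: "e" + COMBINING ACUTE ACCENT (U+0301)
# becomes the single character U+00E9.
composed = nfc("e\u0301")

# Reordering: combining acute (combining class 230) typed before
# combining ring below (U+0325, combining class 220). No precomposed
# form exists for "q", so NFC only sorts the marks by combining class.
reordered = nfc("q\u0301\u0325")

print([hex(ord(ch)) for ch in angstrom])
print([hex(ord(ch)) for ch in composed])
print([hex(ord(ch)) for ch in reordered])
```

In the last case the base letter keeps both marks; only their order changes,
with the ring below placed before the acute.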


On Mon, Feb 2, 2009 at 07:53, Phillips, Addison <addison@amazon.com> wrote:

> >   Would it be reasonable to also disallow insertion of combining
> > characters via such escapes?
> Absolutely not reasonable. Some scripts *require* the use of combining
> marks. NFC does not guarantee that no combining marks appear in the text.
> Applying NFC only means that any combining marks that can be combined with
> their base characters are, in fact, combined.
> Addison
> Addison Phillips
> Globalization Architect -- Lab126
> Internationalization is not a feature.
> It is an architecture.
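
Addison's point that NFC does not remove all combining marks can be
illustrated with a minimal sketch, again assuming Python's unicodedata
module. The function name marks_after_nfc and the sample strings are
hypothetical, and "combining mark" is taken here to mean any character whose
general category starts with "M".

```python
import unicodedata


def marks_after_nfc(text: str) -> list[str]:
    """List the combining marks still present after NFC.

    A mark is taken to be any character in general category
    Mn, Mc, or Me (an assumption made for this sketch).
    """
    normalized = unicodedata.normalize("NFC", text)
    return [ch for ch in normalized if unicodedata.category(ch).startswith("M")]


# "e" + combining acute has a precomposed form, so no mark survives.
latin_precomposed = marks_after_nfc("e\u0301")

# "q" + combining acute has no precomposed form, so the mark stays.
latin_no_precomposed = marks_after_nfc("q\u0301")

# Devanagari KA (U+0915) + VOWEL SIGN I (U+093F): the script requires
# the mark, and NFC leaves it in place.
devanagari = marks_after_nfc("\u0915\u093f")

print(len(latin_precomposed), len(latin_no_precomposed), len(devanagari))
```

A rule that banned combining characters in escapes would therefore reject
text that is already in NFC.
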
Received on Monday, 2 February 2009 18:01:01 UTC
