Re: [CSS21][css3-namespace][css3-page][css3-selectors][css3-content] Unicode Normalization

Yes.

Adding a few facts about NFC, for those not fully acquainted with it. Out of
the 100K Unicode characters:

   - Other than CJK compatibility ideographs, there are (currently) 118
   characters that are always transformed by NFC into other characters.
      - http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:nfcqc=no:]-[:name=/CJK%20COMPATIBILITY%20IDEOGRAPH/:]]
      - The CJK COMPATIBILITY IDEOGRAPHs are a larger set, and will grow
      over time.
   - There are a further 102 characters that may or may not be
   transformed:
      - http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:nfcqc=maybe:]
   - Such a transformation may combine the character with a preceding
   character, or may involve reordering. That is, NFC puts non-spacing
   characters like *combining acute* and *combining ring below* into a
   canonical order.
   - While theoretically NFD and NFC are equally appropriate, in practice
   NFD is only used internally (the one significant exception I know of is the
   Apple file system) -- NFC is the form recommended for interchange.
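
For anyone who wants to see these cases concretely, here is a minimal sketch
using Python's standard unicodedata module. The sample strings and the nfc()
helper are my own illustrative choices, not taken from the lists linked above,
and the exact results depend on the Unicode version bundled with your Python.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the NFC form of text (hypothetical helper for this sketch)."""
    return unicodedata.normalize("NFC", text)


# 1. A character NFC always replaces: ANGSTROM SIGN (U+212B) becomes
#    LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5).
angstrom = nfc("\u212b")

# 2. Combining with the preceding character: "e" + COMBINING ACUTE ACCENT
#    (U+0301) becomes the single character U+00E9.
composed = nfc("e\u0301")

# 3. Reordering only: "q" has no precomposed form with either mark, so NFC
#    just sorts the marks by canonical combining class. COMBINING RING BELOW
#    (U+0325, class 220) moves ahead of COMBINING ACUTE ACCENT (class 230).
reordered = nfc("q\u0301\u0325")

for label, value in [("angstrom", angstrom),
                     ("composed", composed),
                     ("reordered", reordered)]:
    print(label, " ".join(f"U+{ord(ch):04X}" for ch in value))
```

Note that the third result still contains two combining marks after NFC: the
form composes what it can and orders the rest, it does not remove them.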

Mark


On Mon, Feb 2, 2009 at 07:53, Phillips, Addison <addison@amazon.com> wrote:

> >   Would it be reasonable to also disallow insertion of combining
> > characters via such escapes?
>
> Absolutely not reasonable. Some scripts *require* the use of combining
> marks. NFC does not guarantee that no combining marks appear in the text.
> Applying NFC only means that any combining marks that can be combined with
> their base characters are, in fact, combined.
>
> Addison
>
> Addison Phillips
> Globalization Architect -- Lab126
>
> Internationalization is not a feature.
> It is an architecture.
>

Received on Monday, 2 February 2009 18:01:00 UTC