Yes.
Adding a few facts about NFC, for those not fully acquainted with it. Out of
the roughly 100K Unicode characters:
- Other than CJK compatibility ideographs, there are (currently) 118
characters that are always transformed by NFC into other characters.
   - http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:nfcqc=no:]-[:name=/CJK%20COMPATIBILITY%20IDEOGRAPH/:]]
   - The CJK COMPATIBILITY IDEOGRAPHs are a larger set, and will grow
   over time.
- There are a further 102 characters that may or may not be
transformed:
   - http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:nfcqc=maybe:]
- Such a transformation may combine the character with a preceding one, or
may involve reordering. That is, NFC puts non-spacing characters like
*combining acute* and *combining ring below* into a canonical order.
- While theoretically NFD and NFC are equally appropriate, in practice
NFD is only used internally (the one significant exception I know of is the
Apple file system) -- NFC is the form recommended for interchange.
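
For anyone who wants to try these cases, here is a small sketch using Python's
standard unicodedata module. The sample characters and the nfc helper are my
own picks for illustration, not taken from the lists linked above, and the
results depend on the Unicode data shipped with your Python version.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return text in Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# Always transformed: ANGSTROM SIGN (U+212B) becomes
# LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5).
angstrom = nfc("\u212b")

# A CJK compatibility ideograph (U+F900) becomes its unified ideograph (U+8C48).
cjk = nfc("\uf900")

# "Maybe" case: COMBINING ACUTE ACCENT (U+0301) combines with a preceding "e"
# because a precomposed character (U+00E9) exists...
composed = nfc("e\u0301")

# ...but it stays a separate combining mark after "q", which has no
# precomposed form with an acute accent.
uncomposed = nfc("q\u0301")

# Reordering: combining ring below (U+0325, combining class 220) is placed
# before combining acute (U+0301, combining class 230), whatever the input order.
reordered = nfc("q\u0301\u0325")

for label, value in [
    ("angstrom", angstrom),
    ("cjk", cjk),
    ("composed", composed),
    ("uncomposed", uncomposed),
    ("reordered", reordered),
]:
    # Print code points so the result is visible regardless of font support.
    print(label, " ".join(f"U+{ord(ch):04X}" for ch in value))
```

The "uncomposed" and "reordered" cases also show that text in NFC can still
contain combining marks.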
Mark
On Mon, Feb 2, 2009 at 07:53, Phillips, Addison <addison@amazon.com> wrote:
> > Would it be reasonable to also disallow insertion of combining
> > characters via such escapes?
>
> Absolutely not reasonable. Some scripts *require* the use of combining
> marks. NFC does not guarantee that no combining marks appear in the text.
> Applying NFC only means that any combining marks that can be combined with
> their base characters are, in fact, combined.
>
> Addison
>
> Addison Phillips
> Globalization Architect -- Lab126
>
> Internationalization is not a feature.
> It is an architecture.