Re: Unicode Normalization

From: Ambrose Li <ambrose.li@gmail.com>
Date: Fri, 6 Feb 2009 17:40:42 -0500
Message-ID: <af2cae770902061440s41d77a70l4766cb49c3cdabf5@mail.gmail.com>
To: Robert J Burns <rob@robburns.com>
Cc: public-i18n-core@w3.org, W3C Style List <www-style@w3.org>

2009/2/6 Robert J Burns <rob@robburns.com>:
> Another singleton example is:
> 1) 慈 (U+2F8A6) [non-normalized]
> 2) 慈  (U+6148) [NFC and NFD]
> I note the font HiraKakuProN-W3 on my system presents these with slightly
> different glyphs which as i said before should be considered a bug (but like

I disagree here. The whole point of the U+2Fxxx block of
"compatibility ideographs" is to allow one to specify a particular
form when the form actually matters (e.g., when dealing with ancient
texts). I ran into U+2F999 just a week ago. (I had to look through
the charts to pick out the correct character. It had to be
contrasted with U+831D, which is the normalized form, and the content
that I had to mark up actually says something to the effect of "U+831D
is probably an erroneous form of U+2F999…". This would make no sense
if the two glyphs showed up the same.) Therefore the fonts MUST display
the two differently; I would consider it a bug if U+2F999 looked the
same as U+831D.
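
To make the normalization issue concrete, here is a minimal sketch,
assuming Python's standard unicodedata module. The helper names
(canonical_form, has_singleton_decomposition) are mine and purely
illustrative. It shows how a compatibility ideograph is replaced by
its unified counterpart under NFC, which is exactly how the
distinction I needed would be lost.

```python
import unicodedata


def canonical_form(text: str) -> str:
    """Return the NFC form of text (illustrative helper, not a standard API)."""
    return unicodedata.normalize("NFC", text)


def has_singleton_decomposition(ch: str) -> bool:
    """True if ch canonically decomposes to exactly one other code point.

    unicodedata.decomposition() returns a string of hex code points,
    prefixed with a <tag> for compatibility mappings, or "" if the
    character has no decomposition at all.
    """
    mapping = unicodedata.decomposition(ch)
    return bool(mapping) and not mapping.startswith("<") and len(mapping.split()) == 1


compat = "\U0002F999"  # the compatibility ideograph discussed above
unified = "\u831d"     # its normalized form

# After normalization the two strings can no longer be told apart.
same_after_nfc = canonical_form(compat) == canonical_form(unified)
```

With the mappings mentioned in this thread, canonical_form(compat)
should come back as U+831D, so a sentence contrasting the two
characters would end up contrasting U+831D with itself.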

My personal opinion regarding CJK unification is that it's an
inconsistent mess. But that'd be off-topic here.

> input systems, font makers really have not gotten clear norms about this) At
> least in the case of the name of this character ("CJK COMPATIBILITY
> IDEOGRAPH-2F8A6"), the name provides some indication of discouraged use
> (which may be all an author encounters when using a character input system).
> My feeling is that singletons are an ill-conceived part of NFC and NFD
> normalization (closer to compatibility decompositions than canonical
> decompositions), but that the non-singleton parts of normalization are
> essential to proper text handling (and I don't see how Unicode could have
> avoided or could avoid in the future such non-singleton canonical
> normalization).
> Take care,
> Rob
> [1]:
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:NFC_Quick_Check=No:]>
> [2]:
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:NFC_Quick_Check=Maybe:]>


