
Re: Unicode Normalization

From: Ambrose Li <ambrose.li@gmail.com>
Date: Fri, 6 Feb 2009 17:40:42 -0500
Message-ID: <af2cae770902061440s41d77a70l4766cb49c3cdabf5@mail.gmail.com>
To: Robert J Burns <rob@robburns.com>
Cc: public-i18n-core@w3.org, W3C Style List <www-style@w3.org>

2009/2/6 Robert J Burns <rob@robburns.com>:
> Another singleton example is:
>
> 1) 慈 (U+2F8A6) [non-normalized]
> 2) 慈 (U+6148) [NFC and NFD]
>
> I note the font HiraKakuProN-W3 on my system presents these with slightly
> different glyphs, which as I said before should be considered a bug (but like

I disagree here. The whole point of the U+2Fxxx block of
"compatibility ideographs" is to allow one to specify a particular
form when the form actually matters (e.g., when dealing with ancient
texts). I ran into U+2F999 just a week ago. (I had to look through
the charts to pick out the correct character. It had to be
contrasted with U+831D, which is its normalized form, and the content
that I had to mark up says something to the effect of "U+831D
is probably an erroneous form of U+2F999…". This would make no sense
if the two glyphs showed up the same.) Therefore fonts MUST display
the two differently; I would consider it a bug if U+2F999 looked the
same as U+831D.
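
To make the consequence concrete, here is a minimal Python sketch using
the standard unicodedata module. It assumes the interpreter's Unicode
database carries the singleton mappings discussed in this thread
(U+2F999 to U+831D, and U+2F8A6 to U+6148); the helper name
normalizes_to is made up for illustration.

```python
import unicodedata

def normalizes_to(ch: str, form: str = "NFC") -> str:
    """Return what `ch` becomes under the given normalization form, as U+XXXX."""
    # Hypothetical helper: formats each resulting code point as U+XXXX.
    return " ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize(form, ch))

# The two compatibility ideographs mentioned in this thread.
for compat in ("\U0002F999", "\U0002F8A6"):
    nfc = normalizes_to(compat, "NFC")
    nfd = normalizes_to(compat, "NFD")
    print(f"U+{ord(compat):04X} -> NFC {nfc}, NFD {nfd}")
```

If the mappings are as described above, both forms report the unified
ideograph, so any pipeline that normalizes the text erases exactly the
distinction the compatibility ideograph was chosen to preserve.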

My personal opinion regarding CJK unification is that it's an
inconsistent mess. But that'd be off-topic here.

> input systems, font makers really have not gotten clear norms about this) At
> least in the case of the name of this character ("CJK COMPATIBILITY
> IDEOGRAPH-2F8A6"), the name provides some indication of discouraged use
> (which may be all an author encounters when using a character input system).
> My feeling is that singletons are an ill-conceived part of NFC and NFD
> normalization (closer to compatibility decompositions than canonical
> decompositions), but that the non-singleton parts of normalization are
> essential to proper text handling (and I don't see how Unicode could have
> avoided or could avoid in the future such non-singleton canonical
> normalization).
>
> Take care,
> Rob
>
> [1]:
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:NFC_Quick_Check=No:]>
> [2]:
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:NFC_Quick_Check=Maybe:]>
>
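
For reference, the singleton versus non-singleton distinction drawn in
the quoted text can be checked mechanically. The following is only a
sketch: canonical_kind is a hypothetical helper, and the sample
characters other than U+2F8A6 are my own picks, not from this thread.

```python
import unicodedata

def canonical_kind(ch: str) -> str:
    """Classify a character's decomposition mapping (hypothetical helper)."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"                 # no decomposition mapping listed
    if mapping.startswith("<"):
        return "compatibility"        # tagged mapping, e.g. "<compat> 0066 0069"
    # An untagged mapping is canonical; a single target makes it a singleton.
    return "singleton" if len(mapping.split()) == 1 else "non-singleton"

# U+2F8A6 is from the thread; the rest are illustrative assumptions.
samples = ["\U0002F8A6", "\u212B", "\u00C5", "\uFB01"]
for ch in samples:
    print(f"U+{ord(ch):04X}: {canonical_kind(ch)}")
```

Note that unicodedata.decomposition only reports the mapping listed in
the character database, so algorithmically decomposed characters such as
Hangul syllables come back as "none" here.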



-- 
cheers,
-ambrose

The 'net used to be run by smart people; now many sites are run by
idiots. So SAD... (Sites that do spam filtering on mails sent to the
abuse contact need to be cut off the net...)
Received on Friday, 6 February 2009 22:41:20 GMT
