- From: Ambrose Li <ambrose.li@gmail.com>
- Date: Fri, 6 Feb 2009 17:40:42 -0500
- To: Robert J Burns <rob@robburns.com>
- Cc: public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
2009/2/6 Robert J Burns <rob@robburns.com>:
> Another singleton example is:
>
> 1) 慈 (U+2F8A6) [non-normalized]
> 2) 慈 (U+6148) [NFC and NFD]
>
> I note the font HiraKakuProN-W3 on my system presents these with slightly
> different glyphs which as i said before should be considered a bug (but like

I disagree here. The whole point of the U+2Fxxx block of "compatibility
ideographs" is to allow one to specify a particular form when the form
actually matters (e.g., when dealing with ancient texts).

I ran into U+2F999 just a week ago. (I had to look through the charts to
pick out the correct character. It had to be contrasted with U+831D,
which is the normalized form, and the content that I had to mark up
actually says something to the effect of "U+831D is probably an
erroneous form of U+2F999…". This would make no sense if the two glyphs
showed up the same.)

Therefore the fonts MUST display the two differently; I would consider
it a bug if U+2F999 looked the same as U+831D.

My personal opinion regarding CJK unification is that it's an
inconsistent mess. But that'd be off-topic here.

> input systems, font makers really have not gotten clear norms about this) At
> least in the case of the name of this character ("CJK COMPATIBILITY
> IDEOGRAPH-2F8A6"), the name provides some indication of discouraged use
> (which may be all an author encounters when using a character input system).
> My feeling is that singletons are an ill-conceived part of NFC and NFD
> normalization (closer to compatibility decompositions than canonical
> decompositions), but that the non-singleton parts of normalization are
> essential to proper text handling (and I don't see how Unicode could have
> avoided or could avoid in the future such non-singleton canonical
> normalization).
>
> Take care,
> Rob
>
> [1]:
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:NFC_Quick_Check=No:]>
> [2]:
> <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:NFC_Quick_Check=Maybe:]>

-- 
cheers,
-ambrose

The 'net used to be run by smart people; now many sites are run by
idiots. So SAD... (Sites that do spam filtering on mails sent to the
abuse contact need to be cut off the net...)
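For anyone who wants to check the singleton behaviour discussed in this thread, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `is_singleton` and `normalized_forms` are made up for illustration; the results depend on the Unicode data shipped with the interpreter, so treat this as a quick check and not as a statement about any particular implementation.

```python
import unicodedata

# Compatibility ideographs mentioned in the thread, paired with the
# unified ideographs they are said to normalize to.
PAIRS = [
    (0x2F8A6, 0x6148),
    (0x2F999, 0x831D),
]


def is_singleton(cp):
    """Return True if cp canonically decomposes to exactly one code point.

    unicodedata.decomposition() returns a space-separated string of hex
    code points, prefixed with a <tag> for compatibility mappings and
    empty when the character has no decomposition.
    """
    parts = unicodedata.decomposition(chr(cp)).split()
    return len(parts) == 1 and not parts[0].startswith("<")


def normalized_forms(cp):
    """Return the NFC and NFD results for cp as lists of U+XXXX labels."""
    return {
        form: [f"U+{ord(c):04X}" for c in unicodedata.normalize(form, chr(cp))]
        for form in ("NFC", "NFD")
    }


if __name__ == "__main__":
    for compat, unified in PAIRS:
        print(f"U+{compat:04X}", is_singleton(compat), normalized_forms(compat))
```

Because both NFC and NFD replace the compatibility ideograph with the unified one, any text that passes through normalization loses the distinction between the two forms, which is the practical concern behind the disagreement above.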
Received on Friday, 6 February 2009 22:41:18 UTC