RE: HTML5 and Unicode Normalization Form C from Michel Suignard on 2011-05-30 (www-international@w3.org from April to June 2011)

From: Michel Suignard <michel@suignard.com>
Date: Mon, 30 May 2011 11:53:53 -0700
To: Koji Ishii <kojiishi@gluesoft.co.jp>
CC: "www-international@w3.org" <www-international@w3.org>, Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Message-ID: <B3A7D08510A8AF499FFBF528EA193AD702AF89CF49@PE2800.suignard.local>

> One I know is CJK Compatibility Block (U+F900-FAFF) I wrote before. The other I found on the web is in the picture of this page[1] (text is in Japanese, sorry.) NFC transforms "U+1E0A U+0323" to "U+1E0A U+0307", and you see the upper dot is painted at different position. It must be a bug in Word, and I don't know how bad it is though.

Please be careful: NFC transforms "U+1E0A U+0323" to "U+1E0C U+0307", not to "U+1E0A U+0307". That is a very different transform.
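
For anyone who wants to check this, here is a minimal sketch using Python's standard unicodedata module (my choice of tool for illustration, nothing from the page cited above). The helper name code_points is made up for this example. It prints the decomposed and composed forms of the sequence:

```python
import unicodedata

def code_points(s):
    """Format a string as space-separated U+XXXX code points (hypothetical helper)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

# LATIN CAPITAL LETTER D WITH DOT ABOVE followed by COMBINING DOT BELOW
source = "\u1E0A\u0323"

nfd = unicodedata.normalize("NFD", source)
nfc = unicodedata.normalize("NFC", source)

print(code_points(nfd))  # U+0044 U+0323 U+0307
print(code_points(nfc))  # U+1E0C U+0307
```

Canonical ordering puts the dot below (combining class 220) before the dot above (class 230), so composition pairs D with the dot below and leaves the dot above as a separate combining mark.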

The rendering issue has nothing to do with Word. It depends only on how the font renders each sequence, which may differ slightly. In a good Latin font they should be rendered the same. On my machine (Win7 with Office 10), the rendering looks identical in Arial and Times New Roman, which are designed to work well with Latin combining marks, but not as good in Calibri, which is not designed that way.

The CJK Compatibility Block is a different issue altogether. It results from an earlier design decision to make those characters canonically equivalent to their unified counterparts, which created problems later when normalization was introduced.
This calls for another way to encode them, which is probably what Ken is alluding to.
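
To illustrate the canonical equivalence, here is a small sketch assuming Python's standard unicodedata module. U+F900 is only one example picked from the block. Because its decomposition is a singleton, both NFC and NFD replace it with the unified ideograph, and the distinction is lost:

```python
import unicodedata

compat = "\uF900"  # CJK COMPATIBILITY IDEOGRAPH-F900

# The canonical decomposition mapping is a single unified ideograph.
mapping = unicodedata.decomposition(compat)

nfc = unicodedata.normalize("NFC", compat)
nfd = unicodedata.normalize("NFD", compat)

print(mapping)               # 8C48
print(f"U+{ord(nfc):04X}")   # U+8C48
print(nfc == nfd)            # True
```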

Michel

Received on Monday, 30 May 2011 19:07:58 UTC