RE: HTML5 and Unicode Normalization Form C

Koji Ishii, Mon, 30 May 2011 13:04:34 -0400:

>> Which scripts could such a thing harm?
> 
> One I know of is the CJK Compatibility Ideographs block 
> (U+F900-U+FAFF), which I wrote about before. The other I found on the 
> web is in the picture on this page[1] (the text is in Japanese, 
> sorry). NFC transforms "U+1E0A U+0323" into "U+1E0C U+0307", and you 
> can see that the upper dot is painted at a different position. It 
> must be a bug in Word, though I don't know how bad it is.
> 
> I discussed the problem with Ken Lunde before. He is aware of the 
> problem and was thinking about how to solve it. So the hope is that 
> we might have a better solution in the future, but right now we 
> unfortunately don't have a good tool that solves the linking problems 
> without changing glyphs.
> 
> [1] http://blog.antenna.co.jp/PDFTool/archives/2006/02/pdf_41.html
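
(For what it's worth, the transform described above can be reproduced 
with Python's standard unicodedata module. This is my own sketch, not 
something from the page:

    import unicodedata

    # U+1E0A (D WITH DOT ABOVE) followed by U+0323 (COMBINING DOT BELOW)
    src = "\u1E0A\u0323"
    nfc = unicodedata.normalize("NFC", src)

    print([hex(ord(c)) for c in src])  # ['0x1e0a', '0x323']
    print([hex(ord(c)) for c in nfc])  # ['0x1e0c', '0x307']

NFC composes the dot below into the base letter, giving U+1E0C plus a 
combining dot above, which is why the upper dot can end up drawn at a 
different position.)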


That article is about PDF, no? Normalization problems related to PDFs 
are something I often see: when I copy the letter "å" from some PDF 
document, it often turns out that the PDF has stored it in decomposed 
form. When I paste it into an editor, this can lead to odd problems. 
Now and then I have had to use a tool to convert the text to NFC.
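
(Such a tool can be just a few lines. A minimal sketch in Python, 
assuming the pasted text is available as a string; this is my own 
example, not any particular existing tool:

    import unicodedata

    # "å" in decomposed form: 'a' followed by U+030A COMBINING RING ABOVE
    pasted = "a\u030A"
    fixed = unicodedata.normalize("NFC", pasted)

    print(len(pasted), len(fixed))   # 2 1
    print(fixed == "\u00E5")         # True: the single precomposed "å"

The two-codepoint sequence becomes one precomposed letter, which avoids 
the pasting problems I mention above.)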

I don't know whether this is because PDF prefers decomposed letters, or 
what the cause is.

Unfortunately, I don't fully understand the issues that you take up on 
your web page. But it seems from Michel's comment that it is also a 
font issue. It is a very real problem that many fonts do not handle 
combining diacritics very well.
-- 
Leif H Silli

Received on Monday, 30 May 2011 22:38:12 UTC