RE: HTML5 and Unicode Normalization Form C

Thank you, Michel and Leif. Yes, I have confirmed that it was a font issue too. I'm sorry for posting information without verifying it well enough. The author of that article saw the issue in his own PDF tool and found that it also reproduces in Word 2003 (which is very old).

So, spec-wise, the only real issue I know of is in the CJK Compatibility Ideographs block. The other issues are implementation bugs in fonts. My apologies to Microsoft.
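To make the CJK Compatibility Ideographs issue concrete, here is a minimal sketch using Python's standard `unicodedata` module (Python is only an illustration choice here). U+F900 has a singleton canonical decomposition to the unified ideograph U+8C48, so NFC replaces it, and whatever glyph distinction a font drew for the compatibility code point is lost. The loop simply counts how many code points in U+F900-FAFF NFC changes; the exact number depends on the Unicode version of your Python build, so no count is shown.

```python
import unicodedata

# U+F900 is CJK COMPATIBILITY IDEOGRAPH-F900. Its canonical decomposition is
# a singleton (U+8C48), so NFC replaces it with the unified ideograph.
compat = "\uF900"
nfc = unicodedata.normalize("NFC", compat)
print(f"U+{ord(compat):04X} -> U+{ord(nfc):04X}")  # U+F900 -> U+8C48
print(unicodedata.decomposition(compat))          # 8C48

# Count the code points in U+F900-FAFF that NFC changes. A few characters in
# this block (e.g. U+FA0E) are unified ideographs with no decomposition, and
# unassigned code points are left alone, so they are not counted.
changed = [cp for cp in range(0xF900, 0xFB00)
           if unicodedata.normalize("NFC", chr(cp)) != chr(cp)]
print(len(changed), "of", 0xFB00 - 0xF900, "code points are changed by NFC")
```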


-----Original Message-----
From: Leif Halvard Silli [mailto:xn--mlform-iua@må] 
Sent: Tuesday, May 31, 2011 7:38 AM
To: Koji Ishii
Subject: RE: HTML5 and Unicode Normalization Form C

Koji Ishii, Mon, 30 May 2011 13:04:34 -0400:

>> Which scripts could such a thing harm?
> One I know of is the CJK Compatibility Ideographs block (U+F900-FAFF),
> which I wrote about before. The other one I found on the web is shown
> in the picture on this page[1] (the text is in Japanese, sorry). NFC
> transforms "U+1E0A U+0323" to "U+1E0C U+0307", and you can see that the
> upper dot is painted at a different position. It must be a bug in Word,
> though I don't know how bad it is.
> I discussed the problem with Ken Lunde before. He's aware of it and
> has been thinking about how to solve it, so we may have a better
> solution in the future. Unfortunately, right now we don't have a good
> tool that solves linking problems without changing glyphs.
> [1]
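What NFC does to that sequence can be checked with Python's standard `unicodedata` module; this is a minimal sketch, and the `codepoints` helper is just a hypothetical formatting function. NFD first splits U+1E0A into D + U+0307, canonical ordering then moves U+0323 (combining class 220) ahead of U+0307 (class 230), and recomposition turns D + U+0323 into U+1E0C. Because there is no precomposed D with both dots, the dot above ends up as a separate combining mark that the font has to position itself.

```python
import unicodedata

# LATIN CAPITAL LETTER D WITH DOT ABOVE + COMBINING DOT BELOW.
original = "\u1E0A\u0323"

# NFD: U+1E0A -> D + U+0307, then canonical ordering sorts the marks by
# combining class: U+0323 (220) before U+0307 (230).
nfd = unicodedata.normalize("NFD", original)

# NFC: D + U+0323 recomposes to U+1E0C (D WITH DOT BELOW); no precomposed
# character has both dots, so U+0307 stays as a combining mark.
nfc = unicodedata.normalize("NFC", original)

def codepoints(s: str) -> str:
    """Hypothetical helper: format a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

print("original:", codepoints(original))  # U+1E0A U+0323
print("NFD:     ", codepoints(nfd))       # U+0044 U+0323 U+0307
print("NFC:     ", codepoints(nfc))       # U+1E0C U+0307
```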

That article is about PDF, no? Normalization problems related to PDFs
are something I often see: when I copy the letter "å" from a PDF
document, it often turns out that the PDF stored it decomposed. When I
paste it into an editor, this can lead to odd problems. Now and then I
have had to use a tool to convert it to NFC.

I don't know whether this is because PDF prefers decomposed letters, or
something else.
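Converting decomposed text like that to NFC does not need a separate tool if a script is handy. A minimal sketch with Python's standard `unicodedata` module, assuming Python 3.8+ for `unicodedata.is_normalized`; the helper name `to_nfc` is hypothetical:

```python
import unicodedata

# "å" stored decomposed: LATIN SMALL LETTER A + COMBINING RING ABOVE (U+030A).
decomposed = "a\u030A"

def to_nfc(text: str) -> str:
    """Hypothetical helper: return text in Unicode Normalization Form C."""
    # Skip the work when the text is already NFC (is_normalized needs Python 3.8+).
    if unicodedata.is_normalized("NFC", text):
        return text
    return unicodedata.normalize("NFC", text)

composed = to_nfc(decomposed)
print(len(decomposed), len(composed))  # 2 1
print(composed == "\u00E5")            # True
print(decomposed == "\u00E5")          # False: same appearance, different code points
```

The comparison at the end is the practical trap: both strings render as "å", but a plain string comparison or search treats them as different until one side is normalized.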

Unfortunately, I don't 100% understand the issues that you raise on
your web page. But it seems from Michel's comment that it is also a
font issue. It is a very real problem that many fonts do not handle
combining diacritics very well.
Leif H Silli

Received on Tuesday, 31 May 2011 12:42:23 UTC