
RE: HTML5 and Unicode Normalization Form C

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Tue, 31 May 2011 00:37:44 +0200
To: Koji Ishii <kojiishi@gluesoft.co.jp>
Cc: www-international@w3.org
Message-ID: <20110531003744066599.0c49efd4@xn--mlform-iua.no>
Koji Ishii, Mon, 30 May 2011 13:04:34 -0400:

>> Which scripts could such a thing harm?
> 
> One I know of is the CJK Compatibility Ideographs block (U+F900–U+FAFF), 
> which I wrote about before. The other I found on the web is in the 
> picture on this page[1] (the text is in Japanese, sorry). NFC transforms 
> "U+1E0A U+0323" into "U+1E0C U+0307", and you can see that the upper dot 
> is painted at a different position. It must be a bug in Word, though I 
> don't know how bad it is.
> 
> I discussed the problem with Ken Lunde before. He is aware of the 
> problem and was thinking about how to solve it. So the hope is that we 
> might have a better solution in the future, but right now we 
> unfortunately don't have a good tool that solves the linking problems 
> without changing glyphs.
> 
> [1] http://blog.antenna.co.jp/PDFTool/archives/2006/02/pdf_41.html
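
For what it's worth, the transformation you describe can be reproduced 
with Python's unicodedata module (a minimal sketch, assuming Python 3):

    import unicodedata

    s = "\u1E0A\u0323"   # D WITH DOT ABOVE + COMBINING DOT BELOW
    nfc = unicodedata.normalize("NFC", s)
    print([hex(ord(c)) for c in nfc])   # ['0x1e0c', '0x307']

So the dot above, which was part of the precomposed glyph, becomes a 
combining mark after normalization, and the font may well position it 
differently.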


That article is about PDF, no? Normalization problems related to PDFs 
are something I often see: when I copy the letter "å" from a PDF 
document, it often turns out that the PDF stored it in decomposed form. 
When I paste it into an editor, this can lead to odd problems. Now and 
then I have had to use a tool to convert it to NFC.
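
These days a few lines of Python are enough for that kind of 
conversion. Just a sketch of the sort of tool I mean, assuming Python 3:

    import unicodedata

    decomposed = "a\u030A"   # "å" as some PDFs store it: "a" + COMBINING RING ABOVE
    composed = unicodedata.normalize("NFC", decomposed)
    print(len(decomposed), len(composed))   # 2 1
    print(composed == "\u00E5")             # True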

I don't know whether this is because PDF prefers decomposed letters, or 
something else.

Unfortunately, I don't fully understand the issues you raise on your 
web page. But it seems from Michel's comment that it is also a font 
issue. It is a very real problem that many fonts do not handle 
combining diacritics very well.
-- 
Leif H Silli
Received on Monday, 30 May 2011 22:38:12 GMT
