RE: HTML5 and Unicode Normalization Form C from Koji Ishii on 2011-05-29 (www-validator@w3.org from May 2011)

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Sun, 29 May 2011 15:15:24 -0400
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>, "www-validator@w3.org" <www-validator@w3.org>
CC: "www-international@w3.org" <www-international@w3.org>
Message-ID: <A592E245B36A8949BDB0A302B375FB4E0AC29AEA16@MAILR001.mail.lan>

I agree that applying NFC/NFD to strings that are going to be compared helps a lot. URIs and IDREFs are good examples of such strings.
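To make the comparison case concrete, here is a minimal sketch using Python's standard unicodedata module. The helper name nfc_equal and the sample identifiers are only illustrative assumptions, not taken from any spec.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same visible identifier, encoded in two different ways.
precomposed = "caf\u00e9"   # ends with U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # ends with "e" followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)            # code point comparison
print(nfc_equal(precomposed, decomposed))   # comparison under NFC
```

A plain code point comparison treats the two as different, while the normalized comparison treats them as the same identifier, which is what you want for an idref pointing at an id.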

However, I'm against applying NFC to displayable content. If you read XML 1.0 (Fifth Edition) carefully, it suggests using NFC only for XML Names [1].

Unless Unicode resolves the issues where NFC/NFD changes some glyphs, I believe NFC/NFD should be treated like case-insensitive matching: it is good for comparing strings, but you don't want to lowercase the whole content.
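A rough illustration of the difference between normalizing for comparison and normalizing the content itself, again with Python's unicodedata. The choice of U+FA10 is my own example of a character that NFC replaces with a different code point; whether the two actually render differently depends on the font, and the function name comparison_key is hypothetical.

```python
import unicodedata

# U+FA10 is a CJK compatibility ideograph with a canonical (singleton)
# mapping, so both NFC and NFD replace it with another code point.
stored = "\ufa10"
as_nfc = unicodedata.normalize("NFC", stored)
print(f"U+{ord(stored):04X} -> U+{ord(as_nfc):04X}")


def comparison_key(text: str) -> str:
    """Normalized form used only for matching, like a lowercased key."""
    return unicodedata.normalize("NFC", text)


# Matching goes through the key; the stored text is never rewritten.
query = "\u585a"
matches = comparison_key(stored) == comparison_key(query)
print(matches, stored == "\ufa10")
```

The stored string keeps its original code point, so whatever the author typed is still what gets displayed, yet a lookup with the unified ideograph still finds it.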

My first preference is for web servers to apply NFC/NFD when they receive URLs from browsers, just as they do case-insensitive matching. If that is too difficult for some reason, I can live with applying it to attributes of specific data types. I don't think applying NFC/NFD to whole content is the right way to go.
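What I mean by the server-side option could look roughly like the following. This is only a sketch under assumptions: the function name normalize_request_path is hypothetical, paths are assumed to be percent-encoded UTF-8, and a real server would need more care (for example, this also decodes an encoded slash).

```python
import unicodedata
from urllib.parse import quote, unquote


def normalize_request_path(raw_path: str) -> str:
    """Return the request path with its characters normalized to NFC.

    Sketch only: unquote() also decodes %2F into "/", which a real
    server may need to handle separately.
    """
    decoded = unquote(raw_path)                       # percent-decode as UTF-8
    normalized = unicodedata.normalize("NFC", decoded)
    return quote(normalized, safe="/")                # re-encode, keep slashes


# Two requests for the same resource, sent in different normalization forms.
print(normalize_request_path("/caf%C3%A9"))    # precomposed form from the browser
print(normalize_request_path("/cafe%CC%81"))   # decomposed form from the browser
```

Both requests end up with the same path before the server looks up the resource, and nothing in the document content is touched.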

[1] http://www.w3.org/TR/xml/#sec-suggested-names

Regards,
Koji

Received on Sunday, 29 May 2011 19:15:22 UTC