
RE: HTML5 and Unicode Normalization Form C

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Mon, 30 May 2011 13:04:34 -0400
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
CC: "www-international@w3.org" <www-international@w3.org>
Message-ID: <A592E245B36A8949BDB0A302B375FB4E0AC29AEA24@MAILR001.mail.lan>
Thank you for your understanding. I am still sorry about my English skills; I have long wished to improve them, but it never happened. Sigh.

> Which scripts could such a thing harm?

One I know of is the CJK Compatibility Ideographs block (U+F900-FAFF), which I wrote about before. The other, which I found on the web, is shown in the picture on this page[1] (the text is in Japanese, sorry). NFC transforms "U+1E0A U+0323" to "U+1E0C U+0307", and you can see that the upper dot is painted at a different position. It must be a bug in Word, though I don't know how bad it is.
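
The two cases above can be reproduced with Python's standard `unicodedata` module. This is only a minimal sketch: U+F900 is picked here as one representative of the compatibility block, the helper name `code_points` is made up for the illustration, and the exact results depend on the Unicode data shipped with the Python version in use.

```python
import unicodedata


def code_points(text: str) -> str:
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Case 1: a CJK compatibility ideograph. NFC replaces it with the unified
# ideograph it canonically decomposes to, so the distinct glyph is lost.
compat = "\uF900"
print(code_points(compat), "->", code_points(unicodedata.normalize("NFC", compat)))

# Case 2: LATIN CAPITAL LETTER D WITH DOT ABOVE followed by COMBINING DOT BELOW.
# NFC reorders the marks and recomposes, so the precomposed base letter changes.
dotted = "\u1E0A\u0323"
print(code_points(dotted), "->", code_points(unicodedata.normalize("NFC", dotted)))
```

In both cases the input and output are canonically equivalent, so any difference in rendering comes from the font or the layout engine, not from the text itself.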

I discussed the problem with Ken Lunde before. He is aware of it and was thinking about how to solve it. So the hope is that we might have a better solution in the future, but right now we unfortunately don't have a good tool that solves the linking problems without changing glyphs.

[1] http://blog.antenna.co.jp/PDFTool/archives/2006/02/pdf_41.html

Regards,
Koji

-----Original Message-----
From: Leif Halvard Silli [mailto:xn--mlform-iua@målform.no] 
Sent: Monday, May 30, 2011 11:16 PM
To: Koji Ishii
Cc: www-international@w3.org
Subject: RE: HTML5 and Unicode Normalization Form C

Koji Ishii, Mon, 30 May 2011 04:21:45 -0400:
>> Koji Ishii, Sun, 29 May 2011 22:10:29 -0400:
>>> It looks like all Leif cares about is URLs.
>> 
>> All? As in "nothing more than"?
> 
> Ah...I apologize  [ snip ]

No problem. And it is true that my main focus is on linking.

>>> I think it'd make sense for HTML5 spec and validator to follow
>>> URL/IRI spec for attributes that contain URL/IRI.
>> 
>> Do you expect text editors to encode content of attributes differently
>> from content of other parts of the text file?
> 
> Yes for validators. URL/IRI has syntax like encoding using "%", so 
> validation of attribute values using its data type makes sense to me. 
> If it wasn't the goal of the HTML5 validator, or if I'm asking too 
> much, I'm sorry for that.
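
A check limited to URL-valued attributes, as suggested above, might look roughly like the following sketch. It uses Python's standard `html.parser`; the attribute list `URL_ATTRIBUTES`, the class `LinkNormalizationChecker`, and the function `non_nfc_links` are all invented for this illustration and do not describe how the HTML5 validator works.

```python
import unicodedata
from html.parser import HTMLParser

# Assumption for this sketch: which attributes are treated as URL/IRI-valued.
URL_ATTRIBUTES = {"href", "src", "action", "cite"}


class LinkNormalizationChecker(HTMLParser):
    """Collect URL-valued attributes whose value is not in NFC.

    Text content is deliberately ignored, so displayed text is never flagged.
    """

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRIBUTES and value is not None:
                if unicodedata.normalize("NFC", value) != value:
                    self.problems.append((tag, name, value))


def non_nfc_links(markup):
    """Return (tag, attribute, value) triples for non-NFC URL attributes."""
    checker = LinkNormalizationChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


# The first link target uses "a" + COMBINING RING ABOVE; the second uses the
# precomposed letter. Only the first is reported, and the link text is skipped.
sample = '<a href="ma\u030Alform.html">ma\u030Alform</a> <a href="m\u00E5lform.html">ok</a>'
print(non_nfc_links(sample))
```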

HTML5 supports IRIs, which: [1] "Allows native representation of 
Unicode in resources without % escaping". Or put differently: [2] "the 
desired Web address is stored in a document link or typed into the 
client's address bar using the relevant native characters".
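
To make the linking concern concrete, here is a small sketch of what happens when such a native-character address is eventually percent-encoded. It assumes UTF-8 percent-encoding via Python's `urllib.parse.quote`; the path `/målform` and the function name `percent_encoded_forms` are illustrative only.

```python
import unicodedata
from urllib.parse import quote


def percent_encoded_forms(path):
    """Percent-encode the NFC and NFD forms of the same path (UTF-8 bytes)."""
    return {
        form: quote(unicodedata.normalize(form, path))
        for form in ("NFC", "NFD")
    }


# The same visible path, written once with the precomposed letter U+00E5.
forms = percent_encoded_forms("/m\u00E5lform")
for form, encoded in forms.items():
    print(form, encoded)
```

The two strings look identical to a reader but yield different byte sequences, so a server that compares paths byte by byte would treat them as two different resources.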

> But you're right that it could be a hard requirement for editors. If 
> we take it seriously, I guess we have to wait for Unicode to fix the NFC 
> problems (I heard the effort is going on) or ask web 
> browsers/servers to normalize on the fly. All the options we have today 
> have trade-offs, and I just wanted you to be aware that 
> normalizing whole contents today can harm some scripts.

Which scripts could such a thing harm?

>>> Whether to apply NFC/NFD to whole contents or not seems to be a
>>> little separate issue to me.
>> 
>> This thread started on www-validator@ and did not speak about "whole
>> contents" or not - it only dealt with the fact that the HTML5 validator
>> issued an error for non-NFC content. I have also seen that same error,
>> and I thought - then - that it was based on HTML5.
>> 
>> However, it has to be said that it was only after Andreas Prilop
>> pointed out that the HTML5 validator issues the same error message
>> inside as well as outside attributes, that I understood that it - in
>> contrast to what I thought - was not a restriction that was
>> particularly related to links.
>> 
>> As it has turned out, however, it was an error of the HTML5 validator
>> to show an error for content that is not in NFC. But *that* only
>> increases the importance of offering helpful recommendations w.r.t. links.
> 
> Thank you for the explanation of the background I wasn't aware of.

I should have pointed it out when I CC-ed this list. Sorry.

> I 
> agree that links have the problems you raised, and NFC can solve them. All 
> I want you to understand is that applying NFC to displayable contents 
> has some different problems, so I think what we said does not 
> contradict each other, and I wanted to find a solution that can make 
> both of us happy.

Agree!

[1] 
http://download.microsoft.com/download/a/6/0/a60decbd-9044-42f1-b9c5-1c90c7a5a8ce/a6.pdf
[2] http://www.w3.org/International/articles/idn-and-iri/#idnoverview
-- 
Leif H Silli
Received on Monday, 30 May 2011 17:04:33 GMT
