RE: HTML5 and Unicode Normalization Form C

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 30 May 2011 02:06:19 +0200
To: www-validator@w3.org
Cc: www-international@w3.org
Message-ID: <20110530020619137953.690c90be@xn--mlform-iua.no>
Koji Ishii, Sun, 29 May 2011 15:15:24 -0400:
> I agree that NFC/NFD against strings to be compared helps a lot. URI 
> and idref are good examples of such strings.
  [ snip ]
> Unless Unicode resolves issues where NFC/NFD changes some glyphs, I 
> believe that NFC/NFD are like ignore-case; they're good to compare 
> strings, but you don't want to lowercase whole contents.
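A minimal Python sketch of the "normalize only when comparing" approach Koji describes, analogous to a case-insensitive compare. The helper name `same_identifier` is hypothetical. The last lines illustrate the glyph concern: NFC replaces CJK compatibility ideographs such as U+F900 with their unified counterparts, which a font may draw differently.

```python
import unicodedata

def same_identifier(a: str, b: str) -> bool:
    """Compare two identifiers (e.g. an idref and an id) after NFC
    normalization, like a case-insensitive compare: the stored strings
    themselves are left untouched."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "\u00e5"       # å as a single code point (NFC)
decomposed = "a\u030a"    # a + COMBINING RING ABOVE (NFD)

print(composed == decomposed)                 # False: different code points
print(same_identifier(composed, decomposed))  # True: canonically equivalent

# Normalizing whole contents is riskier: NFC maps the CJK compatibility
# ideograph U+F900 to a unified ideograph, so the text actually changes.
compat = "\uf900"
print(unicodedata.normalize("NFC", compat) == compat)  # False
```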

So, is your proposal that validators should warn against non-NFC in 
links and identifiers, but not elsewhere?
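A rough sketch of what such a validator check could look like. It assumes the warning is limited to the `href`, `src`, `id` and `name` attributes (an illustrative choice, not something the thread settles), uses Python's standard `html.parser`, and percent-decodes URL values before testing them with `unicodedata.is_normalized` (Python 3.8+). The names `NfcChecker` and `check_nfc` are hypothetical.

```python
import unicodedata
from html.parser import HTMLParser
from urllib.parse import unquote

# Attributes treated as "links and identifiers" (an assumption).
URL_ATTRS = {"href", "src"}
ID_ATTRS = {"id", "name"}

class NfcChecker(HTMLParser):
    """Collect (tag, attribute, value) for link/identifier values not in NFC."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name in URL_ATTRS:
                # Decode %XX escapes so that a%CC%8A is seen as a + U+030A.
                text = unquote(value)
            elif name in ID_ATTRS:
                text = value
            else:
                continue
            if not unicodedata.is_normalized("NFC", text):
                self.warnings.append((tag, name, value))

def check_nfc(html: str):
    checker = NfcChecker()
    checker.feed(html)
    checker.close()
    return checker.warnings

sample = ('<a href="/a%CC%8A">x</a> <a href="/%C3%A5.html">y</a> '
          '<p id="a\u030a">z</p>')
for tag, attr, value in check_nfc(sample):
    print(f"warning: <{tag} {attr}> value {value!r} is not NFC")
```

As the example below shows, a warning like this says nothing about *why* an author chose decomposed links, which is exactly the difficulty with phrasing the advice.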

Clearly, HTML5 and the HTML5 validator should help authors avoid 
gotchas. But when thinking through some scenarios, it seems 
difficult to give the right kind of warning/advice in a validator.

Example: 

* With the Apache2 version that comes with Mac OS X, one can in 
principle use composed as well as decomposed links even if the file 
names are decomposed. In Apache on Mac OS X there is, however, a 
single problem: "cool" (extensionless) composed IRIs. E.g. 
	<http://example.com/%C3%A5.html> works, while 
	<http://example.com/%C3%A5> does not. Maybe this is an Apache 
bug.
* In order to fix the above problem, which also led customers to react 
when files were placed online, I started to use decomposed links:
    <http://example.com/a%CC%8A>
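The two link forms above differ only in normalization. A short Python check shows that `%C3%A5` is the UTF-8 encoding of U+00E5 (the NFC form of "å"), while `a%CC%8A` is `a` followed by U+030A COMBINING RING ABOVE (the NFD form), so a server comparing raw bytes sees two different paths.

```python
import unicodedata
from urllib.parse import quote, unquote

nfc_link = "%C3%A5"     # composed form, as in /%C3%A5.html
nfd_link = "a%CC%8A"    # decomposed form, as in /a%CC%8A

nfc_text = unquote(nfc_link)
nfd_text = unquote(nfd_link)
print([hex(ord(c)) for c in nfc_text])  # ['0xe5']
print([hex(ord(c)) for c in nfd_text])  # ['0x61', '0x30a']

# Byte for byte the paths differ, but they are canonically equivalent.
print(nfc_text == nfd_text)                                # False
print(unicodedata.normalize("NFD", nfc_text) == nfd_text)  # True

# Going the other way: percent-encode each normalization form of "å".
print(quote(unicodedata.normalize("NFC", "\u00e5")))  # %C3%A5
print(quote(unicodedata.normalize("NFD", "\u00e5")))  # a%CC%8A
```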

Telling me that I SHOULD use a composed link rather than a decomposed 
link in that situation would perhaps not be wise. OTOH, if the 
warning/advice were phrased as advice to configure my set-up so that I 
could use NFC in the links, it could have been productive, not least 
with regard to tool vendors.

> My best preference is web servers to apply NFC/NFD as it receives URL 
> from browsers just like they do ignore-case, but if it's too 
> difficult for some reasons, I can live with applying to attributes of 
> specific data types. I don't think applying NFC/NFD to whole contents 
> is the right way to go.

The version of Apache2 that is installed on Mac OS X seems better fit 
to handle "these issues" than, for instance, my Linux-based ISP's 
Apache2 installation. [Though I should recheck the Linux install.] 
So it seems that Web servers are a very relevant thing to improve 
when it comes to the Mac OS X issues.
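A minimal sketch of the server-side approach quoted above: decode the request path, then look the file up under both its NFC and NFD spellings, with an optional extension fallback for "cool" links. This is a hypothetical helper (`resolve`) built on Python's standard library, not Apache's actual behaviour or configuration; the extension list and the simple `..` rejection are assumptions.

```python
import tempfile
import unicodedata
from pathlib import Path
from urllib.parse import unquote

def resolve(docroot: Path, raw_path: str, extensions=(".html",)):
    """Map a percent-encoded request path to an existing file, trying the
    NFC and NFD spellings of the decoded path (hypothetical helper)."""
    decoded = unquote(raw_path).lstrip("/")
    if ".." in Path(decoded).parts:
        return None  # crude path-traversal guard for this sketch
    for form in ("NFC", "NFD"):
        name = unicodedata.normalize(form, decoded)
        # Try the name as given, then with each fallback extension.
        for candidate in [name] + [name + ext for ext in extensions]:
            path = docroot / candidate
            if path.is_file():
                return path
    return None

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Store the file under a decomposed (NFD) name, as HFS+ on Mac OS X does.
    (root / "a\u030a.html").write_text("hello", encoding="utf-8")
    for link in ("/%C3%A5.html", "/%C3%A5", "/a%CC%8A"):
        print(link, "->", resolve(root, link) is not None)
```

Trying both forms, rather than rewriting the file names, keeps the server agnostic about how the filesystem stores names.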

-- 
Leif H Silli
Received on Monday, 30 May 2011 00:06:52 GMT