RE: HTML5 and Unicode Normalization Form C

Phillips, Addison, Sun, 29 May 2011 13:54:34 -0700:
>> As for using non-NFC outside attributes, then I don't know if 
>> there are issues which can justify a warning. But according
>> to Unicode technical report 15, then the "W3C Character Model
>> for the World Wide Web [ snip ] and other W3C Specifications
>> (such as XML 1.0 5th Edition) recommend using Normalization
>> Form C for all content." [4]
> The normative bits of Charmod-Norm live at [1]. Items C300 and C301 
> use the RFC 2119 keyword "SHOULD" in requiring that content and 
> specifications be fully-normalized or include-normalized.
> It would be unreasonable, in my opinion, to treat HTML5 as a *new* 
> format, so I think any expectations for adding a normalization 
> requirement to HTML are unrealistic.

However, HTML5 warns against using encodings other than UTF-8 because 
of the "unexpected results" this can cause in form submissions and 
links. It would seem in tune with this spirit to let HTML5/validators, 
if possible, point to how to eliminate the problems that can cause 
unexpected results even with UTF-8, no?
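
To make the validator idea concrete, here is a minimal sketch of such a check. It is only an illustration, not how any existing validator works: the function name nfc_warnings and the warning text are made up, and the check simply compares each line with its NFC form using Python's standard unicodedata module.

```python
import unicodedata

def nfc_warnings(text):
    """Return one warning per line of `text` that is not already in NFC.

    Hypothetical helper for illustration; the message wording is invented.
    """
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # A line is in NFC exactly when normalizing it changes nothing.
        if unicodedata.normalize("NFC", line) != line:
            warnings.append(
                f"line {lineno}: content is not in Normalization Form C"
            )
    return warnings

# Line 1 uses the precomposed U+00E9, line 2 uses "e" + U+0301.
sample = "caf\u00e9\ncafe\u0301\n"
report = nfc_warnings(sample)
```

Both lines look the same on screen, but only the second one would be flagged.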

Btw, it seems to be unclear, from HTML5, whether two @id attributes 
that differ only with regard to their normalization are to be 
considered unique. All HTML5 says is that @id attributes must be 
unique, but it does not say what actually makes them unique. [1]
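
A small sketch of the ambiguity, assuming a plain code point comparison on one hand and a comparison after NFC on the other. HTML5 does not define the second comparison; the helper name ids_colliding_under_nfc and the sample id values are hypothetical.

```python
import unicodedata
from collections import defaultdict

def ids_colliding_under_nfc(ids):
    """Group id values by their NFC form and return the groups that
    contain more than one distinct raw spelling (hypothetical helper)."""
    groups = defaultdict(set)
    for raw in ids:
        groups[unicodedata.normalize("NFC", raw)].add(raw)
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }

# Composed U+00E9 versus decomposed "e" + U+0301, plus an unrelated id.
ids = ["caf\u00e9", "cafe\u0301", "intro"]
collisions = ids_colliding_under_nfc(ids)
```

Compared code point by code point, all three ids are distinct, so the uniqueness rule is satisfied. Compared after NFC, the first two collapse into one.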

Related to the uniqueness: 
  * On the Mac, when serving a file from the preinstalled Apache2, 
normalized link values (provided they are not cool IRIs with decomposed 
letters) do target files with non-normalized file names. How come? Is 
it because Apache performs a normalization of the HTTP request? 
  * Inside a document, however, browsers (with the exception of Safari 
on Windows [2]) treat composed and decomposed identifiers as distinct 
identifiers. 
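
To illustrate why the first observation is surprising: the composed and decomposed spellings of a file name percent-encode to different request paths. The sketch below uses a made-up file name and Python's urllib.parse.quote. It does not settle which layer reconciles the two forms on the Mac.

```python
import unicodedata
from urllib.parse import quote

name = "caf\u00e9.html"  # hypothetical file name, for illustration only

composed = unicodedata.normalize("NFC", name)    # U+00E9
decomposed = unicodedata.normalize("NFD", name)  # "e" + U+0301

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
composed_path = quote(composed)
decomposed_path = quote(decomposed)

print(composed_path)
print(decomposed_path)
```

Since the two paths differ byte for byte, a link in one form can only reach a file named in the other form if something between the request and the file system treats them as equivalent.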

> The I18N Core WG has recently agreed 
> to work on normalization guidelines again. There is (and has ever 
> been) little enthusiasm for working on the Character Model, but 
> having read the normalization document again this weekend, I suspect 
> that Charmod-Norm will probably have to be replaced, rather than just 
> worked around.

Good to hear you are looking at it!

> [1]

Leif H Silli

Received on Monday, 30 May 2011 01:44:31 UTC