Re: HTML5 and Unicode Normalization Form C

Michael[tm] Smith, Fri, 27 May 2011 23:42:24 +0900:
> But I think it is useful to
> instead have it emit a warning, and some others I've talked with who are
> more knowledgeable than me about NFC agree, so what I'm going to do is, flip
> the validator code to make it emit a warning instead of an error. Then I'll
> update the W3C backends for the HTML5 facet and re-deploy them, by some
> time early next week (which will also pull in a bunch of new and useful
> changes to the backend that Henri Sivonen recently checked in upstream).

The current behaviour makes the validator hide some issues that perhaps 
are more important than the use of decomposed characters in "content":

1) For the following document, the validator (erroneously) points out 
the use of decomposed values, but does *not* point out that the two @id 
attributes are actually equal (they differ only in their 
normalization) and thus should be considered non-unique and therefore 
invalid:

  <!DOCTYPE html><title></title><p id="a&#x30a;"><p id="&#xe5;">

 (Related bug: http://www.w3.org/Bugs/Public/show_bug.cgi?id=12839 )

  It seems more important to point out that the two @id-s have the same 
value than to point out that one of them uses the decomposed form. 
So, if you decide to emit a warning against use of non-NFC, then you 
must take care that it does not "suffocate" the error message that the 
above markup should produce.
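
  To illustrate, here is a minimal Python sketch. It is not the 
validator's actual code: `check_ids` is a hypothetical helper, and it 
assumes, as argued above, that ids differing only in normalization 
count as duplicates. It reports the non-NFC warning and the 
duplicate-id error side by side, so neither hides the other:

```python
import unicodedata

def check_ids(ids):
    """Return (severity, message) diagnostics for a list of id values.

    Hypothetical helper for illustration only. Assumes that ids are
    compared after NFC normalization.
    """
    diagnostics = []
    seen = {}  # NFC form -> first original value that produced it
    for value in ids:
        nfc = unicodedata.normalize("NFC", value)
        # The non-NFC warning is recorded, but checking continues.
        if nfc != value:
            diagnostics.append(("warning", f"id {value!r} is not in NFC"))
        # The duplicate check runs on the normalized form regardless.
        if nfc in seen:
            diagnostics.append(
                ("error", f"id {value!r} duplicates {seen[nfc]!r} after NFC")
            )
        else:
            seen[nfc] = value
    return diagnostics

# The two ids from the document above: "a" + U+030A, and U+00E5.
diagnostics = check_ids(["a\u030a", "\u00e5"])
for severity, message in diagnostics:
    print(severity, message)
```

For the two ids in the example document, this yields one warning (for 
the decomposed id) followed by one error (for the duplicate).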

2) The same can be said about the value of @href: any warning message 
that you choose to show with regard to use of decomposed values as 
such should not (as it currently does) prevent IRI errors/warnings 
from being displayed.

This is yet another reason not to show a general error for use of 
non-NFC, and a reason to be careful that the warning you plan does 
not cause other necessary checks to be skipped (or not 
reported to the validator user).
-- 
leif halvard silli

Received on Wednesday, 1 June 2011 01:47:18 UTC