- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Sun, 29 May 2011 19:21:07 +0200
- To: www-validator@w3.org
- Cc: www-international@w3.org
Andreas Prilop, Fri, 27 May 2011 17:33:35 +0200 (CEST):
> On Fri, 27 May 2011, Michael[tm] Smith wrote:
>
>> But if you think it's wrong to even have it emit a warning,
>> then let me know and I talk to Henri and to the internationalization
>> folks about whether it should be or not. But from what I have been
>> told by the internationalization folks so far, I think they would
>> like to for it to be generating a warning here.
>
> In my opinion, you should not even emit a warning since Unicode
> itself does not require NFC to be used everywhere.
> It is the choice of the author to take any character encoding
> and any valid Unicode representation. This has nothing to do
> with "valid HTML" and should therefore not be reported by
> an HTML validator.

Actually, as discussed on www-international in February, use of non-NFC
is likely to be a surprising and hard-to-debug result of interaction
with a tool or a file system which does not use/convert to NFC, rather
than a conscious choice. [1]

Use of non-NFC in file names is a problem in itself: unless the URL
uses the same (de)composition, the file name and the link don't match.
And even when e.g. a link and a file name both use non-NFC, there might
be interaction problems related to CSS in some user agents (:visited
and :link styling).

HTML5 already warns against use of non-UTF-8 with the justification
that it can cause problems with, quote: [2] "form submission and URL
encodings". And hence, because non-NFC could cause the same kind of
problems, a warning for use of non-NFC in links and idrefs does seem in
place. This seems worth mentioning in HTML5 itself - perhaps a bug
should be filed.

I don't know if CSS selectors are affected - if so, then any attribute
with a non-NFC value should potentially have a warning. CSS namespaces
are perhaps another problem area - which falls under CSS, though. [3]

As for using non-NFC outside attributes, I don't know if there are
issues which can justify a warning.
But according to Unicode Technical Report 15, the "W3C Character Model
for the World Wide Web [ snip ] and other W3C Specifications (such as
XML 1.0 5th Edition) recommend using Normalization Form C for all
content." [4]

[1] http://lists.w3.org/Archives/Public/www-international/2011JanMar/0046
[2] http://www.w3.org/TR/html5/semantics.html#charset
[3] http://lists.w3.org/Archives/Public/www-style/2011May/0076
[4] http://unicode.org/reports/tr15/
-- 
Leif Halvard Silli
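
For illustration, the file name/link mismatch described above can be
sketched in Python. This is only a minimal sketch, not something any
validator is known to do: the file name "målform.html" and the helper
is_nfc() are hypothetical, chosen for the example. It shows that the
same visible name percent-encodes to two different URLs depending on
whether it is in NFC or NFD, and how a checker could detect a non-NFC
attribute value.

```python
import unicodedata
from urllib.parse import quote


def is_nfc(value: str) -> bool:
    """Hypothetical helper: True if the string is already in NFC."""
    return unicodedata.normalize("NFC", value) == value


# Illustrative file name: precomposed U+00E5 (NFC form).
name_nfc = "m\u00e5lform.html"
# Same visible name, decomposed into "a" + U+030A COMBINING RING ABOVE (NFD),
# as some tools and file systems would store it.
name_nfd = unicodedata.normalize("NFD", name_nfc)

# Percent-encode both as UTF-8, the way they would appear in a URL.
url_nfc = quote(name_nfc)
url_nfd = quote(name_nfd)

print(url_nfc)                  # m%C3%A5lform.html
print(url_nfd)                  # ma%CC%8Alform.html
print(url_nfc == url_nfd)       # False: the link and the file do not match
print(is_nfc(name_nfc))         # True
print(is_nfc(name_nfd))         # False: this is where a warning would fire
```

Normalizing both sides to NFC before comparing makes the two names
equal again, which is why a warning on non-NFC links and idrefs could
help authors catch the mismatch early.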
Received on Sunday, 29 May 2011 17:22:37 UTC