- From: <bugzilla@jessica.w3.org>
- Date: Sun, 30 Jan 2011 00:42:43 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11909

--- Comment #6 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-01-30 00:42:42 UTC ---
(In reply to comment #5)
> == 7. Internationalization ==
> Polyglot Markup should use UTF-8, for such and such reasons:

(A) I now believe that the exact, permitted encodings should be a conformance issue. That way we can address the fact that several people, including Sam, seem to want to say that only UTF-8 should/must be used. (Plus, it solves _my_ problem: I think it would be stupid to say that polyglot markup, by definition, needs to be UTF-8. I'm fine with limiting HTML5-compatible documents to UTF-8, as long as the rule is founded on something solid.)

Thus the principles section should only give general considerations. For example, it can say that if you set the encoding with a meta element, then you must also set the encoding with the XML declaration, except when the encoding is UTF-8 (and UTF-16).

The status of the HTML5 spec is that it permits <meta charset> inside the XHTML syntax only when the value of @charset is UTF-8. It also forbids the use of the XML declaration. Thus, section 2, about HTML5 conformance, should demand UTF-8.

(B) Regarding the general rules: we need to consider that HTML5/HTML parsers have encoding detection algorithm(s). Polyglot Markup must be authored in such a way that HTML5's encoding detection algorithm doesn't run (or at least runs no further than the step where it finds a <meta charset> element at the start of the document). This rule needs to be in place in order to equalize both the DOMs and the general experience. If the algorithm runs longer than that, then in HTML5 the page can be redrawn, the encoding can change during the actual parsing, and so on.

I would also like to add a 10th point to the principles:

== 10. Authoring equality ==
It should be possible to author polyglots using both HTML tools and XML tools.
Authoring is, in this case, understood as working on a single file in a file system, not in a CMS. The practical consequence is that if you use encodings other than UTF-8/UTF-16, and also if you don't use the BOM, then there *must* be an encoding declaration inside the document. (This, in turn, leads us to say that for HTML5-conforming documents only UTF-8 (and perhaps UTF-16; I must think about it) is permitted.)
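The encoding rules proposed above (either a BOM, or else an in-document declaration that HTML parsers find early and that XML parsers agree with) can be sketched as a small checker. This is a minimal Python sketch under stated assumptions, not the HTML5 encoding sniffing algorithm: the function name `check_polyglot_encoding` is hypothetical, the regular expressions only recognize the simple `<?xml ... encoding="..."?>` and `<meta charset="..."/>` forms, and UTF-16 documents are assumed to start with a BOM.

```python
import re

# Byte order marks; when one is present, HTML and XML parsers both take the
# encoding from it, so this sketch requires no in-document declaration.
BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")

# Simplified patterns (an assumption of this sketch): real parsers accept
# more syntax, e.g. http-equiv declarations and unquoted attribute values.
XML_DECL = re.compile(r'^<\?xml\s[^>]*?\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
META_CHARSET = re.compile(
    r'<meta\s+charset\s*=\s*["\']([A-Za-z0-9._-]+)["\'][^>]*>', re.IGNORECASE
)

# HTML requires the element carrying the encoding declaration to be
# serialized completely within the first 1024 bytes of the document.
PRESCAN_LIMIT = 1024


def check_polyglot_encoding(data: bytes) -> list:
    """Hypothetical helper: return a list of problems (empty means none found)."""
    if data.startswith(BOMS):
        return []

    # Declarations are ASCII; latin-1 maps one byte to one character,
    # so string indices below are also byte offsets.
    text = data.decode("latin-1")
    problems = []

    meta = META_CHARSET.search(text)
    if meta is None:
        problems.append("no BOM and no <meta charset>: HTML parsers must guess")
    elif meta.end() > PRESCAN_LIMIT:
        problems.append("<meta charset> is not within the first 1024 bytes")

    xml = XML_DECL.match(text)
    meta_enc = meta.group(1).lower() if meta else None
    xml_enc = xml.group(1).lower() if xml else None

    # Without a BOM, XML parsers assume UTF-8, so any other encoding
    # declared with <meta charset> also needs an XML declaration.
    if meta_enc and meta_enc != "utf-8" and xml_enc is None:
        problems.append(f"{meta_enc} also needs an XML declaration")

    # Names are compared case-insensitively only; aliases are not resolved.
    if meta_enc and xml_enc and meta_enc != xml_enc:
        problems.append(f"<meta charset> says {meta_enc}, XML declaration says {xml_enc}")

    return problems
```

For example, a BOM-less ISO-8859-1 file that carries both `<?xml version="1.0" encoding="ISO-8859-1"?>` and `<meta charset="ISO-8859-1"/>` passes, while the same file without the XML declaration is flagged. Keeping the `<meta charset>` inside the first 1024 bytes is what lets the HTML prescan stop early, as point (B) asks.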
Received on Sunday, 30 January 2011 00:42:44 UTC