- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Fri, 23 Jul 2010 17:26:37 +0300
- To: Sam Ruby <rubys@intertwingly.net>
- Cc: Eliot Graff <eliotgra@microsoft.com>, public-i18n-core@w3.org, HTMLwg <public-html@w3.org>
Sam,

It is my impression that you do not attempt full IE6 compatibility for your web site ...

Issue: you have many times suggested using xmlns - the XHTML namespace declaration on the <html> start tag - as a polyglot markup indicator. But group members, including Maciej, were sceptical about making it illegal in non-polyglot HTML5. This is really a catch-22: it logically has to be forbidden in ordinary HTML in order to serve as a polyglot indicator. And this catch-22 also justifies that the indicator should be an extension to HTML5.

Thus, why reinvent the wheel? Let us use the XML declaration for this purpose. I plan to file a bug as soon as possible. Since the XML declaration is not permitted in HTML5 proper, Polyglot Markup served as HTML thus becomes an HTML5 extension, in that single point.

In support of this direction, I also point to Henri, who recently complained that the XML declaration isn't obligatory in XML files. [1] We could thus have made the XML declaration a MUST for polyglot markup - after all, there is no other way to automatically tell a validator that the file is a polyglot. However, in the spirit of relying upon spec inference, it seems better to apply the same rule as in XML 1.0: make it a SHOULD. In XML 1.0, the freedom to omit the declaration is also tied to the encoding: documents in encodings other than UTF-8/UTF-16 need it. And thus, like in XML 1.0, being allowed to omit the declaration eventually becomes a carrot for using UTF-8/UTF-16.

Another fact that speaks to the advantage of this solution is that text/html parsers (at least WebKit/Opera/Gecko) actually (and much to my surprise) _do_ take note of the encoding information inside the XML declaration's encoding attribute, even though HTML5's encoding determination algorithm does not mention this attribute. (Opera/Safari/Firefox give higher priority to the encoding information inside the XML declaration than they give to e.g. UTF-8 detection based on pattern matching.) See my next message for more on the XML declaration encoding attribute.
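
To make the proposal concrete, here is a minimal sketch of how a validator might check for the indicator and read the encoding attribute. It is only an illustration under stated assumptions: the name sniff_xml_declaration is hypothetical, the pattern follows the XMLDecl production of XML 1.0, and it only handles ASCII-compatible encodings (no UTF-16). It does not describe what any browser or existing validator actually does.

```python
import re
from typing import Optional, Tuple

# XML declaration at the very start of the byte stream, optionally preceded
# by a UTF-8 BOM. Hypothetical helper pattern; ASCII-compatible input only.
_XML_DECL = re.compile(
    rb"""
    \A(?:\xef\xbb\xbf)?                                   # optional UTF-8 BOM
    <\?xml
    [ \t\r\n]+version[ \t\r\n]*=[ \t\r\n]*(["'])1\.[0-9]+\1
    (?:[ \t\r\n]+encoding[ \t\r\n]*=[ \t\r\n]*
        (["'])([A-Za-z][A-Za-z0-9._-]*)\2)?               # optional encoding
    (?:[ \t\r\n]+standalone[ \t\r\n]*=[ \t\r\n]*
        (["'])(?:yes|no)\4)?                              # optional standalone
    [ \t\r\n]*\?>
    """,
    re.VERBOSE,
)


def sniff_xml_declaration(head: bytes) -> Tuple[bool, Optional[str]]:
    """Return (has_declaration, declared_encoding_or_None) for the first bytes."""
    match = _XML_DECL.match(head)
    if match is None:
        return (False, None)
    encoding = match.group(3)
    return (True, encoding.decode("ascii") if encoding else None)
```

Under the proposal, a True first value would tell the validator to treat the file as polyglot markup; the second value is the encoding label the author declared, if any.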

It seems justifiable to demand that, just as the XML domain should allow the META @charset element even though it has no effect there, the text/html domain should also accept that polyglot markup extends HTML5 with the XML declaration. There should be some evenness. It seems all the more justifiable since HTML parsers actually make use of the XML method anyhow.

Problematic/Debatable issues:

DOM identity: I was unable to check in Live DOM Viewer right now, but the in-browser inspectors I used did not make the XML declaration visible in the DOM. Thus the XML declaration should not significantly increase the DOM differences between XML- & HTML-parsing.

UA compatibility: The XML declaration is often warned against in authoring guides. The trouble with it today is principally limited to its being a quirks mode trigger in IE6. If the author stands on his head, the XML declaration may trigger quirks mode in IE7 also: this requires that the first character after the string "<?xml" is a line-break. However, never do that ... We could eventually warn against adding a line-break there.

Besides, I think we should, like Henri said, focus on spec inference rather than UA compatibility investigation. The XML declaration has long since been out of the bag when it comes to text/html. When authors need a certain UA compatibility, they can omit the XML declaration and use UTF-8. (On my old Windows 98 system with Internet Explorer 6, it seems like UTF-8 is the only way to offer a multilingual text anyway.) And/or they can rely on external encoding info (HTTP).

[1] http://www.w3.org/mid/DFA6720A-2D87-46E5-A0F4-BDACA49448B3@iki.fi

--
leif halvard silli
Received on Friday, 23 July 2010 14:38:36 UTC