
XML declaration as Polyglot Markup indicator

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Fri, 23 Jul 2010 17:26:37 +0300
To: Sam Ruby <rubys@intertwingly.net>
Cc: Eliot Graff <eliotgra@microsoft.com>, public-i18n-core@w3.org, HTMLwg <public-html@w3.org>
Message-ID: <20100723172637853892.ba6e652c@xn--mlform-iua.no>
Sam, 

It is my impression that you do not attempt full IE6 compatibility for 
your web site ...

Issue: you have many times suggested using xmlns (the XHTML namespace 
declaration on the <html> start tag) as a polyglot markup indicator. 
But group members, including Maciej, were sceptical about making it 
illegal in non-polyglot HTML5. 

This is really a Catch-22: to serve as a polyglot indicator, it 
logically has to be forbidden in ordinary HTML. And this Catch-22 also 
justifies making the indicator an extension to HTML5.

Thus, why reinvent the wheel? Let us use the XML declaration for this 
purpose. I plan to file a bug as soon as possible. Since the XML 
declaration is not permitted in HTML5 proper, Polyglot Markup served as 
HTML thus becomes an HTML5 extension on that single point.

In support of this direction, I also point to Henri, who recently 
complained that the XML declaration isn't obligatory in XML files. [1]  
We could thus have made the XML declaration a MUST for polyglot markup; 
after all, there is no other way to automatically tell a validator 
that a file is polyglot markup. However, in the spirit of relying upon 
spec inference, it seems better to apply the same rule as XML 1.0 does: 
make it a SHOULD. In XML 1.0, omitting the declaration is also tied to 
the encoding: without it (and without external encoding information), 
the document must be UTF-8 or UTF-16. And thus, as in XML 1.0, the 
option of omitting the declaration ultimately becomes a carrot for 
using UTF-8/UTF-16.
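To make the proposed rule concrete, here is a minimal Python sketch of 
how a validator might apply it. The names validation_mode and 
declaration_required are hypothetical, the regular expression is only 
a simplified approximation of the XML declaration grammar, and the 
sketch assumes no external encoding information (e.g. an HTTP charset) 
is available.

```python
import re

# Simplified (hypothetical) pattern for an XML declaration at the very
# start of a document; the real XML 1.0 grammar is stricter.
XML_DECL = re.compile(r'<\?xml\s+version\s*=\s*["\']1\.\d+["\']')


def validation_mode(text: str) -> str:
    """Pick a rule set from the presence of an XML declaration.

    Without the declaration, a validator has no automatic way to know the
    file is meant as polyglot markup, so it falls back to plain HTML5 rules.
    """
    return "polyglot" if XML_DECL.match(text) else "html5"


def declaration_required(encoding: str) -> bool:
    """XML 1.0-style rule: the declaration may only be omitted for UTF-8 or
    UTF-16 (assuming no external encoding information such as HTTP)."""
    return encoding.lower() not in {"utf-8", "utf-16"}
```

Because the rule is a SHOULD, omitting the declaration is not an error 
in itself; the file is just validated as ordinary HTML5. The carrot is 
that declaration_required returns False only for UTF-8/UTF-16.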

Another fact in favour of this solution is that text/html parsers (at 
least WebKit/Opera/Gecko) actually (and much to my surprise) _do_ take 
note of the encoding information in the XML declaration's encoding 
attribute, even though HTML5's encoding determination algorithm does 
not mention this attribute. (Opera/Safari/Firefox give higher priority 
to the encoding information inside the XML declaration than to, e.g., 
UTF-8 detection based on pattern matching.) See my next message for 
more on the XML declaration's encoding attribute.
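The priority observed above can be sketched roughly as follows. This is 
a simplified, hypothetical model, not any browser's actual algorithm: 
sniff_encoding, its regular expression and the windows-1252 fallback 
are assumptions, and BOMs, HTTP and <meta charset> are ignored.

```python
import re

# Hypothetical pattern: encoding attribute of an XML declaration that
# starts at byte 0 of the document.
XML_DECL_ENCODING = re.compile(
    rb'<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(raw: bytes, fallback: str = "windows-1252") -> str:
    # 1. The XML declaration's encoding attribute wins, as observed in
    #    Opera/Safari/Firefox according to the text above.
    m = XML_DECL_ENCODING.match(raw)
    if m:
        return m.group(1).decode("ascii").lower()
    # 2. Only then: pattern-based UTF-8 detection (here: the bytes decode
    #    strictly as UTF-8 and contain at least one non-ASCII byte).
    try:
        raw.decode("utf-8")
        if any(b > 0x7F for b in raw):
            return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Otherwise an assumed legacy default.
    return fallback
```

Note that a document whose bytes are valid UTF-8 but which declares 
ISO-8859-1 is labelled iso-8859-1 in this model, mirroring the priority 
described above.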

Just as it seems justifiable to demand that the XML domain allow the 
<meta charset> element even though it has no effect there, it seems 
justifiable to demand that the text/html domain accept that polyglot 
markup extends HTML5 with the XML declaration. There should be some 
symmetry. This seems all the more justifiable since HTML parsers 
actually make use of the XML declaration's encoding information anyhow.

	Problematic/Debatable issues:

	DOM identity: I was unable to check in Live DOM Viewer just now, but 
the in-browser inspectors I used did not show the XML declaration in 
the DOM. Thus the XML declaration should not significantly increase 
the DOM differences between XML and HTML parsing.

	UA compatibility: Authoring guides often warn against the XML 
declaration. Today, the trouble with it is principally limited to its 
being a quirks mode trigger in IE6. If the author really goes out of 
his way, the XML declaration may trigger quirks mode in IE7 and IE8 as 
well: this requires that the first character after the string "<?xml" 
be a line break. However, nobody would do that in practice ... We 
could eventually warn against adding a line break there.
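Such a warning could be as simple as the following Python check. The 
function name is hypothetical; it merely encodes the IE condition 
described above (a line break as the first character after "<?xml").

```python
def linebreak_after_xml(text: str) -> bool:
    """Hypothetical validator warning: True when a line break directly
    follows "<?xml", the condition said above to trigger quirks mode."""
    prefix = "<?xml"
    next_char = text[len(prefix):len(prefix) + 1]  # "" if nothing follows
    return text.startswith(prefix) and next_char in ("\n", "\r")
```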

Besides, I think we should, as Henri said, focus on spec inference 
rather than on investigating UA compatibility. As far as the XML 
declaration in text/html is concerned, the cat has long since been out 
of the bag. When authors need a certain UA compatibility, they can omit 
the XML declaration and use UTF-8. (On my old Windows 98 system with 
Internet Explorer 6, UTF-8 seems to be the only way to offer 
multilingual text anyway.) And/or they can rely on external encoding 
information (HTTP).

[1] http://www.w3.org/mid/DFA6720A-2D87-46E5-A0F4-BDACA49448B3@iki.fi
-- 
leif halvard silli
Received on Friday, 23 July 2010 14:38:33 GMT
