Re: Polyglot Markup/XML encoding declaration

On 2010-07-28 19:17, Leif Halvard Silli wrote:
> Lachlan Hunt, Tue, 27 Jul 2010 13:21:03 +0200:
>> On 2010-07-23 16:26, Leif Halvard Silli wrote:
>>> Proposal: Polyglot Markup should allow the document encoding to be set
>>> via the encoding attribute of the XML declaration. The XML declaration,
>>> including the encoding attribute, thus becomes a HTML5 extension,
>>> whenever polyglot markup is being consumed as HTML. (See my previous
>>> letter to Sam, about the XML declaration as polyglot markup indicator.)
>>
>> I object to this because permitting the XML declaration would only
>> serve to pollute the document with unnecessary markup,
>
> A polyglot may be served as XHTML. XML 1.0 does not consider the XML
> declaration unnecessary pollution.

A polyglot may be served as HTML too.  HTML5 does consider the XML 
declaration to be non-conformant, and including it is unnecessary pollution.

> There are several things in a polyglot that is unnecessary from a
> purist HTML point of view!

The XML-inspired talismans in the HTML syntax are only permitted to the 
extent that they are required for XHTML compatibility.  An XML 
declaration is not required in XML when the encoding is UTF-8 or UTF-16, 
nor when the encoding is declared externally, and so there is no 
requirement to permit it in HTML for the purpose of polyglot documents.
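
To make the rule concrete, here is a minimal sketch of the condition described above. The function name and its parameters are hypothetical, made up for illustration only, and the sketch assumes encoding labels are compared case-insensitively.

```python
def xml_encoding_declaration_required(encoding, declared_externally=False):
    """Return True if an XML document in `encoding` needs an encoding
    declaration, following the rule stated above.

    Hypothetical helper for illustration; not part of any library.
    """
    # External information (e.g. an HTTP Content-Type charset parameter)
    # removes the need for an in-document declaration.
    if declared_externally:
        return False
    # UTF-8 and UTF-16 need no declaration (UTF-16 relies on the BOM).
    return encoding.strip().lower() not in {"utf-8", "utf-16"}


print(xml_encoding_declaration_required("UTF-8"))
print(xml_encoding_declaration_required("ISO-8859-1"))
print(xml_encoding_declaration_required("ISO-8859-1", declared_externally=True))
```

Only the middle case, a legacy encoding with no external declaration, needs the XML declaration, and polyglot documents have no need to be in that case.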

>> and to mislead authors about how the encoding of a file is actually
>> determined.
>
> I agree that it was bad of me to hint that a HTML consumed file should
> be able to rely on the XML encoding declaration only.  To remove any
> doubt, I emphasize - stronger - that if the XML encoding declaration is
> used, then the HTML encoding declaration - meta@charset - must also be
> used.

Authors learn by copying and pasting. If they see lots of markup in the 
wild using the XML declaration in HTML and that it appears to declare 
the encoding, they will copy it into their own documents without 
understanding that it doesn't do what they think.

We've seen this scenario before when XHTML 1.0 started becoming popular 
and lots of documents were unnecessarily copying the XML declaration 
from each other, with many people falsely thinking that it either meant 
XML parsing would be used by browsers that supported it or that it 
declared the encoding.  The practice only died out after people started 
realising it triggered quirks mode in IE6.  We have no reason to start 
introducing it again.

>> There have been many observed instances of otherwise useless markup
>> being used by misled authors in ways that don't actually do anything.
>> Many of these cases have now been made optional or obsolete in HTML5
>> because of the wasted effort they were causing, and so introducing
>> new markup with no real purpose would not be wise.
>
> It is permitted in HTML already, under XHTML 1.0, Appendix C.

Appendix C explicitly states:

   "For compatibility with these types of legacy browsers, you may want
    to avoid using processing instructions and XML declarations"

But Appendix C contains no normative requirements either, and so it 
can't permit or deny anything.  It only provides recommendations.

It is also irrelevant: HTML5 does not permit the XML declaration, 
because the HTML parser would treat it as a bogus comment.  Permitting 
it would thus require unnecessarily complicated changes to the parsing 
requirements for no benefit whatsoever.
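
As a rough illustration of the bogus comment handling, the following is a much simplified sketch, not the actual tokenizer algorithm. The function name is hypothetical, and it only models the case where the input begins with "<?".

```python
def bogus_comment_data(markup):
    """Return the comment data an HTML parser would produce for input
    starting with "<?", or None if the input does not start that way.

    Simplified, hypothetical model: the "?" after "<" switches the
    tokenizer to the bogus comment state, which runs up to the first ">".
    """
    if not markup.startswith("<?"):
        return None
    end = markup.find(">", 1)
    if end == -1:
        # No ">" at all: everything up to end of file becomes the comment.
        return markup[1:]
    # The comment data starts at the "?" and stops just before the ">".
    return markup[1:end]


doc = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html><title>x</title>'
print(bogus_comment_data(doc))
# prints: ?xml version="1.0" encoding="UTF-8"?
```

The declaration ends up as nothing more than a comment node, so its encoding pseudo-attribute has no effect on how the HTML parser decodes the document.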

> The XML declaration would not be generally permitted in HTML - it would
> only be permitted in polyglot markup.

There is no way to make some syntax conforming for polyglot documents 
only.  Such a requirement is unenforceable because the conforming 
polyglot document syntax is and should remain only the intersection of 
HTML and XHTML syntax.

-- 
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/

Received on Thursday, 29 July 2010 13:30:37 UTC