Re: Polyglot Markup/XML encoding declaration

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Sun, 1 Aug 2010 11:55:28 +0400
To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Cc: HTMLwg <public-html@w3.org>, Eliot Graff <eliotgra@microsoft.com>, public-i18n-core@w3.org
Message-ID: <20100801115528378898.670247d9@xn--mlform-iua.no>
Lachlan Hunt, Thu, 29 Jul 2010 15:30:02 +0200:
> On 2010-07-28 19:17, Leif Halvard Silli wrote:
>> Lachlan Hunt, Tue, 27 Jul 2010 13:21:03 +0200:
>>> On 2010-07-23 16:26, Leif Halvard Silli wrote:
>>>> Proposal: Polyglot Markup should allow the document encoding to be set
>>>> via the encoding attribute of the XML declaration. The XML declaration,
>>>> including the encoding attribute, thus becomes a HTML5 extension,
>>>> whenever polyglot markup is being consumed as HTML. (See my previous
>>>> letter to Sam, about the XML declaration as polyglot markup indicator.)
>>> 
>>> I object to this because permitting the XML declaration would only
>>> serve to pollute the document with unnecessary markup,
>> 
>> A polyglot may be served as XHTML. XML 1.0 does not consider the XML
>> declaration unnecessary pollution.
> 
> A polyglot may be served as HTML too.  HTML5 does consider the XML 
> declaration to be non-conformant, and including it is unnecessary 
> pollution.

This touches on the question of whether Polyglot Markup is a
specification or an authoring guide. The TAG, through Tim Berners-Lee,
has suggested that it is to be a specification. Of course, even as a
spec, it does not need to include the XML declaration. But if it is a
spec, then it could include it.

>> There are several things in a polyglot that are unnecessary from a
>> purist HTML point of view!
> 
> The XML-inspired talismans in the HTML syntax are only permitted to 
> the extent that they are required for XHTML compatibility.  An XML 
> declaration is not required in XML when the encoding is UTF-8 or 
> UTF-16, nor when the encoding is declared externally, and so there is 
> no requirement to permit it in HTML for the purpose of polyglot 
> documents.

I don't understand how you use the word 'requirement'. It is possible to 
define polyglot markup without the XML declaration. It is even possible 
to define a profile that doesn't use meta@charset at all - doing so 
would be to treat XML and HTML equally.

>>> and to mislead authors about how the encoding of a file is actually
>>> determined.
>> 
>> I agree that it was bad of me to hint that a HTML consumed file should
>> be able to rely on the XML encoding declaration only.  To remove any
>> doubt, I emphasize - stronger - that if the XML encoding declaration is
>> used, then the HTML encoding declaration - meta@charset - must also be
>> used.
> 
> Authors learn by copying and pasting. If they see lots of markup in 
> the wild using the XML declaration in HTML and that it appears to 
> declare the encoding, they will copy it into their own and not 
> understand that it doesn't do what they think.

Some will undoubtedly come to believe that meta@charset defines 
the encoding in XHTML. So you want to forbid it for that reason?

> We've seen this scenario before when XHTML 1.0 started becoming 
> popular and lots of documents were unnecessarily copying the XML 
> declaration from each other, with many people falsely thinking that 
> it either meant XML parsing would be used by browsers that supported 
> it or that it declared the encoding.  The practice only died out 
> after people started realising it triggered quirks mode in IE6.  We 
> have no reason to start introducing it again.

We have several reasons to introduce it. But I agree that there are also 
reasons not to introduce it.

>>> There have been many observed instances of otherwise useless markup
>>> being used by misled authors in ways that don't actually do anything.
>>> Many of these cases have now been made optional or obsolete in HTML5
>>> because of the wasted effort they were causing, and so introducing
>>> new markup with no real purpose would not be wise.
>> 
>> It is permitted in HTML already, under XHTML 1.0, Appendix C.
> 
> Appendix C explicitly states:
> 
>   "For compatibility with these types of legacy browsers, you may want
>    to avoid using processing instructions and XML declarations"
> 
> But Appendix C contains no normative requirements either, and so it 
> can't permit or deny anything.  It only provides recommendations.

But the text/html MIME registration is very normative. And it states 
that XHTML 1.0 defines a profile which can be served as text/html. 
Thus, you are wrong. The XML declaration is officially permitted.

> It is also irrelevant because HTML5 does not permit it because it 
> would be parsed as a bogus comment.  Permitting it would thus require 
> unnecessarily complicated changes to the parsing requirements for no 
> benefit whatsoever.

Again, there are benefits - but I hear you state over and over that 
there aren't.

>> The XML declaration would not be generally permitted in HTML - it would
>> only be permitted in polyglot markup.
> 
> There is no way to make some syntax conforming for polyglot documents 
> only.

Just make a validator which does.
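
A minimal sketch of what such a validator check might look like, in
Python. It assumes the rule proposed earlier in this thread: if the XML
declaration carries an encoding attribute, then meta@charset must also
be present and must name the same encoding. The function name
check_polyglot_encoding and the regular expressions are illustrative
assumptions, not taken from any existing validator, and regex matching
is only a rough stand-in for real parsing.

```python
import re

# Hypothetical patterns: rough approximations of the XML declaration and of
# <meta charset=...>, good enough for a sketch but not for real validation.
XML_DECL = re.compile(
    r'^<\?xml\s+version\s*=\s*["\'][^"\']+["\']'
    r'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)
META_CHARSET = re.compile(
    r'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def check_polyglot_encoding(text):
    """Return a list of problems; an empty list means the check passed."""
    problems = []

    # Encoding named by the XML declaration, if there is one.
    decl = XML_DECL.match(text)
    xml_enc = decl.group(1) if decl else None

    # Encoding named by meta@charset, if there is one.
    meta = META_CHARSET.search(text)
    meta_enc = meta.group(1) if meta else None

    if xml_enc and not meta_enc:
        problems.append("XML encoding declaration without meta charset")
    elif xml_enc and xml_enc.lower() != meta_enc.lower():
        problems.append("XML declaration and meta charset disagree")
    return problems
```

A document with no XML declaration passes this check untouched, so the
rule would only bite on documents that opt in to the declaration.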

>  Such a requirement is unenforceable because the conforming 
> polyglot document syntax is and should remain only the intersection 
> of HTML and XHTML syntax.
-- 
Leif Halvard Silli
Received on Sunday, 1 August 2010 07:56:09 GMT