Re: Polyglot Markup/XML encoding declaration

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 2 Aug 2010 03:13:22 +0200
To: Maciej Stachowiak <mjs@apple.com>
Cc: Lachlan Hunt <lachlan.hunt@lachy.id.au>, HTMLwg <public-html@w3.org>, Eliot Graff <eliotgra@microsoft.com>, public-i18n-core@w3.org
Message-ID: <20100802031322613053.92ee7e82@xn--mlform-iua.no>
Maciej Stachowiak, Sun, 01 Aug 2010 17:05:18 -0700:
> On Aug 1, 2010, at 12:55 AM, Leif Halvard Silli wrote:
>> Lachlan Hunt, Thu, 29 Jul 2010 15:30:02 +0200:
  ...
>>>> The XML declaration would not be generally permitted in HTML - it would
>>>> only be permitted in polyglot markup.
>>> 
>>> There is no way to make some syntax conforming for polyglot documents 
>>> only.
>> 
>> Just make a validator which does.
> 
> The original premise of the polyglot spec was to describe a type of 
> document that is valid as both HTML5 and XHTML5, 

Good point. Polyglot Markup is not so much an intersection of HTML5 and 
XML as it is an intersection of HTML5 with itself ;-): HTML5 and XHTML5.

> and works 
> sufficiently the same both ways. Thus, it does not match the original 
> goals to have a construct that is valid in polyglot documents, but 
> invalid in at least one of HTML5 or XHTML5.

OK. Perhaps I must file a bug about the XML declaration against HTML5 
itself. I'll consider it. It _is_ permitted in XHTML 1.0 served as 
text/html, so HTML5 should say something about it.

But again: at least 3 user agents implement encoding "sniffing" by 
using the encoding attribute of the XML declaration. This fact is not 
described in HTML5, even though HTML5 asks to be informed by vendors 
about new methods. Thus vendors that support HTML5 develop text/html 
parsers that give higher priority to the XML encoding declaration - 
which is not described in HTML5 - than they give to UTF-8 pattern 
matching (which _is_ described in HTML5).
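
The priority order described above can be illustrated with a minimal 
Python sketch. It is not the algorithm of any particular user agent: 
the function names, the regular expression, and the "windows-1252" 
fallback are all assumptions made for illustration. It only shows the 
encoding attribute of an XML declaration being consulted before UTF-8 
pattern matching.

```python
import re

# Matches an XML declaration at the very start of the byte stream and
# captures the value of its encoding attribute (illustrative pattern).
XML_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def looks_like_utf8(data: bytes) -> bool:
    """UTF-8 pattern matching: non-ASCII bytes that decode cleanly as UTF-8."""
    if data.isascii():
        return False  # pure ASCII gives no evidence either way
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def sniff_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Hypothetical sniffer: XML declaration first, UTF-8 matching second."""
    match = XML_DECL.match(data)
    if match:
        # The XML encoding declaration wins, even if the bytes are valid UTF-8.
        return match.group(1).decode("ascii").lower()
    if looks_like_utf8(data):
        return "utf-8"
    # Assumed fallback; a real user agent would consult other sources first.
    return fallback
```

With this ordering, a document whose bytes are valid UTF-8 but whose 
XML declaration names another encoding is treated as that other 
encoding, which is the behaviour that HTML5 does not describe.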

When can we expect vendors to ask the editor to update the spec?

If you removed support for the XML encoding declaration from your HTML5 
text/html parsers, then I would find your resistance against allowing 
the XML declaration *merely in the syntax* more credible. (Again: both 
the XML declaration and the meta@charset must be present, according to 
my idea about this. Thus, in conforming polyglot markup consumed as 
HTML, the XML encoding declaration would not have any effect on 
text/html parsers.)
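
A validator rule along these lines could be sketched as follows. This 
is a hypothetical check, not part of any existing validator: the 
function name and the regular expressions are invented for 
illustration, and the requirement that the two declarations name the 
same encoding is an added assumption.

```python
import re

# Encoding attribute of an XML declaration at the start of the document.
XML_ENCODING = re.compile(
    r'^<\?xml\s[^>]*?encoding\s*=\s*["\']([^"\']+)["\']'
)
# A self-closed <meta charset="..."/> element (illustrative pattern only).
META_CHARSET = re.compile(
    r'<meta\s+charset\s*=\s*["\']([^"\']+)["\']\s*/>', re.IGNORECASE
)


def check_polyglot_encoding(markup: str) -> list[str]:
    """Return a list of problems; an empty list means the rule is satisfied."""
    problems = []
    xml_decl = XML_ENCODING.match(markup)
    meta = META_CHARSET.search(markup)
    if xml_decl and not meta:
        # The XML declaration alone would leave text/html parsers guessing.
        problems.append("XML encoding declaration without <meta charset/>")
    elif xml_decl and meta and xml_decl.group(1).lower() != meta.group(1).lower():
        # Assumption: both declarations must name the same encoding.
        problems.append("XML declaration and <meta charset/> disagree")
    return problems
```

A document that carries both declarations with the same encoding name 
passes; one that carries only the XML declaration is reported.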

> Indeed, Lachlan already  pointed this out:
> 
>>> Such a requirement is unenforceable because the conforming 
>>> polyglot document syntax is and should remain only the intersection 
>>> of HTML and XHTML syntax.

Lachlan, as far as I understood, wanted <meta 
charset="non-UTF-8-encoding-name"/> to be forbidden in polyglot markup. 

Whenever <meta charset="*"/> occurs in an XHTML document, HTML5 
currently permits any encoding name as its value - including 
non-Unicode encodings. I realize that it becomes a bit quirky for 
polyglot markup to only allow an in-document encoding declaration that 
works in text/html. However, unless HTML5 *itself* states that "UTF-8" 
is the only possible value of the meta@charset attribute whenever it 
occurs in an XHTML document, polyglot markup should permit the same 
encodings that HTML5 permits. (Also see: 
http://www.w3.org/mid/20100802020048211580.56bc4557@xn--mlform-iua.no )

I can be sympathetic towards Lachlan's view about how meta@charset 
should be used in polyglot markup. But if it is supposed to be spec 
inference, then it must be spec inference. At least as long as the 
goal is to make someone else not bring in the XML declaration ...

A compromise, as far as I can see, is that HTML5 itself forbids any 
value other than UTF-8 in XHTML5 documents. *Then* I will accept 
that there would be no need for an in-document XML-compatible way to 
declare the encoding in polyglot markup.

> Also, besides this general point, there is the fact that an XML 
> declaration will trigger quirks mode in some legacy UAs, thus it is a 
> bad idea to serve content including an XML declaration as text/html.

I know. However, Sam suggested that only UTF-8 should be permitted. 
This was based on compatibility considerations. To which Henri replied: 
No, please, let us not base this on user agents, but instead let us 
base polyglot markup on pure spec inference. If we follow that 
principle to the end, then we should look at how HTML5-compatible user 
agents behave, and take no notice of how non-conforming user agents 
behave.
-- 
leif halvard silli
Received on Monday, 2 August 2010 01:14:00 GMT