- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Mon, 2 Aug 2010 03:13:22 +0200
- To: Maciej Stachowiak <mjs@apple.com>
- Cc: Lachlan Hunt <lachlan.hunt@lachy.id.au>, HTMLwg <public-html@w3.org>, Eliot Graff <eliotgra@microsoft.com>, public-i18n-core@w3.org
Maciej Stachowiak, Sun, 01 Aug 2010 17:05:18 -0700:
> On Aug 1, 2010, at 12:55 AM, Leif Halvard Silli wrote:
>> Lachlan Hunt, Thu, 29 Jul 2010 15:30:02 +0200:
...
>>>> The XML declaration would not be generally permitted in HTML - it would
>>>> only be permitted in polyglot markup.
>>>
>>> There is no way to make some syntax conforming for polyglot documents
>>> only.
>>
>> Just make a validator which does.
>
> The original premise of the polyglot spec was to describe a type of
> document that is valid as both HTML5 and XHTML5,

Good point. Polyglot Markup is not so much an intersection of HTML5 and
XML as it is an intersection of HTML5 with itself ;-) that is, of HTML5
and XHTML5.

> and works
> sufficiently the same both ways. Thus, it does not match the original
> goals to have a construct that is valid in polyglot documents, but
> invalid in at least one of HTML5 or XHTML5.

OK. Perhaps I must file a bug about the XML declaration against HTML5
itself. I'll consider it. Its use _is_ permitted in XHTML 1.0 served as
text/html, so HTML5 should say something about it.

But again: At least 3 user agents implement encoding "sniffing" by
using the encoding attribute of the XML declaration. This fact is not
described in HTML5, even though HTML5 asks to be informed by vendors
about new methods. Thus HTML5-supporting vendors develop text/html
parsers that give higher priority to the XML encoding declaration -
which is not described in HTML5 - than they give to UTF-8 pattern
matching (which _is_ described in HTML5). When can we expect vendors to
ask the editor to update the spec?

If you removed support for the XML encoding declaration from your HTML5
text/html parsers, then I would find your resistance against allowing
the XML declaration *merely in the syntax* more credible.

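To make the priority order I describe above concrete, here is a minimal
sketch in Python. It is only an illustration: the function name
sniff_encoding, the regular expression and the exact order of the steps
are my own assumptions, not taken from any user agent's source and not
from HTML5. It checks for a UTF-8 BOM, then for the encoding attribute
of an XML declaration, and only then falls back to UTF-8 pattern
matching.

```python
import re
from typing import Optional

# Hypothetical pattern: an XML declaration at the very start of the
# byte stream that carries an encoding pseudo-attribute.
XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._:-]+)["\']'
)


def sniff_encoding(data: bytes) -> Optional[str]:
    """Guess an encoding for a text/html byte stream (illustrative only)."""
    # 1. A UTF-8 byte order mark wins.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # 2. The XML encoding declaration is consulted next, i.e. with
    #    higher priority than UTF-8 pattern matching.
    match = XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. UTF-8 pattern matching: accept UTF-8 if the bytes decode cleanly.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return None  # no guess; a real parser would use a default
```

Under these assumptions, a document that starts with
<?xml version="1.0" encoding="ISO-8859-1"?> is labelled ISO-8859-1
before its bytes are ever tested as UTF-8.
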
(Again: both the XML declaration and the meta@charset must be present,
according to my idea about this. Thus, in conforming polyglot markup
consumed as HTML, the XML encoding declaration would not have any
effect on text/html parsers.)

> Indeed, Lachlan already pointed this out:
>
>>> Such a requirement is unenforceable because the conforming
>>> polyglot document syntax is and should remain only the intersection
>>> of HTML and XHTML syntax.

Lachlan, as far as I understood, wanted <meta
charset="non-UTF-8-encoding-name"/> to be forbidden in polyglot markup.
Whenever <meta charset="*"/> occurs in an XHTML document, HTML5
currently permits any encoding name as its value - including
non-Unicode encodings. I realize that it becomes a bit quirky for
polyglot markup to only allow an in-document encoding declaration that
works in text/html. However, unless HTML5 *itself* states that "UTF-8"
is the only possible value of meta@charset whenever it occurs in an
XHTML document, polyglot markup should permit the same encodings that
HTML5 permits. (Also see:
http://www.w3.org/mid/20100802020048211580.56bc4557@xn--mlform-iua.no )

I can be sympathetic towards Lachlan's view about how meta@charset
should be used in polyglot markup. But if it is supposed to be spec
inference, then it must be spec inference. At least as long as the goal
is to keep others from bringing in the XML declaration ...

A compromise, as far as I can see, is that HTML5 itself forbids any
value other than UTF-8 in XHTML5 documents. *Then* I will accept that
there is no need for an in-document, XML-compatible way to declare the
encoding in polyglot markup.

> Also, besides this general point, there is the fact that an XML
> declaration will trigger quirks mode in some legacy UAs, thus it is a
> bad idea to serve content including an XML declaration as text/html.

I know. However, Sam suggested that only UTF-8 should be permitted.
This was based on compatibility considerations. To which Henri replied:
No, please, let us not base this on user agents, but instead let us
base polyglot markup on pure spec inference. If we follow that
principle to the end, then we should look at how HTML5-compatible user
agents behave, and take no notice of how non-conforming user agents
behave.
--
leif halvard silli
Received on Monday, 2 August 2010 01:14:02 UTC