- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Mon, 2 Aug 2010 02:00:48 +0200
- To: Henri Sivonen <hsivonen@iki.fi>
- Cc: public-html <public-html@w3.org>, Eliot Graff <eliotgra@microsoft.com>, public-i18n-core@w3.org
Henri Sivonen, Thu, 29 Jul 2010 07:19:37 -0700 (PDT):
>> Next question: if one can specify any encoding via HTTP, why forbid
>> any encoding inside <meta charset='*'/>?
>
> If the meta prescan finds something, the real encoding has to be a
> rough ASCII superset.
>
> See https://bugzilla.mozilla.org/show_bug.cgi?id=582788

You misunderstood. Whether one should be permitted to specify UTF-16 via meta@charset is not the issue under discussion.

The dilemma is as follows: What kind of inference could lead us to conclude that <meta charset="windows-1251"/> - but not <meta charset="utf-8"/> - should be forbidden in polyglot markup? (Because it is my impression that you think polyglot markup should permit any encoding - with the limitation that only UTF-8 can be specified as the encoding via meta@charset.)

Suggested defense for such a view: meta@charset is not really permitted in XHTML; it is, according to HTML5: [1]

"only allowed [in XHTML] in order to facilitate migration to and from XHTML"

Comment: We can all see that <meta charset="UTF-8"/> can be useful for such a migration purpose. But how can <meta charset="windows-1251"/> "facilitate migration to and from XHTML"? The answer is: it can't. Not any more than the presence of <?xml version="1.0" encoding="WINDOWS-1251" ?> can. (But if both are present, then, used in tandem, they can facilitate migration.)

Perhaps HTML5 as well should only permit "UTF-8" as the value of <meta charset="*"/> when present in XHTML? That could solve this dilemma when it comes to Polyglot Markup as well! Please file a bug, if you think so.

If <meta charset="windows-1251"/> can't facilitate migration, then it can much less be polyglot, one should think ... However, by the letter, <meta charset="windows-1251"/> is permitted in both XHTML5 and HTML5. Thus it is polyglot. OK, <meta charset="windows-1251"/> creates some problems - some possibilities for misunderstanding and so on. But it is still polyglot - it is still permitted.

>> And then: Why allow any encoding inside <meta charset='*'/> but not
>> allow the XML (encoding) declaration?
>
> Because there's no parser support for what looks like an XML
> declaration in text/html.

In polyglot markup, there would be no more need for text/html support for the XML encoding declaration than there would be need for XML support for the meta@charset element.

What I suggest is needed is that polyglot markup permits both <meta charset="ISO-8859-1"/> and <?xml version="1.0" encoding="ISO-8859-1" ?> to be used for specifying the encoding, as long as *both* of them are present in the same document.

Because, just as it is possible to say that meta@charset is permitted in XHTML "to facilitate migration to and from XHTML", it is also possible to say that the XML encoding declaration should be permitted in HTML to facilitate migration to and from HTML.

In fact, it does not seem true - except when the document is UTF-8 encoded - that <meta charset="*"/> facilitates migration to and from XHTML. (And even then, as long as the document uses the UTF-8 BOM, which HTML UAs are required to support, <meta charset="*"/> doesn't really facilitate anything.)

The true story about facilitating migration between HTML and XHTML is that, yes, meta@charset can make this easier for UTF-8. But when it comes to the non-Unicode encodings, meta@charset facilitates nothing *unless* the XML encoding declaration is also permitted.

> And support isn't going to be added for mere polyglot purity.

There would be no need to add support: in Polyglot Markup, both meta@charset and the XML encoding declaration would be present.

[1] Section "4.2.5 The meta element" of HTML5.
--
leif halvard silli
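
To illustrate the proposal above, that a polyglot document may name a non-UTF-8 encoding only when both the XML encoding declaration and meta@charset are present, here is a minimal sketch of such a check. It is an assumption-laden illustration, not anything defined by HTML5 or the polyglot draft: the function names and the regular expressions are hypothetical, and a real tool would run the HTML prescan and an XML parser instead of pattern matching.

```python
import re

# Hypothetical patterns: a simplification, not the HTML5 prescan algorithm.
XML_DECL_RE = re.compile(
    r'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)
META_CHARSET_RE = re.compile(
    r'<meta\s+charset\s*=\s*["\']([A-Za-z0-9._-]+)["\']\s*/>',
    re.IGNORECASE,
)


def declared_encodings(doc):
    """Return (xml_encoding, meta_encoding), lowercased, or None if absent."""
    xml = XML_DECL_RE.search(doc)
    meta = META_CHARSET_RE.search(doc)
    return (
        xml.group(1).lower() if xml else None,
        meta.group(1).lower() if meta else None,
    )


def declarations_agree(doc):
    """True only if both declarations are present and name the same encoding."""
    xml_enc, meta_enc = declared_encodings(doc)
    return xml_enc is not None and xml_enc == meta_enc


both = (
    '<?xml version="1.0" encoding="ISO-8859-1" ?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<meta charset="ISO-8859-1"/></head><body/></html>'
)
meta_only = '<html><head><meta charset="windows-1251"/></head><body/></html>'

print(declarations_agree(both))       # both declarations present and equal
print(declarations_agree(meta_only))  # XML encoding declaration missing
```

The comparison is case-insensitive because the two declarations in the message above are written as "windows-1251" and "WINDOWS-1251".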
Received on Monday, 2 August 2010 00:01:24 UTC