- From: <bugzilla@jessica.w3.org>
- Date: Mon, 14 Feb 2011 13:31:31 +0000
- To: public-html@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12062

Summary: UTF-8 BOM should not be forbidden in Polyglot Markup
Product: HTML WG
Version: unspecified
Platform: PC
URL: http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html#character-encoding
OS/Version: All
Status: NEW
Severity: major
Priority: P2
Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
AssignedTo: eliotgra@microsoft.com
ReportedBy: xn--mlform-iua@xn--mlform-iua.no
QAContact: public-html-bugzilla@w3.org
CC: davidc@nag.co.uk, mike@w3.org, public-html-wg-issue-tracking@w3.org, public-html@w3.org, shadow2531@gmail.com, xn--mlform-iua@xn--mlform-iua.no, eliotgra@microsoft.com

The spec says:

]]
When polyglot markup uses UTF-8, it does not include a BOM.
[[

I recommend deleting the above statement, because I can see no basis for it. In my view, the UTF-8 BOM can *help* when working with polyglot markup.

Justification:
a) the UTF-8 BOM is understood by both XML and HTML5 parsers;
b) the UTF-8 BOM allows you to omit <meta> @http-equiv="content-type" or @charset (which, in any case, only HTML parsers understand).

For offline parsing via file:// URLs, the presence of a UTF-8 BOM seems to me an advantage. For online parsing, it also has the advantage of providing encoding information even if HTTP fails to provide it.

The fact that some (very) legacy user agents may act up if they see the UTF-8 BOM has not prevented HTML5 from permitting it. Thus, if the UTF-8 BOM is to be declared as something that is not used in Polyglot Markup, then please provide a justification or principle for such a decision.

Furthermore, the following statement from the same section seems to contradict the statement that the UTF-8 BOM should not be used:

]]
Polyglot markup declares character encoding one of two ways:
By using the BOM.
In the HTTP header of the response [HTTP11], as in the following:
[[

If you accept my argument that the UTF-8 BOM can be used, then I suggest replacing the above quote with the following, more accurate reformulation:

]]
Polyglot markup declares character encoding in the following ways, which may be used separately or in combination, as long as they contain the same encoding information:
Inside the document:
 * by the use of a BOM;
 * by relying on the XML UTF-8 encoding default in combination with <meta charset="UTF-8"/>
In the HTTP header [ etc - keep the current text ]
[[
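The proposed rule (declarations may be used separately or in combination, as long as they agree) can be sketched in Python as follows. This is a hypothetical illustration, not part of the Authoring Guide or of any real parser. The function names and the `meta_charset`/`http_charset` parameters are assumptions: the caller is assumed to have already extracted the `<meta charset>` value and the HTTP `charset` parameter. Encoding labels are compared only case-insensitively (real parsers map many label aliases), and the UTF-8 fallback models only the XML default, not what an HTML parser would assume.

```python
from typing import Optional

# The UTF-8 byte order mark: U+FEFF encoded as UTF-8.
UTF8_BOM = b"\xef\xbb\xbf"


def bom_encoding(data: bytes) -> Optional[str]:
    """Return 'utf-8' if the document starts with the UTF-8 BOM, else None."""
    return "utf-8" if data.startswith(UTF8_BOM) else None


def normalize(label: str) -> str:
    # Simplification: only case and surrounding whitespace are normalized;
    # aliases such as "utf8" are NOT mapped to "utf-8" here.
    return label.strip().lower()


def polyglot_encoding(
    data: bytes,
    meta_charset: Optional[str] = None,  # hypothetical: value of <meta charset>
    http_charset: Optional[str] = None,  # hypothetical: charset from Content-Type
) -> str:
    """Return the single encoding named by all present declarations.

    Raises ValueError if the BOM, <meta charset> and HTTP charset disagree.
    """
    candidates = (bom_encoding(data), meta_charset, http_charset)
    declared = {normalize(c) for c in candidates if c is not None}
    if len(declared) > 1:
        raise ValueError(f"conflicting encoding declarations: {sorted(declared)}")
    # With no declaration at all, fall back to the XML default (UTF-8).
    # An HTML parser would not necessarily assume UTF-8 in this case, which
    # is why the proposal pairs the XML default with <meta charset="UTF-8"/>.
    return declared.pop() if declared else "utf-8"
```

For example, a document starting with the BOM and served with `charset=utf-8` yields `"utf-8"`, while the same document served with `charset=ISO-8859-1` raises `ValueError`, mirroring the "as long as they contain the same encoding information" condition.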
Received on Monday, 14 February 2011 13:31:33 UTC