- From: <bugzilla@jessica.w3.org>
- Date: Mon, 14 Feb 2011 13:31:31 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12062
Summary: UTF-8 BOM should not be forbidden in Polyglot Markup
Product: HTML WG
Version: unspecified
Platform: PC
URL: http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html#character-encoding
OS/Version: All
Status: NEW
Severity: major
Priority: P2
Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
AssignedTo: eliotgra@microsoft.com
ReportedBy: xn--mlform-iua@xn--mlform-iua.no
QAContact: public-html-bugzilla@w3.org
CC: davidc@nag.co.uk, mike@w3.org,
public-html-wg-issue-tracking@w3.org,
public-html@w3.org, shadow2531@gmail.com,
xn--mlform-iua@xn--mlform-iua.no,
eliotgra@microsoft.com
The spec says:
]] When polyglot markup uses UTF-8, it does not include a BOM. [[
I recommend deleting the above statement, because I can see no basis for it.
In my view, the UTF-8 BOM can *help* working with polyglot markup.
Justification:
a) the UTF-8 BOM is understood by both XML and HTML5 parsers;
b) the UTF-8 BOM allows you to omit <meta> @http-equiv="content-type"
and @charset (which, in any case, only HTML parsers understand)
For offline parsing via file:// URLs, the presence of a UTF-8 BOM seems to me
an advantage. For online parsing it also has the advantage of providing
encoding information even if HTTP fails to provide it.
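To illustrate the offline case, here is a minimal Python sketch of how a
consumer could use the BOM as its only encoding hint when there is no HTTP
header. The function names and the cp1252 fallback are my own assumptions for
illustration; they are not taken from the guide or from any parser.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_with_bom_hint(data: bytes, fallback: str = "cp1252") -> str:
    """Decode a document read from disk, using the BOM as the only hint.

    The fallback encoding is a hypothetical stand-in for whatever legacy
    default a consumer would otherwise guess.
    """
    if has_utf8_bom(data):
        # "utf-8-sig" decodes as UTF-8 and strips the leading BOM.
        return data.decode("utf-8-sig")
    return data.decode(fallback)
```

With the BOM, the text "blåbær" round-trips correctly; without it, the same
bytes are decoded with the guessed fallback and come out garbled.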
The fact that some (very) legacy user agents may act up if they see the UTF-8
BOM has not prevented HTML5 from permitting it. Thus, if the UTF-8 BOM should
be declared as something that is not used in Polyglot Markup, then please
provide a justification/principle for such a decision.
Furthermore, the following statement from the same section seems to
contradict the statement that the UTF-8 BOM should not be used:
]]
Polyglot markup declares character encoding one of two ways:
By using the BOM.
In the HTTP header of the response [HTTP11], as in the following:
[[
If you accept my argument that the UTF-8 BOM can be used, then I suggest
replacing the above quote with the following, more accurate reformulation:
]]
Polyglot markup declares character encoding in the following ways, which may
be used separately or in combination, as long as they convey the same
encoding information:
Inside the document:
* by the use of a BOM;
* by relying on the XML UTF-8 encoding default in combination with <meta
charset="UTF-8"/>
In the HTTP header [ etc - keep the current text ]
[[
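The "same encoding information" condition in the reformulation above could be
checked roughly as follows. This is only a sketch under stated assumptions:
the function names are hypothetical, the <meta> detection is a simplified
regular expression over the first 1024 bytes (not the HTML5 prescan
algorithm), and <meta> @http-equiv is deliberately not handled.

```python
import codecs
import re
from typing import Dict, Optional


def declared_encodings(body: bytes, content_type: Optional[str] = None) -> Dict[str, str]:
    """Collect the encoding named by each declaration mechanism that is present."""
    found: Dict[str, str] = {}
    if body.startswith(codecs.BOM_UTF8):
        found["bom"] = "utf-8"
    # Simplified lookup of <meta charset="..."> near the start of the document.
    meta = re.search(rb"<meta\s+charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)",
                     body[:1024], re.IGNORECASE)
    if meta:
        found["meta"] = meta.group(1).decode("ascii").lower()
    # Optional charset parameter of an HTTP Content-Type header value.
    if content_type:
        http = re.search(r"charset\s*=\s*\"?([A-Za-z0-9_\-]+)",
                         content_type, re.IGNORECASE)
        if http:
            found["http"] = http.group(1).lower()
    return found


def declarations_agree(body: bytes, content_type: Optional[str] = None) -> bool:
    """True when every declaration that is present names the same encoding."""
    return len(set(declared_encodings(body, content_type).values())) <= 1
```

For example, a document with a BOM and <meta charset="UTF-8"/> that is served
as "text/html; charset=utf-8" agrees; the same document served with
"charset=iso-8859-1" does not.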
Received on Monday, 14 February 2011 13:31:33 UTC