
[Bug 12062] UTF-8 BOM should not be forbidden in Polyglot Markup

From: <bugzilla@jessica.w3.org>
Date: Fri, 11 Mar 2011 18:20:18 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1Py6wk-0008Su-AC@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12062

Eliot Graff <eliotgra@microsoft.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #12 from Eliot Graff <eliotgra@microsoft.com> 2011-03-11 18:20:16 UTC ---
Made the edits requested in comment 11 and then further edited to remove
UTF-16, per bug 12242. Section 3 now reads:

]]
Polyglot markup declares character encoding in the following ways, which may be
used separately or in combination (if used in combination, each approach
contains identical encoding information): 
• Within the document
  ◦ By using the Byte Order Mark (BOM) character (preferred).
  ◦ By relying on UTF-8 as the encoding default of XML, used in combination
    with the HTML <meta charset="UTF-8"/> element.
• In the HTTP header of the response [HTTP11], as in the following:
    Content-type: text/html; charset=utf-8
  Note that polyglot markup may use either text/html or application/xhtml+xml
  for the value of the content type.


Using <meta charset="*"/> has no effect in XML. Therefore, polyglot markup may
use <meta charset="*"/> provided the document is encoded as UTF-8 and the value
of charset is a case-insensitive match for the string "utf-8". 

Polyglot markup uses UTF-8 encoding. The BOM character may be used with the
UTF-8 encoding (see Writing HTML documents in [HTML5]), and using the BOM
character is preferred to not using the BOM character. Because the construct of
the BOM character is the same for XML and HTML (unlike the encoding declaration
inside the HTTP Content-Type header), and because the BOM character works in
both XML and HTML (unlike the <meta charset="UTF-8"/> declaration of HTML and
the UTF-8 encoding default of XML), the BOM character can be said to be the
most polyglot encoding declaration. 

The W3C Internationalization (i18n) Group recommends always including a
visible encoding declaration in a document, because it helps developers,
testers, and translation production managers check the encoding of a document
visually.
[[
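
As a rough illustration of the section text quoted above (not part of the
spec), the three declarations it lists could be checked for agreement roughly
as follows. This is a minimal sketch: the function names are hypothetical, the
regular expressions are deliberately simplistic and are not a conforming HTML
or HTTP parser, and the "at least one declaration must be present" rule is an
assumption of the sketch rather than something the quoted text requires.

```python
import codecs
import re

# The UTF-8 BOM is the byte sequence EF BB BF.
UTF8_BOM = codecs.BOM_UTF8

# Simplistic patterns, assumed for illustration only.
META_CHARSET = re.compile(rb'<meta\s+charset\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)
HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the document bytes start with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def meta_charset(data: bytes):
    """Return the value of the first <meta charset="..."> found, or None."""
    match = META_CHARSET.search(data)
    return match.group(1).decode("ascii", "replace") if match else None


def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    if content_type is None:
        return None
    match = HEADER_CHARSET.search(content_type)
    return match.group(1) if match else None


def declarations_agree(data: bytes, content_type=None) -> bool:
    """Return True if at least one declaration is present and all name UTF-8.

    The comparison is case-insensitive, matching the "utf-8" requirement in
    the quoted section.
    """
    declared = []
    if has_utf8_bom(data):
        declared.append("utf-8")
    for value in (meta_charset(data), header_charset(content_type)):
        if value is not None:
            declared.append(value.lower())
    return bool(declared) and all(value == "utf-8" for value in declared)
```

For example, a document that starts with the BOM and contains
<meta charset="UTF-8"/> would pass when served with
"text/html; charset=utf-8", and fail if the header named a different charset.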

I believe this satisfies all the requests in this bug, so, once again, I am
resolving it as fixed. I have faith that you will let me know if there are any
other issues.

Received on Friday, 11 March 2011 18:20:20 GMT
