[Bug 12062] UTF-8 BOM should not be forbidden in Polyglot Markup

From: <bugzilla@jessica.w3.org>
Date: Sun, 13 Mar 2011 02:40:44 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1PybEa-00045U-Li@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12062

--- Comment #14 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-03-13 02:40:43 UTC ---
(In reply to comment #13)
Here it is as one piece of text, with some additional changes. The
justification is at the bottom.

]]
3. Specifying a Document's Character Encoding

Polyglot markup uses UTF-8, the only character encoding that both HTML and
XML require support for. For HTML, UTF-8 has to be explicitly declared to
avoid fallback to a legacy encoding. For XML, UTF-8 is the default encoding
and as such MAY be left undeclared.

The UTF-8 encoding is declared in the following ways, which can be used
together or separately:

* Within the document
  o By using the Byte Order Mark (BOM) character (preferred).
  o By using <meta charset="UTF-8"/> (the HTML encoding 
     declaration).
* Outside the document
  o By adding "charset=utf-8" to the MIME/HTTP Content-Type
     header [HTTP11]: 
    HTML Content-Type example:
        Content-Type: text/html; charset=utf-8
    XHTML Content-Type example:
        Content-Type: application/xhtml+xml; charset=utf-8

 NOTE: The HTML encoding declaration has no effect in XML. When it is the
only encoding declaration, it is XML's encoding default that makes XML
parsers treat the document as UTF-8.

  The W3C Internationalization (i18n) Group recommends always including a
visible encoding declaration in a document, because it helps developers,
testers, and translation production managers check the encoding of a document
visually.
[[
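
To illustrate the two in-document declarations and the XML default described
in the proposed text, here is a minimal Python sketch. It is not part of the
proposed wording. The names `META_CHARSET` and `in_document_declarations` and
the sample document are hypothetical, and the regular expression is a
simplification, not HTML's actual encoding-sniffing algorithm.

```python
import codecs
import re
import xml.etree.ElementTree as ET

# Simplified pattern for <meta charset="UTF-8"/>; an assumption made for
# illustration, not the prescan algorithm that HTML parsers implement.
META_CHARSET = re.compile(
    rb'''<meta\s+charset\s*=\s*["']?utf-8["']?\s*/?>''', re.IGNORECASE
)


def in_document_declarations(data: bytes) -> dict:
    """Report which in-document UTF-8 declarations the bytes carry."""
    return {
        # The UTF-8 BOM is the byte sequence EF BB BF at the very start.
        "bom": data.startswith(codecs.BOM_UTF8),
        # HTML expects the meta declaration within the first 1024 bytes.
        "meta": META_CHARSET.search(data[:1024]) is not None,
    }


# Hypothetical polyglot document with only the HTML encoding declaration.
doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><meta charset="UTF-8"/><title>Blåbær</title></head>'
    '<body><p>æøå</p></body></html>'
).encode("utf-8")

# The same document with a BOM added in front.
with_bom = codecs.BOM_UTF8 + doc

print(in_document_declarations(doc))
print(in_document_declarations(with_bom))

# An XML parser ignores the meta element as an encoding declaration and
# falls back to XML's default, UTF-8, so the non-ASCII title still decodes.
XHTML_TITLE = ".//{http://www.w3.org/1999/xhtml}title"
title_plain = ET.fromstring(doc).find(XHTML_TITLE).text
title_bom = ET.fromstring(with_bom).find(XHTML_TITLE).text
print(title_plain, title_bom)
```

Both byte strings parse as XML without any XML encoding declaration, which is
the behaviour the NOTE relies on.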

JUSTIFICATION for some of the wording choices above:

* 'XML encoding declaration' is a wording used in XML 1.0. 
   'HTML encoding declaration' is made on the same pattern.
* Tried to use '_character_ encoding' at least once.
* 'legacy encoding' = HTML5 uses this wording about 
   non-UTF-8 encodings
* deleted "(if used in combination, each approach contains 
  identical encoding information)" because it is irrelevant when
  the only encoding is UTF-8
* 'Outside the document' - for analogy with your 'Inside the 
  document'
* Important to have both an XHTML example and an HTML example 
  with regard to the MIME/HTTP
* Added 'MIME', as that is what it is.
* Tried to reduce the number of places where the text 
   mentions the encoding default of XML ...
* 'in combination' does not make good sense as it indicates that
   the methods cooperate. Tried 'together' instead.
* Deleted note about other MIME types because the text says "By adding
"charset=utf-8" to the MIME/HTTP Content-Type header", which is valid for any
MIME type. The examples are just examples.
* Tried to be short.
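
As a rough illustration of the point that "charset=utf-8" can be added to the
Content-Type header of any MIME type, here is a small Python sketch. The
helper names `with_utf8_charset` and `declared_charset` are hypothetical. The
parsing relies on the standard library's `email.message.Message`, which
understands MIME header parameters.

```python
from email.message import Message


def with_utf8_charset(mime_type: str) -> str:
    """Build a Content-Type value that declares UTF-8 for any MIME type."""
    return f"{mime_type}; charset=utf-8"


def declared_charset(header_value: str):
    """Return the lowercased charset parameter, or None if there is none."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# The two examples from the proposed text, plus one without a declaration.
for mime_type in ("text/html", "application/xhtml+xml"):
    value = with_utf8_charset(mime_type)
    print(value, declared_charset(value))

print(declared_charset("text/html"))
```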

Received on Sunday, 13 March 2011 02:40:46 GMT
