- From: <bugzilla@jessica.w3.org>
- Date: Sun, 13 Mar 2011 02:40:44 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12062
--- Comment #14 from Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no> 2011-03-13 02:40:43 UTC ---
(In reply to comment #13)
As one piece of text - and with some additional changes. Justification etc at
the bottom.
]]
3. Specifying a Document's Character Encoding
Polyglot markup uses UTF-8, the only character encoding for which both HTML and
XML require support. For HTML, UTF-8 has to be explicitly declared, to
avoid fallback to a legacy encoding. For XML, UTF-8 is the encoding
default and as such MAY be left undeclared.
The UTF-8 encoding is declared in the following ways, which can be used
together or separately:
  * Within the document
      o By using the Byte Order Mark (BOM) character (preferred).
      o By using <meta charset="UTF-8"/> (the HTML encoding
        declaration).
  * Outside the document
      o By adding "charset=utf-8" to the MIME/HTTP Content-Type
        header [HTTP11]:
          HTML Content-Type example:
            Content-type: text/html; charset=utf-8
          XHTML Content-Type example:
            Content-type: application/xhtml+xml; charset=utf-8
NOTE: The HTML encoding declaration has no effect in XML. When it is the
only encoding declaration, it is XML's encoding default that makes XML
parsers treat the document as UTF-8.
The W3C Internationalization (i18n) Group recommends always including a
visible encoding declaration in a document, because it helps developers,
testers, or translation production managers to check the encoding of a document
visually.
[[
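To make the three declaration methods in the proposed text concrete, here is a minimal Python sketch that reports which of them a given response carries. The function name `utf8_declarations` and both regular expressions are my own illustrations, not anything from the spec; the pattern matching is deliberately simplistic (a real HTML parser does considerably more), and the 1024-byte window reflects the HTML5 rule that the encoding declaration must appear within the first 1024 bytes.

```python
import codecs
import re

# Simplified patterns, assumed for illustration only.
META_CHARSET = re.compile(
    rb'<meta\s+charset\s*=\s*["\']?\s*utf-8\s*["\']?\s*/?>', re.IGNORECASE
)
HEADER_CHARSET = re.compile(r'charset\s*=\s*"?utf-8"?', re.IGNORECASE)


def utf8_declarations(body: bytes, content_type: str = "") -> dict:
    """Report which UTF-8 declarations are present (hypothetical helper)."""
    return {
        # Within the document: the BOM, i.e. the bytes EF BB BF at the start.
        "bom": body.startswith(codecs.BOM_UTF8),
        # Within the document: <meta charset="UTF-8"/> near the start.
        "meta": META_CHARSET.search(body[:1024]) is not None,
        # Outside the document: charset=utf-8 in the Content-Type header.
        "http": HEADER_CHARSET.search(content_type) is not None,
    }
```

For example, `utf8_declarations(page_bytes, "text/html; charset=utf-8")` returns a dict with one boolean per method, so it is easy to see whether the methods are used together or separately.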
JUSTIFICATION for some of the wording choices above:
* 'XML encoding declaration' is a wording used in XML 1.0.
'HTML encoding declaration' is made on the same pattern.
* Tried to use '_character_ encoding' at least once.
* 'legacy encoding' = HTML5 uses this wording about
non-UTF-8 encodings
* deleted "(if used in combination, each approach contains
identical encoding information)" because it is irrelevant when
the only encoding is UTF-8
* 'Outside the document' - for analogy with your 'Inside the
document'
* Important to have both an XHTML example and an HTML example
with regard to the MIME/HTTP
* Added 'MIME', as that is what it is.
* Tried to diminish the number of places where the text
mentioned the encoding default of XML ...
* 'in combination' does not make good sense as it indicates that
the methods cooperate. Tried 'together' instead.
* Deleted note about other MIME types because the text says "By adding
"charset=utf-8" to the MIME/HTTP Content-Type header", which is valid for any
MIME type. The examples are just examples.
* Tried to be short.
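
The NOTE in the proposed text (that the HTML encoding declaration has no effect in XML) can be checked with a small Python sketch using the standard library's `xml.etree.ElementTree`. The helper `xml_title` and the sample document are hypothetical. The second call deliberately mislabels the meta element, which is not valid polyglot markup and is done only to show that the XML parser ignores it and falls back on the encoding default.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def xml_title(meta_charset: str) -> str:
    """Parse a sample document as XML and return its <title> text."""
    doc = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
        f'<meta charset="{meta_charset}"/>'
        "<title>Blåbær</title></head><body></body></html>"
    ).encode("utf-8")  # UTF-8 bytes, no BOM and no XML declaration
    root = ET.fromstring(doc)
    return root.find(f"{XHTML_NS}head/{XHTML_NS}title").text


# The XML parser decodes the bytes as UTF-8 in both cases, whatever
# value the meta element claims.
print(xml_title("UTF-8") == xml_title("ISO-8859-1"))
```

An HTML parser given the second document would instead honour the meta element and misread the title, which is why the HTML declaration still has to be present and correct.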
Received on Sunday, 13 March 2011 02:40:46 UTC