- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Wed, 11 Jan 2012 17:51:48 +0000
- To: public-xml-core-wg@w3.org
ht writes:
> 1) It recommends [1] the use of the UTF-8 BOM -- that seems . . . odd
> to me.
OK, I've done some further checking, and we can't have either
<meta http-equiv="Content-type" content="text/html; charset=utf-8"/>
or
<meta http-equiv="Content-type" content="application/xhtml+xml; charset=utf-8"/>
in Polyglot, because HTML5 forbids meta elements with
http-equiv="Content-type" in XML (XHTML) documents [1].
So net-net I think we should ask for the following as the beginning of
Section 3 of Polyglot [2]:
Polyglot markup uses the UTF-8 character encoding, the only character
encoding for which both HTML and XML require support. HTML requires
UTF-8 to be explicitly declared to avoid fallback to a legacy encoding
[HTML5]. For XML, UTF-8 is the default encoding, so the character
encoding may be left undeclared in XML and the document will still be
read as UTF-8 [XML10].
Polyglot markup declares the UTF-8 character encoding in the following
ways, which may be used separately or in combination:
* Within the document
. By using <meta charset="UTF-8"/> (the HTML encoding
declaration) -- preferred
. By using the Byte Order Mark (BOM) character.
* Outside the document
. . .
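For illustration, a minimal polyglot document using the preferred
in-document declaration might begin as below. This is a sketch, not
text from the Polyglot draft; the title and body are placeholders, and
the file is assumed to be saved as UTF-8 (with or without a BOM):

```html
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- HTML encoding declaration; the trailing "/>" keeps it well-formed XML -->
    <meta charset="UTF-8"/>
    <title>Example</title>
  </head>
  <body>
    <p>Example content.</p>
  </body>
</html>
```

An XML parser treats the meta element as ordinary content and, with no
XML declaration or other encoding information, falls back to its UTF-8
default, so both parsers end up reading the same encoding.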
ht
[1] http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#pragma-directives
[2] http://www.w3.org/TR/2011/WD-html-polyglot-20110525/#character-encoding
--
Henry S. Thompson, School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail from me _always_ has a .sig like this -- mail without it is forged spam]
Received on Wednesday, 11 January 2012 17:52:13 UTC