XHTML 1.0 Erratum: charset in http-equiv from Bjoern Hoehrmann on 2000-10-09 (www-html-editor@w3.org from October to December 2000)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Mon, 9 Oct 2000 22:09:34 +0200
To: <www-html-editor@w3.org>
Message-ID: <01f701c0322d$243d7b20$23cbb43e@de>

Hi,

XHTML 1.0 [1] reads:

[...]
C.9 Character Encoding

To specify a character encoding in the document, use both the encoding
attribute specification on the xml declaration (e.g. <?xml version="1.0"
encoding="EUC-JP"?>) and a meta http-equiv statement (e.g. <meta
http-equiv="Content-type" content='text/html; charset="EUC-JP"' />). The value
of the encoding attribute of the xml processing instruction takes precedence.
[...]

RFC 2616 [2] says:

[...]
   HTTP character sets are identified by case-insensitive tokens. The
   complete set of tokens is defined by the IANA Character Set registry
   [19].

       charset = token

   Although HTTP allows an arbitrary token to be used as a charset
   value, any token that has a predefined value within the IANA
   Character Set registry [19] MUST represent the character set defined
   by that registry. Applications SHOULD limit their use of character
   sets to those defined by the IANA registry.
[...]

The charset value in the example is '"EUC-JP"', double quotes included. There
is no such character set. The registry lists a character set
'Extended_UNIX_Code_Packed_Format_for_Japanese' with an alias 'EUC-JP', but
that alias is not the same as '"EUC-JP"'; a double quote is a separator in
RFC 2616 and cannot appear in a token at all. In other words: the quotes
around 'EUC-JP' are wrong.
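
The check can be sketched in Python. This is only a minimal illustration, not
a full header parser: it reads the 'charset = token' rule quoted from RFC 2616
literally, and the helper names (is_token, charset_param,
names_known_charset) and the two-entry KNOWN set are assumptions made up for
the example, standing in for the IANA registry.

```python
# Separators from RFC 2616 section 2.2; none of them may appear in a token.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')

# Hypothetical stand-in for the IANA registry: only the name and alias
# mentioned in this message, lowercased because charset tokens are
# case-insensitive.
KNOWN = {"extended_unix_code_packed_format_for_japanese", "euc-jp"}


def is_token(s):
    """True if s is a non-empty run of visible US-ASCII non-separator characters."""
    return bool(s) and all(32 < ord(c) < 127 and c not in SEPARATORS for c in s)


def charset_param(content):
    """Return the raw charset value from a Content-Type string, or None."""
    for part in content.split(";")[1:]:
        name, sep, value = part.partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip()
    return None


def names_known_charset(content):
    """True if the charset value is a token that matches a known charset name."""
    value = charset_param(content)
    return value is not None and is_token(value) and value.lower() in KNOWN


print(names_known_charset('text/html; charset="EUC-JP"'))  # quoted form from XHTML 1.0 C.9
print(names_known_charset('text/html; charset=EUC-JP'))    # unquoted form
```

Under this reading the quoted form is rejected because '"' is a separator,
while the unquoted form matches the registered alias.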

HTML 4.01 gets this right; see [3]:

[...]
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
[...]

[1] http://www.w3.org/TR/2000/REC-xhtml1-20000126
[2] http://www.ietf.org/rfc/rfc2616.txt
[3] http://www.w3.org/TR/html401/charset.html#h-5.2.2
--
Björn Höhrmann ^ mailto:bjoern@hoehrmann.de ^ http://www.bjoernsworld.de
am Badedeich 7 ° Telefon: +49(0)4667/981ASK ° http://www.websitedev.de/
25899 Dagebüll # PGP Pub. KeyID: 0xA4357E78 # http://learn.to/quote +{i}
..weaving a secure, well-formed, standard compliant WWW for =everyone=..

Received on Monday, 9 October 2000 16:13:01 UTC