- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Mon, 9 Oct 2000 22:09:34 +0200
- To: <www-html-editor@w3.org>
Hi,
XHTML 1.0 [1] reads:
[...]
C.9 Character Encoding
To specify a character encoding in the document, use both the encoding
attribute specification on the xml declaration (e.g. <?xml version="1.0"
encoding="EUC-JP"?>) and a meta http-equiv statement (e.g. <meta
http-equiv="Content-type" content='text/html; charset="EUC-JP"' />). The value
of the encoding attribute of the xml processing instruction takes precedence.
[...]
RFC 2616 [2] says:
[...]
HTTP character sets are identified by case-insensitive tokens. The
complete set of tokens is defined by the IANA Character Set registry
[19].
charset = token
Although HTTP allows an arbitrary token to be used as a charset
value, any token that has a predefined value within the IANA
Character Set registry [19] MUST represent the character set defined
by that registry. Applications SHOULD limit their use of character
sets to those defined by the IANA registry.
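A minimal sketch of the lookup this paragraph describes. It assumes a hypothetical two-entry excerpt of the IANA registry that holds only the EUC-JP names mentioned in this message, and a hypothetical helper named lookup_charset. Matching is case-insensitive, but quote characters are not stripped; they count as part of the token.

```python
# Hypothetical excerpt of the IANA Character Set registry: only the
# entry discussed in this message is included, keyed by lowercased name.
REGISTRY = {
    "extended_unix_code_packed_format_for_japanese":
        "Extended_UNIX_Code_Packed_Format_for_Japanese",
    "euc-jp": "Extended_UNIX_Code_Packed_Format_for_Japanese",
}


def lookup_charset(token):
    """Return the registered charset name for token, or None.

    Charset tokens are case-insensitive, so the token is lowercased
    before the lookup. Nothing else is normalized: quote characters
    are treated as part of the token.
    """
    return REGISTRY.get(token.lower())


print(lookup_charset("euc-jp"))    # Extended_UNIX_Code_Packed_Format_for_Japanese
print(lookup_charset('"EUC-JP"'))  # None
```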
[...]
The charset token in the example is '"EUC-JP"', quotes included. No character
set by that name is registered. There is a character set
'Extended_UNIX_Code_Packed_Format_for_Japanese' with the alias 'EUC-JP', but
that is a different token from '"EUC-JP"'. In other words, the quotes around
'EUC-JP' are wrong.
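The objection rests on the `charset = token` rule quoted above. The sketch below checks a value against the token production from RFC 2616, section 2.2 (token = 1*<any CHAR except CTLs or separators>). It shows why the quoted form fails: the double quote is one of the separator characters. The name is_token is hypothetical.

```python
# RFC 2616, section 2.2 separators: ( ) < > @ , ; : \ " / [ ] ? = { } SP HT
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token(value):
    """Return True if value matches the RFC 2616 'token' production."""
    if not value:
        return False  # a token needs at least one character
    for ch in value:
        code = ord(ch)
        # CHAR is US-ASCII 0-127; CTLs are 0-31 and 127 (DEL).
        if code > 127 or code < 32 or code == 127:
            return False
        if ch in SEPARATORS:
            return False
    return True


print(is_token("EUC-JP"))    # True
print(is_token('"EUC-JP"'))  # False: '"' is a separator
```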
HTML 4.01 gets this right; see [3]:
[...]
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
[...]
[1] http://www.w3.org/TR/2000/REC-xhtml1-20000126
[2] http://www.ietf.org/rfc/rfc2616.txt
[3] http://www.w3.org/TR/html401/charset.html#h-5.2.2
--
Björn Höhrmann ^ mailto:bjoern@hoehrmann.de ^ http://www.bjoernsworld.de
am Badedeich 7 ° Phone: +49(0)4667/981ASK ° http://www.websitedev.de/
25899 Dagebüll # PGP Pub. KeyID: 0xA4357E78 # http://learn.to/quote +{i}
..weaving a secure, well-formed, standard compliant WWW for =everyone=..
Received on Monday, 9 October 2000 16:13:01 UTC