Re: faq suggestions from Bjoern Hoehrmann on 2004-08-23 (www-international@w3.org from July to September 2004)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Mon, 23 Aug 2004 06:20:34 +0200
To: Tex Texin <tex@i18nguy.com>
Cc: www-international@w3.org
Message-ID: <41396d16.583521589@smtp.bjoern.hoehrmann.de>

* Tex Texin wrote:
>The reason I stated the standard implies the switch occurs after the charset is
>parsed is text like:
>
>http://www.w3.org/TR/html401/charset.html#h-5.2.2
>
>"The META declaration must only be used when the character encoding is
>organized such that ASCII-valued bytes stand for ASCII characters (at least
>until the META element is parsed). META declarations should appear as early as
>possible in the HEAD element."
>
>If the document was going to be reparsed there would be less need for
>ASCII-values to precede it.

The need exists because the user agent must assume some base character
encoding in order to find the <meta>. E.g., if the document is encoded
in an encoding that is identical to US-ASCII except that byte 0x6D is
"n", then "<meta" would read as "<neta", which the user agent would not
find. The text essentially means that documents that are encoded in
UTF-16, EBCDIC,
etc. and have a <meta ... Content-Type ...> and lack higher-level
protocol encoding information are incorrect. Or that they are incorrect
regardless of higher-level protocol information. Who knows, it's the
HTML 4 Recommendation, it could mean anything...
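
To illustrate the point, here is a rough sketch of such a byte-level scan.
It is not how any particular user agent works; the helper name
sniff_meta_charset, the 1024-byte limit and the sample document are
made up for the example. It only assumes, as the quoted text requires,
that ASCII-valued bytes stand for ASCII characters.

```python
import re

# Hypothetical pre-scan: look for a charset inside a <meta ...> tag in the
# raw bytes, assuming ASCII-valued bytes stand for ASCII characters.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def sniff_meta_charset(raw: bytes, limit: int = 1024):
    """Return the declared charset name, or None if no <meta> is found."""
    match = META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii") if match else None


# Sample document, invented for this example.
html = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=iso-8859-2"></head></html>'
)

# ASCII-compatible encoding: the bytes of "<meta" are the ASCII bytes.
print(sniff_meta_charset(html.encode("iso-8859-2")))

# UTF-16: every ASCII character is padded with a zero byte, so the
# byte sequence for "<meta" never appears.
print(sniff_meta_charset(html.encode("utf-16")))

# EBCDIC (code page 037): "<" and the letters map to different bytes.
print(sniff_meta_charset(html.encode("cp037")))
```

Only the first call finds the declaration. For the UTF-16 and EBCDIC
bytes the scan comes back empty, so without higher-level protocol
information the <meta> in those documents cannot do its job.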

Received on Monday, 23 August 2004 04:21:19 UTC