Re: faq suggestions

From: Tex Texin <tex@i18nguy.com>
Date: Sun, 22 Aug 2004 20:34:07 -0700
Message-ID: <412965AF.D3BD0135@i18nguy.com>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: www-international@w3.org

Thanks for the info, Bjoern.

It is interesting that IE only reparses from the current chunk.

The reason I said the standard implies that the switch occurs after the charset
is parsed is text like:


"The META declaration must only be used when the character encoding is
organized such that ASCII-valued bytes stand for ASCII characters (at least
until the META element is parsed). META declarations should appear as early as
possible in the HEAD element."

If the document were going to be reparsed, there would be less need for
ASCII-valued bytes to precede the META declaration.
Also, if the document is reparsed from the beginning, what happens if the page
is encoded in an EBCDIC encoding?
If the page is EBCDIC from the first byte, then the meta charset statement
won't be parsable...
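
To make the EBCDIC concern concrete, here is a minimal Python sketch, not
taken from any real user agent. The helper name prescan_meta_charset, the
8 KB window, and the choice of cp037 as a representative EBCDIC code page are
all assumptions for illustration. It scans the raw bytes for a META charset
declaration the way a parser that expects ASCII-valued bytes might.

```python
import re

# Hypothetical prescan: look for a META charset declaration in the raw bytes,
# assuming ASCII-valued bytes stand for ASCII characters.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset=["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)

def prescan_meta_charset(raw: bytes):
    """Return the declared charset label, or None if no declaration is readable."""
    match = META_CHARSET.search(raw[:8192])  # 8 KB window is an assumption
    return match.group(1).decode("ascii").lower() if match else None

html = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=cp037"></head></html>'
)

ascii_based = html.encode("iso-8859-1")  # ASCII-valued bytes for ASCII characters
ebcdic = html.encode("cp037")            # EBCDIC from the first byte

print(prescan_meta_charset(ascii_based))  # declaration is readable
print(prescan_meta_charset(ebcdic))       # declaration is not found
```

In the EBCDIC case the bytes for "<meta" never appear with their ASCII values,
so a scan of this kind cannot see the declaration at all, whether or not the
document is later reparsed from the beginning.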

However, CSS 2.1 is a bit better and in line with your and Jungshik's ideas.

"Note that reliance on the @charset construct theoretically poses a problem
since there is no a priori information on how it is encoded. In practice,
however, the encodings in wide use on the Internet are either based on ASCII,
UTF-16, UCS-4, or (rarely) on EBCDIC. This means that in general, the initial
byte values of a style sheet enable a user agent to detect the encoding family
reliably, which provides enough information to decode the @charset rule, which
in turn determines the exact character encoding."

The @charset statement must be the first statement in the CSS file, and clearly
the spec expects the UA to determine enough about the encoding of the file to
be able to confirm it exactly by parsing the @charset value.
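
As a rough illustration of that two-step approach, here is a Python sketch.
It is not the detection table from the CSS 2.1 spec: the function names, the
list of codecs, the use of cp037 to stand in for the EBCDIC family, and the
1024-byte window are my own assumptions, and byte order marks are not handled.
It guesses the encoding family from how the opening bytes of '@charset "'
would look in each family, then decodes just enough to read the rule.

```python
import re

# One representative codec per encoding family (an assumption for this sketch).
FAMILY_CODECS = ["ascii", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le", "cp037"]
PREFIX = '@charset "'

def sniff_family(raw: bytes):
    """Guess the encoding family from the first bytes of the style sheet."""
    for codec in FAMILY_CODECS:
        if raw.startswith(PREFIX.encode(codec)):
            return codec
    return None

def read_charset_rule(raw: bytes):
    """Return the label named in a leading @charset rule, or None."""
    family = sniff_family(raw)
    if family is None:
        return None
    # Decode only the start of the file with the family codec.
    head = raw[:1024].decode(family, errors="replace")
    match = re.match(r'@charset "([^"]*)";', head)
    return match.group(1) if match else None

sheet = '@charset "ISO-8859-1";\nbody { color: red }'
print(read_charset_rule(sheet.encode("iso-8859-1")))
print(read_charset_rule(sheet.encode("utf-16-le")))
print(read_charset_rule(sheet.encode("cp037")))
```

The family guess only has to be good enough to read the rule; the label inside
the rule is what then determines the exact encoding.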


Bjoern Hoehrmann wrote:
> * Tex Texin wrote:
> >With respect to user agents reparsing documents from the beginning, can you say
> >which ones do this?
> Internet Explorer for Windows re-parses the chunk in which the <meta>
> element was found (a chunk is usually a block of 8 KB), Mozilla re-
> parses all the chunks, that's at least what I remember from tests.
> You can test such things using a <title> element prior to the <meta>
> element, for example.
> >They are not obligated to and the wording of the standards implies that the
> >encoding "switch" from the initial value to the value specified in the charset
> >statement, occurs at the point the statement is parsed.
> That's not clear to me at all...

Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
Received on Monday, 23 August 2004 03:35:15 UTC
