- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 23 Aug 2004 12:37:30 +0900
- To: Tex Texin <tex@i18nguy.com>, Jungshik Shin <jshin@i18nl10n.com>
- Cc: Jakub Friedl <kyknos@gmail.com>, www-international@w3.org
Hello Tex,

At 19:36 04/08/22 -0700, Tex Texin wrote:
>Hi Jungshik,
>
>With respect to user agents reparsing documents from the beginning, can
>you say which ones do this?
>They are not obligated to and the wording of the standards implies that the
>encoding "switch" from the initial value to the value specified in the charset
>statement, occurs at the point the statement is parsed.

Can you point to some place that supports that statement?
At http://www.w3.org/TR/html401/charset.html#h-5.2.2, I find:

> To address server or configuration limitations, HTML documents may
> include explicit information about the document's character encoding;
> the META element can be used to provide user agents with this information.

This says "the document's character encoding", nothing about points after.

> For example, to specify that the character encoding of the current
> document

This again says "character encoding of the current *document*".

> is "EUC-JP", a document should include the following META
> declaration:
>
> <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
>
> The META declaration must only be used when the character encoding is
> organized such that ASCII-valued bytes stand for ASCII characters (at
> least until the META element is parsed). META declarations should appear
> as early as possible in the HEAD element.

To take the above EUC-JP example, EUC-JP is ASCII-compatible as you
have defined it. A <title> with Japanese text should not appear before the
<meta>, but such a case is not forbidden. And in that case, the
<title> has to be interpreted as EUC-JP; I don't see any way to
read the spec differently.

Regards, Martin.

>On a separate point I wonder if you meant ASCII-compatible or simply ASCII.
>If the text prior to the charset statement consists of only ASCII characters,
>then yes, the later position of the charset statement is moot. But if the
>statements preceding the charset statement contain non-ASCII characters in an
>ASCII-compatible encoding, if the user agent doesn't reparse from the
>beginning, then it may misinterpret the content of those statements.
>
>(To clarify, to me an ASCII-compatible encoding is one that assigns the same
>characters as the ASCII character set does to the values 0-127, and then
>assigns additional characters to values greater than 127.)
>
>tex
>
>Jungshik Shin wrote:
> >
> > Tex Texin wrote:
> > > Otherwise text in the page prior to the charset statement may not be
> > > decoded correctly.
> >
> > However, as long as the encoding used is ASCII-compatible, it doesn't
> > matter much. I believe most user 'agents' look for 'meta' declaration
> > for charset and reparse the document from the beginning after
> > determining the encoding (assuming http C-T header doesn't have charset
> > parameter)
> >
> > Jungshik
>
>--
>-------------------------------------------------------------
>Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
>Xen Master                          http://www.i18nGuy.com
>
>XenCraft                            http://www.XenCraft.com
>Making e-Business Work Around the World
>-------------------------------------------------------------
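
The difference between the two readings discussed above can be sketched in Python. This is only an illustration, not how any particular user agent works: the function names, the regular expression, the 1024-byte scan window and the ISO-8859-1 initial encoding are all assumptions made for the example.

```python
import re

# Hypothetical, simplified pattern: finds charset=... inside a META tag.
# Real user agents use a far more careful prescan than this.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)

def sniff_meta_charset(raw: bytes, default: str = "iso-8859-1") -> str:
    """Scan the first bytes as ASCII-compatible data for a META charset."""
    m = META_CHARSET.search(raw[:1024])  # 1024 is an arbitrary assumption
    return m.group(1).decode("ascii") if m else default

def decode_whole_document(raw: bytes) -> str:
    """Reading 1: the declared encoding applies to the whole document,
    so decoding restarts from the first byte once the charset is known.
    An unknown charset label would raise LookupError here."""
    return raw.decode(sniff_meta_charset(raw), errors="replace")

def decode_switching_at_meta(raw: bytes, initial: str = "iso-8859-1") -> str:
    """Reading 2: the encoding 'switches' at the point the META is parsed,
    so everything before it stays in the initial encoding."""
    m = META_CHARSET.search(raw[:1024])
    if not m:
        return raw.decode(initial, errors="replace")
    end = raw.find(b">", m.end())
    cut = len(raw) if end < 0 else end + 1  # first byte after the META tag
    declared = m.group(1).decode("ascii")
    return (raw[:cut].decode(initial, errors="replace")
            + raw[cut:].decode(declared, errors="replace"))

# A document whose Japanese <title> comes before the META declaration.
sample = (
    b"<html><head><title>" + "日本語".encode("euc_jp") + b"</title>"
    + b'<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
    + b"</head><body>" + "本文".encode("euc_jp") + b"</body></html>"
)

whole = decode_whole_document(sample)
switched = decode_switching_at_meta(sample)
```

With this sample, `whole` contains the title as written, while `switched` decodes the body correctly but turns the title into mojibake, which is the misinterpretation Tex describes for a user agent that does not reparse from the beginning.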
Received on Monday, 23 August 2004 06:17:44 UTC