Re: faq suggestions

From: Martin Duerst <duerst@w3.org>
Date: Mon, 23 Aug 2004 12:37:30 +0900
Message-Id: <4.2.0.58.J.20040823123036.0566be88@localhost>
To: Tex Texin <tex@i18nguy.com>, Jungshik Shin <jshin@i18nl10n.com>
Cc: Jakub Friedl <kyknos@gmail.com>, www-international@w3.org

Hello Tex,

At 19:36 04/08/22 -0700, Tex Texin wrote:

>Hi Jungshik,
>
>With respect to user agents reparsing documents from the beginning, can you say
>which ones do this?
>They are not obligated to, and the wording of the standards implies that the
>encoding "switch" from the initial value to the value specified in the charset
>statement occurs at the point the statement is parsed.

Can you point to some place that supports that statement?

At http://www.w3.org/TR/html401/charset.html#h-5.2.2, I find:

 > To address server or configuration limitations, HTML documents may
 > include explicit information about the document's character encoding;
 > the META element can be used to provide user agents with this information.

This says "the document's character encoding"; it says nothing about
the encoding applying only from some point onward.

 > For example, to specify that the character encoding of the current
 > document

This again says "character encoding of the current *document*".

 > is "EUC-JP", a document should include the following META
 > declaration:
 >
 > <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
 >
 > The META declaration must only be used when the character encoding is
 > organized such that ASCII-valued bytes stand for ASCII characters (at
 > least until the META element is parsed). META declarations should appear
 > as early as possible in the HEAD element.

To take the above EUC-JP example, EUC-JP is ASCII-compatible by your
definition. A <title> with Japanese text should not appear before
the <meta>, but such a case is not forbidden. And in that case,
the <title> has to be interpreted as EUC-JP; I don't see any
way to read the spec differently.
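
As a minimal sketch of that reading (an illustration only, not a
description of what any particular user agent does), the following
Python scans the raw bytes for the META declaration and then decodes
the whole document, from the first byte, with the declared encoding.
The helper names, the regular expression, and the ISO-8859-1 fallback
are assumptions made for this example.

```python
import re

# Hypothetical prescan pattern: it relies on ASCII-valued bytes standing
# for ASCII characters up to and including the META element.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def sniff_meta_charset(raw: bytes):
    """Return the charset named in a META declaration, or None."""
    match = META_CHARSET.search(raw)
    return match.group(1).decode("ascii") if match else None


def decode_document(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode the entire document with the declared encoding.

    The fallback encoding is an assumption for this sketch.
    """
    charset = sniff_meta_charset(raw) or fallback
    return raw.decode(charset)


# A document whose Japanese <title> comes before the META declaration.
title = "日本語"
raw = (
    b"<html><head><title>" + title.encode("euc_jp") + b"</title>"
    b'<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
    b"</head><body></body></html>"
)

text = decode_document(raw)
```

Here `text` contains the title as written, because the bytes before
the META are also decoded as EUC-JP. Decoding those same bytes with
the fallback encoding instead would garble the title.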

Regards,    Martin.



>On a separate point I wonder if you meant ASCII-compatible or simply ASCII.
>If the text prior to the charset statement consists of only ASCII characters,
>then yes, the later position of the charset statement is moot. But if the
>statements preceding the charset statement contain non-ASCII characters in an
>ASCII-compatible encoding, if the user agent doesn't reparse from the
>beginning, then it may misinterpret the content of those statements.
>
>(To clarify, to me an ASCII-compatible encoding is one that assigns the same
>characters as the ASCII character set does to the values 0-127, and then
>assigns additional characters to values greater than 127.)
>
>tex
>
>Jungshik Shin wrote:
>
> > Tex Texin wrote:
> > > Otherwise text in the page prior to the charset statement may not be decoded
> > > correctly.
> >
> > However, as long as the encoding used is ASCII-compatible, it doesn't
> > matter much. I believe most user agents look for a 'meta' declaration
> > for charset and reparse the document from the beginning after
> > determining the encoding (assuming the HTTP Content-Type header doesn't
> > have a charset parameter).
> >
> > Jungshik
>
>--
>-------------------------------------------------------------
>Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
>Xen Master                          http://www.i18nGuy.com
>
>XenCraft                            http://www.XenCraft.com
>Making e-Business Work Around the World
>-------------------------------------------------------------
Received on Monday, 23 August 2004 06:17:44 GMT
