
Re: faq suggestions

From: Tex Texin <tex@i18nguy.com>
Date: Mon, 23 Aug 2004 00:06:18 -0700
Message-ID: <4129976A.B5D4CF10@i18nguy.com>
To: Martin Duerst <duerst@w3.org>
CC: Jungshik Shin <jshin@i18nl10n.com>, www-international@w3.org

Konnichiwa Martin,

1) I wrote my last mail while you were writing yours; the supporting statement
was in that message.

http://www.w3.org/TR/html401/charset.html#h-5.2.2

"The META declaration must only be used when the character encoding is
organized such that ASCII-valued bytes stand for ASCII characters (at least
until the META element is parsed). META declarations should appear as early as
possible in the HEAD element."

If the document were going to be reparsed, there would be less need to require
that only ASCII values precede the META.
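To make the two readings concrete, here is a rough Python sketch contrasting a user agent that switches encodings at the point the META is parsed with one that reparses from the beginning. The function names, the naive regex prescan and the presumed default of ISO-8859-1 are all my own assumptions for illustration; real user agents use more careful prescan rules.

```python
import re

# Naive byte-level prescan for a META charset declaration (illustration only).
META_CHARSET = re.compile(rb'<meta[^>]+charset=([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def find_meta_charset(data: bytes):
    """Return (charset, offset just past the META tag), or None if absent."""
    m = META_CHARSET.search(data)
    if m is None:
        return None
    end = data.find(b'>', m.end())
    return m.group(1).decode('ascii'), (end + 1 if end != -1 else len(data))


def decode_switch_at_meta(data: bytes, presumed: str = 'iso-8859-1') -> str:
    """Hypothetical UA: bytes before the META use the presumed encoding."""
    found = find_meta_charset(data)
    if found is None:
        return data.decode(presumed, errors='replace')
    charset, end = found
    return (data[:end].decode(presumed, errors='replace')
            + data[end:].decode(charset, errors='replace'))


def decode_reparse(data: bytes, presumed: str = 'iso-8859-1') -> str:
    """Hypothetical UA: find the META, then decode again from the beginning."""
    found = find_meta_charset(data)
    charset = found[0] if found else presumed
    return data.decode(charset, errors='replace')


if __name__ == '__main__':
    title = '日本語'.encode('euc_jp')
    doc = (b'<html><head><title>' + title + b'</title>'
           b'<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
           b'</head><body>' + title + b'</body></html>')
    print(decode_switch_at_meta(doc).count('日本語'))  # prints 1: only the body survives
    print(decode_reparse(doc).count('日本語'))         # prints 2: title and body
```

With a Japanese title ahead of the META, only the reparsing model recovers the title; the switch-at-the-META model garbles it.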

2) I don't follow your logic:
> To take the above EUC-JP example, EUC-JP is ASCII-compatible as you
> have defined. A <title> with Japanese text should not appear before
> the <meta>, but such a case is not forbidden. And in that case,
> the <title> has to be interpreted as EUC-JP; I don't see any
> way to read the spec differently.

Yes, EUC-JP is ASCII-compatible. (That is somewhat beside the point, though; the
term was brought up only to clarify Jungshik's remarks.)
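For reference, the definition of ASCII-compatible I gave earlier can be checked mechanically against Python's codecs. This is only a sketch: the function name is hypothetical, and it tests only the 0-127 half of the definition, not how values above 127 are used.

```python
def is_ascii_compatible(encoding: str) -> bool:
    """True if every byte 0x00-0x7F, decoded on its own, yields the ASCII
    character with the same value. Values above 127 are not examined."""
    for b in range(128):
        try:
            if bytes([b]).decode(encoding) != chr(b):
                return False
        except UnicodeDecodeError:
            # e.g. UTF-16 cannot decode a lone byte at all
            return False
    return True


if __name__ == '__main__':
    for enc in ('euc_jp', 'utf-8', 'iso-8859-1', 'cp037', 'utf-16-le'):
        print(enc, is_ascii_compatible(enc))
```

EUC-JP, UTF-8 and ISO-8859-1 pass; EBCDIC (cp037) and UTF-16 do not, which is why a byte-level search for the META cannot work for the latter two.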

However, if the user agent has presumed some encoding because the HTTP header
carries no charset declaration, then the title would be interpreted in that
presumed encoding. I don't see why the paragraph you excerpted requires it to be
interpreted as EUC-JP.
(But it would be nice.)
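A small Python model of the point above, under my own assumptions: the function and parameter names are hypothetical, and the only rule taken from HTML 4.01 is that an HTTP charset parameter takes priority over a META declaration. It returns the encoding a user agent would apply to a title that precedes the META.

```python
def encoding_for_early_title(http_charset, presumed, meta_charset, reparses):
    """Hypothetical model of the encoding applied to a <title> that appears
    before the META declaration.

    http_charset: charset parameter from the HTTP Content-Type header, or None
    presumed:     what the UA assumed in the absence of an HTTP charset
    meta_charset: charset named in the later META declaration, or None
    reparses:     True if the UA restarts from the beginning after the META
    """
    if http_charset:              # HTML 4.01: HTTP charset has priority over META
        return http_charset
    if reparses and meta_charset:
        return meta_charset
    return presumed               # the title was decoded before the META was seen


if __name__ == '__main__':
    title_bytes = '日本語'.encode('euc_jp')
    for reparses in (False, True):
        enc = encoding_for_early_title(None, 'iso-8859-1', 'EUC-JP', reparses)
        print(reparses, enc, title_bytes.decode(enc, errors='replace'))
```

Only in the reparsing case does the title come out as EUC-JP; nothing in the excerpted paragraph forces that branch.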

tex


Martin Duerst wrote:
> 
> Hello Tex,
> 
> At 19:36 04/08/22 -0700, Tex Texin wrote:
> 
> >Hi Jungshik,
> >
> >With respect to user agents reparsing documents from the beginning, can
> >you say which ones do this?
> >They are not obligated to, and the wording of the standards implies that the
> >encoding "switch" from the initial value to the value specified in the charset
> >statement occurs at the point the statement is parsed.
> 
> Can you point to some place that supports that statement?
> 
> At http://www.w3.org/TR/html401/charset.html#h-5.2.2, I find:
> 
>  > To address server or configuration limitations, HTML documents may
>  > include explicit information about the document's character encoding;
>  > the META element can be used to provide user agents with this information.
> 
> This says "the document's character encoding"; it says nothing about
> points after the META.
> 
>  > For example, to specify that the character encoding of the current document
> 
> This again says "character encoding of the current *document*".
> 
>  > is "EUC-JP", a document should include the following META
>  > declaration:
>  >
>  > <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
>  >
>  > The META declaration must only be used when the character encoding is
>  > organized such that ASCII-valued bytes stand for ASCII characters (at
>  > least until the META element is parsed). META declarations should appear
>  > as early as possible in the HEAD element.
> 
> To take the above EUC-JP example, EUC-JP is ASCII-compatible as you
> have defined. A <title> with Japanese text should not appear before
> the <meta>, but such a case is not forbidden. And in that case,
> the <title> has to be interpreted as EUC-JP; I don't see any
> way to read the spec differently.
> 
> Regards,    Martin.
> 
> >On a separate point, I wonder whether you meant ASCII-compatible or simply ASCII.
> >If the text prior to the charset statement consists of only ASCII characters,
> >then yes, the later position of the charset statement is moot. But if the
> >statements preceding the charset statement contain non-ASCII characters in an
> >ASCII-compatible encoding and the user agent doesn't reparse from the
> >beginning, then it may misinterpret the content of those statements.
> >
> >(To clarify, an ASCII-compatible encoding is one that assigns the same
> >characters as the ASCII character set does to the values 0-127, and then
> >assigns additional characters to values greater than 127.)
> >
> >tex
> >
> >Jungshik Shin wrote:
> >
> > > Tex Texin wrote:
> > > > Otherwise text in the page prior to the charset statement may not be
> > > > decoded correctly.
> > >
> > > However, as long as the encoding used is ASCII-compatible, it doesn't
> > > matter much. I believe most user agents look for a 'meta' declaration
> > > for charset and reparse the document from the beginning after
> > > determining the encoding (assuming the HTTP Content-Type header doesn't
> > > have a charset parameter).
> > >
> > > Jungshik
> >
> >--
> >-------------------------------------------------------------
> >Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
> >Xen Master                          http://www.i18nGuy.com
> >
> >XenCraft                            http://www.XenCraft.com
> >Making e-Business Work Around the World
> >-------------------------------------------------------------

Received on Monday, 23 August 2004 07:07:26 GMT
