Re: Problem in publishing multilingual HTML document on web in UTF-8 encoding

From: आशीष शुक्ला (Ashish Shukla) <wahjava@gmail.com>
Date: Sat, 3 Jun 2006 11:58:44 +0530
Message-ID: <d9a03f10606022328r19ed92edsd9b4bd3ad96d586a@mail.gmail.com>
To: www-html@w3.org
On 6/3/06, Philip TAYLOR <P.Taylor@rhul.ac.uk> wrote:
>
>
> आशीष शुक्ला "Wah Java !!" wrote:
>
> > If a UA (user agent) finds a "Content-Type" in a <meta> tag in an HTML document,
> > it should use that to identify the document's character encoding,
> > because it is part of the document. The server's reply should only
> > be considered when the document doesn't explicitly state its character
> > encoding.
>
> Much as I think your argument has merit, I cannot see how you
> can resolve the following paradox : suppose, in some as-yet
> unknown encoding (say, ISO-9999-9), the character positions
> which in ISO-8859-1 correspond to the letters "M", "E", "T"
> and "A" correspond instead to the letters "B", "O", "D" and "Y".
> Now the server says that the document is in ISO-8859-1,
> so when the UA sees
>
>         <META http-equiv="content-type" content="text/html; charset=iso-9999-9">
>
> it interprets the META directive as you would wish.  But in so
> doing, it starts to parse the document on the basis of it being
> expressed in ISO-9999-9, whereupon it discovers that there wasn't
> a META directive at all, there was, rather, a(n ill-formed) BODY
> tag. But because it now knows there /was/ no META directive, it
> parses using ISO-8859-1.  But that means there IS a META
> directive.  And so on.  I'm sure you see the problem ...
>
> Philip Taylor
>
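Your ISO-9999-9 is hypothetical, but the same flip-flop can be reproduced with a real ASCII-incompatible encoding. A minimal Python sketch, assuming cp500 (IBM EBCDIC) as the stand-in; `raw`, `as_latin1` and `as_ebcdic` are just illustrative names:

```python
# cp500 (IBM EBCDIC, International) stands in for the hypothetical
# "ISO-9999-9": a real encoding in which ASCII byte values mean other characters.
raw = b'<META http-equiv="content-type" content="text/html; charset=cp500">'

# What a UA trusting a server-declared ISO-8859-1 sees: a META tag naming cp500.
as_latin1 = raw.decode("iso-8859-1")

# What the same UA sees after switching to the encoding the tag declares.
as_ebcdic = raw.decode("cp500")

print(as_latin1)            # <META http-equiv="content-type" content="text/html; charset=cp500">
print("META" in as_ebcdic)  # False: reread as cp500, the bytes no longer spell META
```

Whichever reading the UA settles on, the other one contradicts it, which is the loop described above.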
-- begin excerpt --
To address server or configuration limitations, HTML documents may
include explicit information about the document's character encoding;
the META element can be used to provide user agents with this
information.
For example, to specify that the character encoding of the current
document is "EUC-JP", a document should include the following META
declaration:
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">

The META declaration must only be used when the character encoding is
organized such that ASCII-valued bytes stand for ASCII characters (at
least until the META element is parsed).  META declarations should
appear as early as possible in the HEAD element.
-- end excerpt --

Copied from: http://www.w3.org/TR/html4/charset.html#h-5.2.2

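That is an excerpt from the HTML 4.01 specification. In other words, you have to
organize your content so that everything up to the <META> tag is ASCII. I think
this is what the excerpt means. So the paradox cannot arise: an encoding like
your hypothetical ISO-9999-9, in which the bytes for "META" stand for other
letters, is exactly the kind the specification forbids declaring with META.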
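Because of that restriction, a UA can look for the declaration in the raw bytes before it knows the encoding. Below is a minimal Python sketch under stated assumptions: `find_meta_charset`, `is_ascii_compatible` and `PRESCAN_LIMIT` are hypothetical names, the regex only handles `http-equiv` written before `content`, and the 1024-byte limit is borrowed from later browser practice, not from HTML 4.01. Which source should win when this disagrees with the server's header is the question under debate here (HTML 4.01 section 5.2.2 itself ranks the HTTP charset parameter above META), so the sketch only finds the declaration and leaves that decision to the caller.

```python
import re
from typing import Optional

# Assumption: only the first 1024 bytes are scanned. HTML 4.01 just says
# META should appear "as early as possible" in HEAD.
PRESCAN_LIMIT = 1024

# Simplified pattern: http-equiv must come before content, and the
# content value must be quoted.
META_CHARSET = re.compile(
    rb'<meta\s+http-equiv\s*=\s*["\']?content-type["\']?\s+'
    rb'content\s*=\s*["\'][^"\']*charset\s*=\s*([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)


def find_meta_charset(data: bytes) -> Optional[str]:
    """Return the charset named by a META Content-Type declaration, or None.

    The search runs on raw bytes, so it can only succeed when the bytes up
    to and inside the META element mean the same thing they mean in ASCII."""
    match = META_CHARSET.search(data[:PRESCAN_LIMIT])
    return match.group(1).decode("ascii").lower() if match else None


def is_ascii_compatible(encoding: str) -> bool:
    """Rough check of the HTML 4.01 condition: does every printable ASCII
    character encode to its own single ASCII byte in `encoding`?"""
    printable = "".join(chr(c) for c in range(0x20, 0x7F))
    try:
        return printable.encode(encoding) == printable.encode("ascii")
    except (LookupError, UnicodeEncodeError):
        # Unknown names (e.g. the hypothetical "iso-9999-9") end up here.
        return False


# Usage: find the declaration, reject encodings META may not declare, decode.
page = (b'<html><head><META http-equiv="Content-Type" '
        b'content="text/html; charset=EUC-JP"></head></html>')
declared = find_meta_charset(page)
if declared is not None and is_ascii_compatible(declared):
    text = page.decode(declared)
print(declared)  # euc-jp
```

The compatibility check matters because a declaration naming, say, UTF-16 contradicts itself: if the page really were UTF-16, the ASCII-byte search could not have found the tag.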

Thanks
Ashish Shukla
-- 
Ashish Shukla "Wah Java !!"
आशीष शुक्ला

  ,= ,-_-. =.
 ((_/)o o(\_))
  `-'(. .)`-'
      \_/

My blah, blah, blah at http://wahjava.blogspot.com/
My webpages at http://www.geocities.com/wah_java_dotnet/

My GPG Fingerprint: BBA9 AD7D BA71 61EB BE46 8CF5 E44A C663 A03F 4261

My GPG keys at
http://keyserv.nic-se.se:11371/pks/lookup?op=get&search=0xA03F4261
--
Supercomputers are for people too rich and too stupid to design
efficient algorithms -- Steven Skiena, Department of Computer Science,
SUNY Stony Brook.
Received on Saturday, 3 June 2006 06:28:55 GMT