Re: Problem in publishing multilingual HTML document on web in UTF-8 encoding

From: Philip TAYLOR <P.Taylor@Rhul.Ac.Uk>
Date: Fri, 02 Jun 2006 21:31:06 +0100
Message-ID: <4480A00A.50802@Rhul.Ac.Uk>
To: "आशीष शुक्ला \"Wah Java !!\"" <wahjava@gmail.com>
CC: W3C HTML Mailing List <www-html@w3.org>

आशीष शुक्ला "Wah Java !!" wrote:

> If a UA (user agent) finds a "Content-Type" in a <meta> tag in an HTML
> document, it should use that to identify the document's character
> encoding, because it is part of the document. The server's reply should
> only be considered when the document doesn't explicitly state its
> character encoding.

Much as I think your argument has merit, I cannot see how you
can resolve the following paradox : suppose, in some as-yet
unknown encoding (say, ISO-9999-9), the character positions
which in ISO-8859-1 correspond to the letters "M", "E", "T"
and "A" correspond instead to the letters "B", "O", "D" and "Y".
Now the server says that the document is in ISO-8859-1,
so when the UA sees

	<META http-equiv="content-type" content="text/html; charset=iso-9999-9">

it interprets the META directive as you would wish.  But in so
doing, it starts to parse the document on the basis of it being
expressed in ISO-9999-9, whereupon it discovers that there wasn't
a META directive at all, there was, rather, a(n ill-formed) BODY
tag. But because it now knows there /was/ no META directive, it
parses using ISO-8859-1.  But that means there IS a META
directive.  And so on.  I'm sure you see the problem ...
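
To make the loop concrete, here is a minimal Python sketch. ISO-9999-9
is the made-up encoding from the paragraph above, modelled here (an
assumption) as ISO-8859-1 with the upper-case letters M, E, T, A decoding
as B, O, D, Y. The helper names decode, declared_charset and trace are
likewise invented for illustration, and the re-parsing rule is the one
proposed in the quoted message, not what any real UA is known to do.

```python
import re

# Hypothetical "ISO-9999-9": same as ISO-8859-1, except that the byte
# values for M, E, T, A decode as B, O, D, Y (an assumption for this sketch).
SWAP = str.maketrans("META", "BODY")

# Finds a charset declared inside a META tag in already-decoded text.
META_CHARSET = re.compile(r"<META[^>]*charset=([\w-]+)", re.IGNORECASE)


def decode(raw: bytes, charset: str) -> str:
    """Decode raw bytes under ISO-8859-1 or the hypothetical ISO-9999-9."""
    text = raw.decode("iso-8859-1")
    return text.translate(SWAP) if charset == "iso-9999-9" else text


def declared_charset(text: str):
    """Return the charset named in a META tag, or None if no META tag is seen."""
    match = META_CHARSET.search(text)
    return match.group(1).lower() if match else None


def trace(raw: bytes, server_charset: str, max_steps: int = 6):
    """Record which charset the UA would adopt on each successive re-parse.

    Rule under test: trust the META declaration if one is visible,
    otherwise fall back to what the server said.
    """
    charset = server_charset
    history = [charset]
    for _ in range(max_steps):
        charset = declared_charset(decode(raw, charset)) or server_charset
        history.append(charset)
    return history


RAW = b'<META http-equiv="content-type" content="text/html; charset=iso-9999-9">'

if __name__ == "__main__":
    print(decode(RAW, "iso-9999-9"))   # the tag as seen under ISO-9999-9
    print(trace(RAW, "iso-8859-1"))    # charsets adopted on each pass
```

Under these assumptions the history alternates between iso-8859-1 and
iso-9999-9 and never settles, however large max_steps is made.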

Philip Taylor
Received on Friday, 2 June 2006 20:30:32 UTC
