
Re: Problem in publishing multilingual HTML document on web in UTF-8 encoding

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 2 Jun 2006 20:44:33 +0000 (UTC)
To: Philip TAYLOR <P.Taylor@Rhul.Ac.Uk>
Cc: W3C HTML Mailing List <www-html@w3.org>
Message-ID: <Pine.LNX.4.62.0606022043260.22058@dhalsim.dreamhost.com>

On Fri, 2 Jun 2006, Philip TAYLOR wrote:
> Much as I think your argument has merit, I cannot see how you can 
> resolve the following paradox : suppose, in some as-yet unknown encoding 
> (say, ISO-9999-9), the character positions which in ISO-8859-1 
> correspond to the letters "M", "E", "T" and "A" correspond instead to 
> the letters "B", "O", "D" and "Y". Now the server says that the document 
> is in ISO-8859-1, so when the UA sees
> 	<META http-equiv="content-type" content="text/html; charset=iso-9999-9">
> it interprets the META directive as you would wish.  But in so doing, it 
> starts to parse the document on the basis of it being expressed in 
> ISO-9999-9, whereupon it discovers that there wasn't a META directive at 
> all, there was, rather, a(n ill-formed) BODY tag. But because it now 
> knows there /was/ no META directive, it parses using ISO-8859-1.  But 
> that means there IS a META directive.  And so on.  I'm sure you see the 
> problem ...

I've actually seen this, with a UTF-8 document that said:

   <meta http-equiv="content-type" content="text/html; charset=utf-16">
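
The loop is easy to reproduce. What follows is only a minimal Python sketch, assuming a simplified UA that decodes the bytes, looks for a charset declaration, and then re-decodes with whatever it found. The helper declared_charset and its regular expression are hypothetical illustrations, not how any real browser sniffs encodings.

```python
import re

# Hypothetical, deliberately naive pattern for a charset declaration
# inside a <meta> tag. Real UAs use a far more careful prescan.
META_CHARSET = re.compile(r'<meta[^>]*charset=([\w-]+)', re.IGNORECASE)


def declared_charset(raw: bytes, encoding: str):
    """Decode raw bytes with `encoding` and return the charset named in
    a <meta> tag, or None if no such declaration is visible."""
    text = raw.decode(encoding, errors="replace")
    match = META_CHARSET.search(text)
    return match.group(1).lower() if match else None


# The bytes are plain ASCII (and therefore valid UTF-8), as in the
# document described above.
raw = b'<meta http-equiv="content-type" content="text/html; charset=utf-16">'

# First pass: an ASCII-compatible decoding sees the declaration.
first_pass = declared_charset(raw, "iso-8859-1")

# Second pass: obeying the declaration pairs the bytes up into 16-bit
# code units, none of which is "<", so the declaration disappears.
second_pass = declared_charset(raw, first_pass)

print(first_pass, second_pass)
```

Under these assumptions the first pass reports utf-16 and the second pass reports nothing at all, which is exactly the point at which the UA would have to fall back to the encoding it started with and go round again.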

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 2 June 2006 23:16:50 UTC
