Re: Problem in publishing multilingual HTML document on web in UTF-8 encoding from Sebastian Redl on 2006-06-03 (www-html@w3.org from June 2006)

From: Sebastian Redl <sebastian.redl@getdesigned.at>
Date: Sat, 03 Jun 2006 15:31:00 +0200
To: www-html@w3.org
Message-ID: <44818F14.5020608@getdesigned.at>

Philip TAYLOR wrote:

> That certainly addresses my paradox issue, but seems to suggest (to me)
> that a single document may actually use two (or perhaps more) character
> sets, one which obtains up to the point of the META element, and another
> thereafter.  If this were not the case, the parenthesis "(at least until
> the META element is parsed)" would appear to be redundant.

Not true. To me, this suggests that the whole construct is only valid in
character sets that have the ASCII set as a direct subset, such as UTF-8
and ISO-8859-*, and that characters outside the ASCII range may appear
only after the meta element. It is not valid, however, to change the
character set completely with the meta element.
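
A rough Python sketch of that point, not taken from any spec. The sample
document and both helper names are made up for illustration. It checks
whether an ASCII-only prefix (everything up to and including the meta)
comes out as the same bytes it would have in plain ASCII, which is what
lets a parser find the meta before it knows the real encoding.

```python
# Hypothetical sample: ASCII-only up to and including the meta element,
# non-ASCII characters only afterwards.
PREFIX = '<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8">'
BODY = "<title>Grüße</title></head></html>"


def ascii_compatible(encoding: str) -> bool:
    """True if the ASCII-only prefix encodes to the same bytes as in US-ASCII."""
    return PREFIX.encode(encoding) == PREFIX.encode("ascii")


def meta_visible_as_ascii(data: bytes) -> bool:
    """Mimic a parser that scans the raw bytes as ASCII looking for the meta."""
    return b"charset=" in data


if __name__ == "__main__":
    for enc in ("utf-8", "iso-8859-1", "utf-16"):
        data = (PREFIX + BODY).encode(enc)
        print(enc, ascii_compatible(enc), meta_visible_as_ascii(data))
```

For UTF-8 and ISO-8859-1 both checks succeed; for UTF-16 neither does,
because every ASCII character is stored as two bytes.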

> Is this at the heart of Ian's ("Hixie"s) example :
>
>     <meta http-equiv="content-type" content="text/html; charset=utf-16">
>
> was everything prior to and including this in UTF-8, and everything
> thereafter in UTF-16 ?

As I understand it, unless the character encoding is signalled in
some other way (Content-Type HTTP header or similar mechanism), such
code is invalid, except for UTF-16 and UTF-8, provided that the same
support requirement applies to HTML as it does to XML. (I'm not that
familiar with the HTML spec.)
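
For what it's worth, the reason UTF-8 and UTF-16 can be told apart
without outside help is the byte order mark. Here is a minimal Python
sketch of that kind of sniffing, loosely in the spirit of what XML
processors do. The function name is made up, and it only looks at the
BOM, so it is an illustration rather than a full detection algorithm.

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Guess the encoding from a leading byte order mark, if there is one.

    Only UTF-8 and UTF-16 are considered. A UTF-32 little-endian BOM also
    starts with FF FE and would be misreported as UTF-16 here.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    return None  # no BOM: the encoding has to be signalled some other way


if __name__ == "__main__":
    print(sniff_bom("<html>".encode("utf-16")))     # Python's codec writes a BOM
    print(sniff_bom("<html>".encode("utf-8-sig")))  # UTF-8 with a BOM
    print(sniff_bom(b"<html>"))                     # plain ASCII bytes, no BOM
```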

Sebastian Redl

Received on Saturday, 3 June 2006 13:30:55 UTC