Re: Problem in publishing multilingual HTML document on web in UTF-8 encoding

From: Albert Lunde <atlunde@panix.com>
Date: Sat, 3 Jun 2006 14:49:54 -0400
To: www-html@w3.org
Message-ID: <20060603184954.GA24508@panix.com>

On Sat, Jun 03, 2006 at 10:02:56PM +0530, Wah Java wrote:
> I've never seen non-ASCII based XML documents. So I think, the
> document has to be split in two parts, first which is encoded in ASCII
> compatible encoding, should become header and the rest of the document
> which contains the text (encoded in encoding specified in the header
> part).
> 
> Am I correct or not ??

That seems not to be the case. The point of the specs was to exploit 
the fact that so many encodings used on the web are recognizable 
supersets of ASCII, not to mandate that one switch encodings 
midway through a file.

(It's not hard to auto-recognize EBCDIC vs ASCII, but EBCDIC and other
legacy encodings, like CDC Display Code, are thankfully scarce
in the problem space of HTML/XML.)
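
As a rough illustration of that auto-recognition, here is a minimal
sketch in Python. It only looks at how the first four characters of
an XML declaration ("<?xm") come out as bytes, using the byte patterns
listed in Appendix F of the XML 1.0 spec. The function name and the
returned labels are made up for this example, and byte order marks
are deliberately left out to keep it short.

```python
def sniff_encoding_family(head: bytes) -> str:
    """Guess the encoding family from the first four bytes of a document.

    Hypothetical helper: assumes the document starts with '<?xm'
    (an XML declaration) and carries no byte order mark.
    """
    first4 = head[:4]
    if first4 == b"\x3c\x3f\x78\x6d":
        # '<?xm' in ASCII and its supersets (UTF-8, ISO 8859-x, ...)
        return "ascii-compatible"
    if first4 == b"\x4c\x6f\xa7\x94":
        # '<?xm' in EBCDIC (exact code page still unknown)
        return "ebcdic"
    if first4 == b"\x00\x3c\x00\x3f":
        # '<?' as big-endian 16-bit units
        return "utf-16-be"
    if first4 == b"\x3c\x00\x3f\x00":
        # '<?' as little-endian 16-bit units
        return "utf-16-le"
    return "unknown"
```

Once the family is known, the rest of the declaration can be read in
that family to find the exact encoding name.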

The XML declaration was intended to be a little less of a hack
than META charset declarations in HTML, providing an inline
encoding declaration that was a little easier to parse.
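
To make that concrete, here is a small sketch (not a conforming
parser) that pulls the encoding name out of an XML declaration and
then decodes the whole document with it. It assumes the document is
in an ASCII-compatible encoding, so the declaration itself can be
matched as plain bytes. The names declared_encoding and
decode_document are invented for this example, and the UTF-8
fallback reflects the XML default when nothing else is declared.

```python
import re

# Matches encoding="..." (or '...') inside a leading XML declaration.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    match = _DECL.match(data[:200])  # the declaration sits at the very start
    return match.group(1).decode("ascii") if match else default


def decode_document(data: bytes) -> str:
    """Decode the entire document with the single declared encoding."""
    return data.decode(declared_encoding(data))
```

Note that the same encoding is applied from the first byte to the
last: the declaration is read as ASCII only because the declared
encoding happens to agree with ASCII on those characters.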

See also:

"Tutorial: Character sets & encodings in XHTML, HTML and CSS"

http://www.w3.org/International/tutorials/tutorial-char-enc

-- 
    Albert Lunde  albert-lunde@northwestern.edu
                  atlunde@panix.com  (new address for personal mail)
                  albert-lunde@nwu.edu (old address)
Received on Saturday, 3 June 2006 18:50:20 GMT
