- From: Albert Lunde <atlunde@panix.com>
- Date: Sat, 3 Jun 2006 14:49:54 -0400
- To: www-html@w3.org
On Sat, Jun 03, 2006 at 10:02:56PM +0530, Wah Java wrote:
> I've never seen non-ASCII based XML documents. So I think, the
> document has to be split in two parts, first which is encoded in ASCII
> compatible encoding, should become header and the rest of the document
> which contains the text (encoded in encoding specified in the header
> part).
>
> Am I correct or not ??

That seems not to be the case. The point of the specs was to exploit
the fact that so many encodings used on the web are recognizable
supersets of ASCII, in both character repertoire and byte encoding,
so the declaration can be read before the encoding is known. It was
not to mandate that one switch encodings midway through a file.

(It's not hard to auto-recognize EBCDIC vs ASCII, but it and other
legacy encodings like CDC Display Code are thankfully scarce in the
problem space of HTML/XML.)

The XML declaration was intended to be a little less of a hack than
META charset declarations in HTML, providing an inline encoding
declaration that was a little easier to parse.

See also:

"Tutorial: Character sets & encodings in XHTML, HTML and CSS"
http://www.w3.org/International/tutorials/tutorial-char-enc

-- 
Albert Lunde
albert-lunde@northwestern.edu
atlunde@panix.com (new address for personal mail)
albert-lunde@nwu.edu (old address)
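
A rough sketch of the idea in the message above, in Python: the whole file stays in one encoding, and a reader can still find the declaration because its bytes look the same in any ASCII superset. The function name sniff_xml_encoding, the 200-byte window, and the "ebcdic" label are assumptions made for illustration; this is not the full detection procedure from the XML spec (it ignores BOM-less UTF-16/UTF-32, for one).

```python
import codecs
import re

# Encoding pseudo-attribute of an XML declaration, matched on raw bytes.
# This works for any ASCII-superset encoding because "<?xml ... encoding="
# is made of ASCII characters only.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

# "<?xm" as encoded in EBCDIC (for example code page 037).
_EBCDIC_PREFIX = b"\x4c\x6f\xa7\x94"


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes (hypothetical helper)."""
    # Byte order marks first. UTF-32 is tested before UTF-16 because the
    # UTF-32 LE mark begins with the UTF-16 LE mark.
    for bom, name in (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ):
        if data.startswith(bom):
            return name

    # EBCDIC is easy to tell apart from ASCII by its first four bytes.
    # Only the family is reported here; the exact code page would still
    # have to be read from the declaration.
    if data.startswith(_EBCDIC_PREFIX):
        return "ebcdic"

    # ASCII-compatible case: read the declaration without decoding the
    # rest of the document. The 200-byte window is an arbitrary choice.
    match = _DECL.match(data[:200])
    if match:
        return match.group(1).decode("ascii").lower()

    # No mark and no declared encoding: XML defaults to UTF-8.
    return "utf-8"
```

Once the name is known, the entire byte string (declaration included) is decoded with that single encoding; nothing is split into a separately encoded header and body.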
Received on Saturday, 3 June 2006 18:50:20 UTC