Re: I18N issue needs consideration from Rick Jelliffe on 1997-06-15 (w3c-sgml-wg@w3.org from June 1997)

From: Rick Jelliffe <ricko@allette.com.au>
Date: Mon, 16 Jun 1997 01:48:06 +1000
To: <w3c-sgml-wg@w3.org>
Message-Id: <199706151547.BAA29485@jawa.chilli.net.au>

I think this I18N issue comes down to four questions. Here they are, with my suggested answers.

Q1) What encodings can an XML 1.0 document have?
	A1) Anything you (and MIME) like, provided it has the correct encoding PIs or whatever.

Q2) What character set must an XML 1.0 document use?
	A2) Unicode 2.0, with all (future) surrogate characters represented by numeric character references.
	Surrogate characters cannot be used in markup.
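
As a rough illustration of A2, here is a minimal Python sketch of what "represented by numeric character references" could look like when writing out character data. The function name escape_non_bmp is hypothetical, and the sketch assumes the hexadecimal &#x...; form of reference. It treats any character above U+FFFF (one that would need a surrogate pair in a 16-bit encoding) as a character to be written as a reference, and leaves everything else alone.

```python
def escape_non_bmp(text: str) -> str:
    """Replace each character above U+FFFF with a hexadecimal
    numeric character reference; leave all other characters as they are.

    Hypothetical helper for illustration only. It is meant for character
    data, not markup, since A2 rules these characters out of markup.
    """
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFFFF:
            # Needs a surrogate pair in a 16-bit encoding, so emit a reference.
            pieces.append(f"&#x{code_point:X};")
        else:
            pieces.append(ch)
    return "".join(pieces)
```

With this, a document stored in a 16-bit encoding never has to contain a raw surrogate pair: the character U+10000, for example, would be written out as the text &#x10000;.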

Q3) What character format should an XML 1.0 parser use?
	A3) An XML 1.0 parser should be able to read all XML 1.0 documents. This means that it must
	have 16-bit or more characters (at least in its reading and parsing routines). If we don't do this
	then there will be some XML documents that cannot be read by some XML parsers, which I
	think is a bit slack. (An 8-bit processor might still work, but it should be considered a
	partial XML processor, not a real or full one.)
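
To make the A3 point concrete, here is a small Python sketch of the difference between a document an 8-bit parser could hold and one it could not. Both function names are hypothetical, and "8-bit" is taken here to mean one character per byte with values 0 to 255, which is an assumption about what such a parser would do internally.

```python
def fits_in_8_bits(text: str) -> bool:
    """True if every character has a code point of 255 or less,
    so each one fits in a single 8-bit character cell."""
    return all(ord(ch) <= 0xFF for ch in text)


def utf16_code_units(text: str) -> int:
    """Number of 16-bit units needed to hold the text as UTF-16.
    Characters above U+FFFF take two units (a surrogate pair)."""
    return len(text.encode("utf-16-be")) // 2
```

A document containing only "café" passes fits_in_8_bits, while one containing Japanese text does not, so a parser limited to 8-bit characters could read the first document but not the second. That is the sense in which it would be only a partial XML processor.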

Q4) What character format should the parser use for character passing to the application?
	A4) None of our (XML WG's) business. It can be 8-bit, 22-bit, structs, anything.


Rick Jelliffe

Received on Sunday, 15 June 1997 11:47:34 UTC