- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Tue, 21 Jan 97 11:36:43 CST
- To: W3C SGML Working Group <w3c-sgml-wg@www10.w3.org>
On Tue, 21 Jan 1997 09:39:26 -0500 Dave Peterson said: >If all XML uses the same SGML declaration, and that declaration >specifies ISO 10646, then all XML machines must be ISO 10646 machines. >Or are we planning to allow different XML documents to have different >SGML character sets? The spec says "The specific SGML declaration needed to enable SGML systems to process XML documents will vary from document to document" depending on the character encoding of the document's entities. It may also vary with the SGML system's understanding and implementation of 8879's rules for character handling, which seem to give rise to wildly varying interpretations. >The interesting question is: How do you deal with a legal SGML character >that your system has no internal representation for? I haven't thought >that one through. In some cases, I suspect your entity manager could >convert the character as stored in the storage representation into a >numeric character reference. But that can break down if the character >is used in markup. Hmmm.... ;-) Systems with no internal representation for strings of 16 and 32 bits would appear not to be capable of handling XML. Are there any such systems? N.B. whether they think the 16-bit string is a 'character' is a separate issue. Internal representation just means you can read it, store it, and write it without information loss. It does not mean you can display or print it. The unicode standard has what seems to me a good discussion of fallback behaviors in such conditions. -C. M. Sperberg-McQueen
Received on Tuesday, 21 January 1997 12:47:22 UTC