- From: Gavin Nicol <gtn@ebt.com>
- Date: Tue, 21 Jan 1997 13:36:42 -0500
- To: davep@acm.org
- CC: U35395@UICVM.UIC.EDU, w3c-sgml-wg@www10.w3.org
>If all XML uses the same SGML declaration, and that declaration specifies
>ISO 10646, then all XML machines must be ISO 10646 machines. Or are
>we planning to allow different XML documents to have different SGML
>character sets?

The character repertoire is fixed, though we are allowing different coded encodings as input, which are to be translated using the appropriate BCTF/decoder into a set of bit combinations.

>Note that specifying Unicode as the "document character set" (1) specifies
>what character numbers in numerical character references are to be
>interpreted as what characters, and (2) specifies what characters (however
>represented in the system character representation) are legal in the
>document.

Yes.

>The interesting question is: How do you deal with a legal SGML character
>that your system has no internal representation for? I haven't thought
>that one through. In some cases, I suspect your entity manager could
>convert the character as stored in the storage representation into a
>numeric character reference. But that can break down if the character
>is used in markup.

Hmmm.... ;-) SGML is silent on this issue, I believe (we went through this for HTML, and I invoked this silence as a way of saying older browsers would be conformant with the HTML I18N draft). I think we could make this a reportable error (not a fatal error).

A more interesting question is what to do with characters that do not appear within the character repertoire we have defined...
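To make the two steps discussed above concrete, here is a minimal Python sketch, not anything the working group has specified. It assumes the fixed repertoire is Unicode/ISO 10646 and uses Python's standard codecs as a stand-in for the BCTF/decoder. The function names (decode_to_repertoire, to_system_charset, find_unrepresentable) are hypothetical and chosen only for illustration.

```python
def decode_to_repertoire(raw: bytes, encoding: str) -> str:
    """Translate a stored encoding into code points of the fixed repertoire.

    Assumption: Python's codec machinery plays the role of the BCTF/decoder.
    """
    return raw.decode(encoding)


def to_system_charset(text: str, system_encoding: str) -> bytes:
    """Encode for a system with a limited character set.

    Characters the system cannot represent are written as numeric
    character references (&#NNNN;) instead of being dropped.
    """
    return text.encode(system_encoding, errors="xmlcharrefreplace")


def find_unrepresentable(text: str, system_encoding: str) -> list:
    """Return (offset, character) pairs the system charset cannot hold.

    A processor could report these as non-fatal errors.
    """
    problems = []
    for offset, ch in enumerate(text):
        try:
            ch.encode(system_encoding)
        except UnicodeEncodeError:
            problems.append((offset, ch))
    return problems


if __name__ == "__main__":
    stored = "\u65e5\u672c\u8a9e".encode("shift_jis")  # bytes as stored on disk
    text = decode_to_repertoire(stored, "shift_jis")
    print(to_system_charset(text, "ascii"))
    print(find_unrepresentable(text, "ascii"))
```

The substitution is only safe in character data. As the quoted text notes, it breaks down when the character appears in markup (an element name, for instance), where a numeric character reference is not allowed, so a real entity manager would need to know the parse context before rewriting anything.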
Received on Tuesday, 21 January 1997 13:38:37 UTC