- From: Dave Peterson <davep@acm.org>
- Date: Tue, 21 Jan 1997 09:39:26 -0500
- To: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>, W3C SGML Working Group <w3c-sgml-wg@www10.w3.org>
At 6:35 PM 1/13/97, Michael Sperberg-McQueen wrote:

> What was overlooked in the heat of the
> moment (this was getting on toward the absolute deadline) was that the
> machine in question was an ASCII machine, not a Unicode machine; of
> course the 646 declaration was required for processing. But the
> canonical SGML declaration should use Unicode.

If all XML uses the same SGML declaration, and that declaration specifies ISO 10646, then all XML machines must be ISO 10646 machines. Or are we planning to allow different XML documents to have different SGML character sets?

Note that specifying Unicode as the "document character set" (1) specifies what character numbers in numerical character references are to be interpreted as what characters, and (2) specifies what characters (however represented in the system character representation) are legal in the document.

The interesting question is: How do you deal with a legal SGML character that your system has no internal representation for? I haven't thought that one through. In some cases, I suspect your entity manager could convert the character as stored in the storage representation into a numeric character reference. But that can break down if the character is used in markup. Hmmm.... ;-)

Dave Peterson
SGMLWorks!

davep@acm.org
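
To make the entity-manager idea above concrete, here is a minimal sketch in Python. It is only an illustration, not anything specified for XML or SGML: the function names `is_representable` and `to_char_refs` are hypothetical, and the "system character representation" is assumed to be plain ASCII. Any character the system cannot represent is replaced by a decimal numeric character reference whose number is the character's ISO 10646 code point.

```python
def is_representable(ch: str, system_encoding: str = "ascii") -> bool:
    """Return True if the (assumed) system encoding can represent ch."""
    try:
        ch.encode(system_encoding)
        return True
    except UnicodeEncodeError:
        return False


def to_char_refs(text: str, system_encoding: str = "ascii") -> str:
    """Replace each unrepresentable character with a numeric character
    reference (&#N;), where N is the character's ISO 10646 code point."""
    return "".join(
        ch if is_representable(ch, system_encoding) else f"&#{ord(ch)};"
        for ch in text
    )


# In character data the substitution is harmless:
print(to_char_refs("caf\u00e9"))      # caf&#233;

# In markup it is not: the reference lands inside the tag name, where
# character references are not recognized, so the result is not a valid tag.
print(to_char_refs("<na\u00efve>"))   # <na&#239;ve>
```

The second call shows the breakdown mentioned above: a blind substitution cannot tell data from markup, so a real entity manager would need some other strategy for characters that appear in names.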
Received on Tuesday, 21 January 1997 09:41:18 UTC