- From: Terry Allen <tallen@fsc.fujitsu.com>
- Date: Sun, 1 Dec 1996 09:21:59 -0800 (PST)
- To: w3c-sgml-wg@w3.org
Further to the task of figuring out what characters are allowed where in XML, I've put together an abstract version of part of an SGML decl, using the EBNF productions in the XML spec instead of writing out character ranges. Two issues that arise are: 1. is XML using Unicode or 10646 2. is the document character set limited to assigned code points Comment invited. <!SGML "ISO 8879:1986" -- this is a sketch of part of an SGML decl using keywords from XML 1.0 -- -- it is not processable -- CHARSET BASESET "ISO/IEC 10646-1:1993//CHARSET etc ... UCS Part 1 etc" -- XML 1.2 says this is technically identical with Unicode 2.0; is that really true and in what sense? Doesn't Unicode specify only the BMP of 10646? The Unicode 2.0 spec says the two are "code-for-code identical"; is that the same as "technically identical"? The Unicode 2.0 spec says it's a 16-bit encoding (p. 1-1); XML [2] refers to "ISO 10646 31-bit code" -- DESCSET [2] Character -- or [11] NameChar ? -- [ ... ] SYNTAX [ ... ] BASESET "ISO/IEC 10646-1:1993//CHARSET etc ... UCS Part 1 etc" DESCSET [11] NameChar FUNCTION [ ... ] SPACE [1] S TAB SEPCHAR -- ? -- NAMING -- ? -- DELIM [ as in the RCS -- ? -- ] GENERAL SGMLREF SHORTREF SGMLREF NAMES SGMLREF [ also as in the RCS -- ? -- ] Regards, Terry Allen Fujitsu Software Corp. tallen@fsc.fujitsu.com "In going on with these experiments, how many pretty systems do we build, which we soon find outselves obliged to destroy?" - Benjamin Franklin A Davenport Group Sponsor: http://www.ora.com/davenport/index.html
Received on Sunday, 1 December 1996 12:23:05 UTC