Characters in XML from Terry Allen on 1996-12-01 (w3c-sgml-wg@w3.org from December 1996)

From: Terry Allen <tallen@fsc.fujitsu.com>
Date: Sun, 1 Dec 1996 09:21:59 -0800 (PST)
To: w3c-sgml-wg@w3.org
Message-Id: <199612011721.JAA21312@ishtar.fsc.fujitsu.com>

Further to the task of figuring out what characters are
allowed where in XML, I've put together an abstract version
of part of an SGML decl, using the EBNF productions in
the XML spec instead of writing out character ranges.
Two issues that arise are:
  1.  Is XML using Unicode or ISO/IEC 10646?
  2.  Is the document character set limited to assigned
      code points?

Comment invited.
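
To make the two issues concrete, here is a minimal sketch in Python.
It is only an illustration, not part of the XML spec or of any SGML
tool. The function names are hypothetical, and the "assigned" test
relies on whatever Unicode database ships with the running Python,
which is much newer than Unicode 2.0.

```python
import unicodedata

BMP_MAX = 0xFFFF           # last code point of the Basic Multilingual Plane (16 bits)
UCS_31_BIT_MAX = 0x7FFFFFFF  # last value of the 31-bit ISO 10646 code space
PYTHON_MAX = 0x10FFFF      # largest code point Python's chr() accepts


def in_bmp(code_point: int) -> bool:
    """True if the code point fits in a single 16-bit unit (issue 1)."""
    return 0 <= code_point <= BMP_MAX


def in_31_bit_space(code_point: int) -> bool:
    """True if the code point fits in the 31-bit code space (issue 1)."""
    return 0 <= code_point <= UCS_31_BIT_MAX


def is_assigned(code_point: int) -> bool:
    """True if the code point is assigned in Python's Unicode data (issue 2).

    Assumption: general category "Cn" (unassigned code points and
    noncharacters) is treated as "not assigned". Values Python cannot
    represent are also reported as not assigned.
    """
    if not 0 <= code_point <= PYTHON_MAX:
        return False
    return unicodedata.category(chr(code_point)) != "Cn"
```

Whether the DESCSET should describe everything for which
in_31_bit_space() holds, only what in_bmp() accepts, or only what
is_assigned() accepts is exactly the open question.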

<!SGML  "ISO 8879:1986"
  -- this is a sketch of part of an SGML decl using keywords from XML 1.0 --
  -- it is not processable --

CHARSET

	BASESET
  "ISO/IEC 10646-1:1993//CHARSET etc ... UCS Part 1 etc"
  -- XML 1.2 says this is technically identical with Unicode 2.0; is
     that really true and in what sense?  Doesn't Unicode specify
     only the BMP of 10646?  The Unicode 2.0 spec says the two are
     "code-for-code identical"; is that the same as "technically
     identical"?  The Unicode 2.0 spec says it's a 16-bit encoding
     (p. 1-1); XML [2] refers to "ISO 10646 31-bit code" --

	DESCSET
                   [2] Character -- or [11] NameChar ? -- 

[ ... ]

SYNTAX

[ ... ]

	BASESET
  "ISO/IEC 10646-1:1993//CHARSET etc ... UCS Part 1 etc"
	DESCSET
                   [11] NameChar 

	FUNCTION 
		[ ... ]
		SPACE       [1] S
		TAB SEPCHAR  -- ? --

	NAMING
		-- ? --

	DELIM   [ as in the RCS -- ? -- ]
		GENERAL  SGMLREF
		SHORTREF SGMLREF

	NAMES SGMLREF  [ also as in the RCS -- ? -- ]

Regards,
    Terry Allen    Fujitsu Software Corp.    tallen@fsc.fujitsu.com
"In going on with these experiments, how many pretty systems do we build,
 which we soon find ourselves obliged to destroy?" - Benjamin Franklin
  A Davenport Group Sponsor:  http://www.ora.com/davenport/index.html

Received on Sunday, 1 December 1996 12:23:05 UTC