- From: Martin J. Duerst <duerst@w3.org>
- Date: Mon, 29 May 2000 11:32:31 +0900
- To: James Clark <jjc@jclark.com>
- Cc: w3c-i18n-ig@w3.org, w3c-html-wg@w3.org, w3c-xml-core-wg@w3.org, xml-editor@w3.org
Hello James,
Misha and I just detected a discrepancy between the SGML declaration
in http://www.w3.org/TR/NOTE-sgml-xml-971215 and production [2] of
XML http://www.w3.org/TR/REC-xml#NT-Char.
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
             | [#x10000-#x10FFFF]
             /* any Unicode character, excluding the surrogate
                blocks, FFFE, and FFFF. */
CHARSET
BASESET
    "ISO Registration Number 176//CHARSET
    ISO/IEC 10646-1:1993 UCS-4 with implementation
    level 3//ESC 2/5 2/15 4/6"
DESCSET
        0        9 UNUSED
        9        2 9
       11        2 UNUSED
       13        1 13
       14       18 UNUSED
       32       95 32
      127        1 UNUSED
      128       32 UNUSED
      160    55136 160
    55296     2048 UNUSED -- surrogates --
    57344     8190 57344
    65534        2 UNUSED -- FFFE and FFFF --
    65536  1048576 65536
If production [2] is followed exactly, the above would change to
CHARSET
BASESET
    "ISO Registration Number 176//CHARSET
    ISO/IEC 10646-1:1993 UCS-4 with implementation
    level 3//ESC 2/5 2/15 4/6"
DESCSET
        0        9 UNUSED
        9        2 9
       11        2 UNUSED
       13        1 13
       14       18 UNUSED
       32    55264 32
    55296     2048 UNUSED -- surrogates --
    57344     8190 57344
    65534        2 UNUSED -- FFFE and FFFF --
    65536  1048576 65536
The difference is that production [2] allows DEL (#x7F) and the C1
control region (#x80-#x9F), which the SGML declaration marks as UNUSED.
Why this difference? Do you think it is an error in XML? Because the
XML Recommendation is normative and the Note is not, it is probably
the SGML declaration that should be fixed.
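The comparison between the two declarations can also be checked mechanically. The following is only a rough sketch in Python, not anything taken from the Note or the Recommendation: the names CHAR_RANGES, NOTE_DESCSET_USED, descset_from_ranges and allowed_only_by_production are made up for illustration, and the two tables are transcribed by hand from production [2] and from the Note's DESCSET as quoted in this mail.

```python
# Ranges allowed by production [2] (Char), as inclusive (first, last) pairs.
# Transcribed by hand from the production quoted in this mail.
CHAR_RANGES = [
    (0x9, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]

# Non-UNUSED DESCSET rows from the Note, as (start, count) pairs.
# Also transcribed by hand from the declaration quoted in this mail.
NOTE_DESCSET_USED = [
    (9, 2),
    (13, 1),
    (32, 95),
    (160, 55136),
    (57344, 8190),
    (65536, 1048576),
]


def descset_from_ranges(ranges, limit=0x110000):
    """Build DESCSET rows (start, count, target) covering 0..limit-1.

    Code points inside `ranges` map to themselves; everything else
    becomes an UNUSED row.
    """
    rows = []
    pos = 0
    for first, last in sorted(ranges):
        if first > pos:
            rows.append((pos, first - pos, "UNUSED"))
        rows.append((first, last - first + 1, first))
        pos = last + 1
    if pos < limit:
        rows.append((pos, limit - pos, "UNUSED"))
    return rows


def allowed_only_by_production(ranges, note_used):
    """Return inclusive ranges that `ranges` allows but the Note does not."""
    note = sorted((start, start + count - 1) for start, count in note_used)
    extra = []
    for first, last in ranges:
        pos = first
        for n_first, n_last in note:
            if n_last < pos or n_first > last:
                continue  # no overlap with the part still to be covered
            if n_first > pos:
                extra.append((pos, n_first - 1))
            pos = max(pos, n_last + 1)
        if pos <= last:
            extra.append((pos, last))
    return extra


if __name__ == "__main__":
    for start, count, target in descset_from_ranges(CHAR_RANGES):
        print(f"{start:>8} {count:>8} {target}")
    print(allowed_only_by_production(CHAR_RANGES, NOTE_DESCSET_USED))
```

Under these assumptions, descset_from_ranges(CHAR_RANGES) reproduces the ten rows of the second declaration, and allowed_only_by_production(...) returns [(127, 159)], i.e. DEL plus the C1 control region.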
This mail is copied to the HTML WG because they provide an SGML
declaration for XHTML. In their case, it could be argued that
because HTML 4.0 is more restrictive, they do not have to change,
but such a decision should be made explicitly and should be
documented.
Regards, Martin.
Received on Sunday, 28 May 2000 22:25:56 UTC