- From: Martin J. Duerst <duerst@w3.org>
- Date: Mon, 29 May 2000 11:32:31 +0900
- To: James Clark <jjc@jclark.com>
- Cc: w3c-i18n-ig@w3.org, w3c-html-wg@w3.org, w3c-xml-core-wg@w3.org, xml-editor@w3.org
Hello James,

Misha and I just detected a discrepancy between the SGML declaration in
http://www.w3.org/TR/NOTE-sgml-xml-971215 and production [2] of XML,
http://www.w3.org/TR/REC-xml#NT-Char.

Production [2] reads:

    [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
               | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
        /* any Unicode character, excluding the
           surrogate blocks, FFFE, and FFFF. */

The SGML declaration in the note reads:

    CHARSET
    BASESET "ISO Registration Number 176//CHARSET
    ISO/IEC 10646-1:1993 UCS-4 with implementation
    level 3//ESC 2/5 2/15 4/6"
    DESCSET 0       9       UNUSED
            9       2       9
            11      2       UNUSED
            13      1       13
            14      18      UNUSED
            32      95      32
            127     1       UNUSED
            128     32      UNUSED
            160     55136   160
            55296   2048    UNUSED  -- surrogates --
            57344   8190    57344
            65534   2       UNUSED  -- FFFE and FFFF --
            65536   1048576 65536

If production [2] is followed exactly, the above would change to:

    CHARSET
    BASESET "ISO Registration Number 176//CHARSET
    ISO/IEC 10646-1:1993 UCS-4 with implementation
    level 3//ESC 2/5 2/15 4/6"
    DESCSET 0       9       UNUSED
            9       2       9
            11      2       UNUSED
            13      1       13
            14      18      UNUSED
            32      55264   32
            55296   2048    UNUSED  -- surrogates --
            57344   8190    57344
            65534   2       UNUSED  -- FFFE and FFFF --
            65536   1048576 65536

The difference is that production [2] allows DEL (#x7F) and the C1
control region (#x80-#x9F), whereas the declaration in the note marks
them as UNUSED.

Why this difference? Do you think it is an error in XML? Because the
XML Recommendation is normative, it probably means that the SGML
declaration should be fixed.

This mail is copied to the HTML WG because they provide an SGML
declaration for XHTML. In their case, it could be argued that because
HTML 4.0 is more restrictive, they do not have to change, but such a
decision should be made explicitly and should be documented.

Regards,
Martin.
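The comparison in the message above can be checked mechanically. The following is only a minimal sketch in Python, not part of the original mail: the DESCSET triples and the Char ranges are transcribed from the message, while the names `NOTE_DESCSET`, `PROPOSED_DESCSET`, `CHAR_RANGES`, `merge`, `used_ranges`, and `subtract` are hypothetical helpers introduced for illustration. `None` stands in for UNUSED.

```python
# DESCSET entries as (start, count, base); base None means UNUSED.
# Values transcribed from the declaration in NOTE-sgml-xml-971215 as quoted.
NOTE_DESCSET = [
    (0, 9, None), (9, 2, 9), (11, 2, None), (13, 1, 13), (14, 18, None),
    (32, 95, 32), (127, 1, None), (128, 32, None), (160, 55136, 160),
    (55296, 2048, None), (57344, 8190, 57344), (65534, 2, None),
    (65536, 1048576, 65536),
]

# The declaration proposed in the message, following production [2] exactly.
PROPOSED_DESCSET = [
    (0, 9, None), (9, 2, 9), (11, 2, None), (13, 1, 13), (14, 18, None),
    (32, 55264, 32), (55296, 2048, None), (57344, 8190, 57344),
    (65534, 2, None), (65536, 1048576, 65536),
]

# Production [2] Char, as inclusive (first, last) code point ranges.
CHAR_RANGES = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]


def merge(ranges):
    """Sort inclusive ranges and join those that overlap or touch."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


def used_ranges(descset):
    """Return the merged ranges of all entries that are not UNUSED."""
    return merge(
        (start, start + count - 1)
        for start, count, base in descset
        if base is not None
    )


def subtract(a, b):
    """Return the parts of merged ranges `a` not covered by merged ranges `b`."""
    out = []
    for first, last in a:
        cur = first
        for b_first, b_last in b:
            if b_last < cur or b_first > last:
                continue  # no overlap with the remaining part of this range
            if b_first > cur:
                out.append((cur, b_first - 1))
            cur = max(cur, b_last + 1)
        if cur <= last:
            out.append((cur, last))
    return out


if __name__ == "__main__":
    char = merge(CHAR_RANGES)
    # Characters allowed by production [2] but UNUSED in the note's declaration.
    for first, last in subtract(char, used_ranges(NOTE_DESCSET)):
        print(f"#x{first:X}-#x{last:X}")
    # The proposed declaration should cover exactly the Char production.
    print(used_ranges(PROPOSED_DESCSET) == char)
```

Under these assumptions, the only range reported is #x7F-#x9F (DEL plus the C1 controls), and the proposed declaration covers exactly the same code points as production [2].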
Received on Sunday, 28 May 2000 22:25:56 UTC