
XML production [2] vs. SGML declaration: C1 and DEL

From: Martin J. Duerst <duerst@w3.org>
Date: Mon, 29 May 2000 11:32:31 +0900
Message-Id: <4.2.0.58.J.20000529111824.03154d10@sh.w3.mag.keio.ac.jp>
To: James Clark <jjc@jclark.com>
Cc: w3c-i18n-ig@w3.org, w3c-html-wg@w3.org, w3c-xml-core-wg@w3.org, xml-editor@w3.org

Hello James,

Misha and I have just noticed a discrepancy between production [2] of
XML (http://www.w3.org/TR/REC-xml#NT-Char) and the SGML declaration
in http://www.w3.org/TR/NOTE-sgml-xml-971215. Production [2] reads:

[2]  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
               | [#x10000-#x10FFFF]
               /* any Unicode character, excluding the surrogate
                  blocks, FFFE, and FFFF. */

The SGML declaration in the Note has:


      CHARSET
            BASESET
                "ISO Registration Number 176//CHARSET
                ISO/IEC 10646-1:1993 UCS-4 with implementation
                level 3//ESC 2/5 2/15 4/6"
            DESCSET
                   0       9       UNUSED
                   9       2       9
                   11      2       UNUSED
                   13      1       13
                   14      18      UNUSED
                   32      95      32
                   127     1       UNUSED
                   128     32      UNUSED
                   160     55136   160
                   55296   2048    UNUSED  -- surrogates --
                   57344   8190    57344
                   65534   2       UNUSED  -- FFFE and FFFF --
                   65536   1048576 65536


If production [2] is followed exactly, the DESCSET above would change to:

      CHARSET
            BASESET
                "ISO Registration Number 176//CHARSET
                ISO/IEC 10646-1:1993 UCS-4 with implementation
                level 3//ESC 2/5 2/15 4/6"
            DESCSET
                   0       9       UNUSED
                   9       2       9
                   11      2       UNUSED
                   13      1       13
                   14      18      UNUSED
                   32      55264   32
                   55296   2048    UNUSED  -- surrogates --
                   57344   8190    57344
                   65534   2       UNUSED  -- FFFE and FFFF --
                   65536   1048576 65536
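
The DESCSET rows above can be derived mechanically from the ranges in
production [2]. The following is only a minimal sketch: the names
CHAR_RANGES and descset_rows are hypothetical and appear in neither
document, and the sketch assumes that every code point not matched by
Char is simply marked UNUSED.

```python
# Inclusive (first, last) code point ranges of production [2] Char.
# CHAR_RANGES and descset_rows are illustrative names, not from either spec.
CHAR_RANGES = [
    (0x9, 0xA),           # #x9 | #xA
    (0xD, 0xD),           # #xD
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def descset_rows(ranges, start=0):
    """Return DESCSET rows (first, count, target) for the given ranges.

    target is the base-set code point for a used run, or "UNUSED" for
    the gap that precedes it (assumption: all gaps are UNUSED).
    """
    rows = []
    pos = start
    for first, last in sorted(ranges):
        if first > pos:
            # Gap between the previous run and this one.
            rows.append((pos, first - pos, "UNUSED"))
        # Used run, mapped to the same code point in the base set.
        rows.append((first, last - first + 1, first))
        pos = last + 1
    return rows


for first, count, target in descset_rows(CHAR_RANGES):
    print(f"{first:<7} {count:<7} {target}")
```

The row starting at 32 comes out with a count of 55264
(#xD7FF - #x20 + 1), a single run with no gap at 127 to 159.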

The difference is that production [2] allows DEL (#x7F) and the C1
control region (#x80-#x9F), whereas the SGML declaration marks them
UNUSED. Why this difference? Do you think it is an error in XML?
Because the XML Recommendation is normative, it probably means that
the SGML declaration should be fixed.
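
To check that this is the only difference, the two definitions can be
compared as sets of code point ranges. This is a rough sketch, not a
validated tool: NOTE_DESCSET is transcribed by hand from the
declaration in NOTE-sgml-xml-971215, and all function and variable
names are hypothetical.

```python
# Inclusive (first, last) ranges of production [2] Char.
CHAR_RANGES = [
    (0x9, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
    (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]

# DESCSET rows transcribed from NOTE-sgml-xml-971215:
# (first code point, count, True if mapped / False if UNUSED).
NOTE_DESCSET = [
    (0, 9, False), (9, 2, True), (11, 2, False), (13, 1, True),
    (14, 18, False), (32, 95, True), (127, 1, False), (128, 32, False),
    (160, 55136, True), (55296, 2048, False), (57344, 8190, True),
    (65534, 2, False), (65536, 1048576, True),
]


def used_runs(descset):
    """Turn DESCSET rows into inclusive (first, last) runs of used code points."""
    return [(first, first + count - 1) for first, count, used in descset if used]


def member(runs, cp):
    """True if code point cp lies in any of the inclusive runs."""
    return any(lo <= cp <= hi for lo, hi in runs)


def symmetric_difference(a, b):
    """Return inclusive runs of code points that are in exactly one of a, b."""
    # Membership in a or b can only change at a run start or just past a run end.
    points = sorted({p for lo, hi in a + b for p in (lo, hi + 1)})
    out = []
    for start, end in zip(points, points[1:]):
        if member(a, start) != member(b, start):
            if out and out[-1][1] == start - 1:
                out[-1] = (out[-1][0], end - 1)   # extend the previous run
            else:
                out.append((start, end - 1))
    return out


diff = symmetric_difference(CHAR_RANGES, used_runs(NOTE_DESCSET))
print([(hex(lo), hex(hi)) for lo, hi in diff])
```

With the rows as transcribed here, the only disagreement is the single
run #x7F-#x9F, i.e. DEL plus the C1 region.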

This mail is copied to the HTML WG because they provide an SGML
declaration for XHTML. In their case, it could be argued that
because HTML 4.0 is more restrictive, they do not have to change,
but such a decision should be made explicitly and should be
documented.


Regards,   Martin.
Received on Sunday, 28 May 2000 22:25:56 GMT