Re: XML production [2] vs. SGML declaration: C1 and DEL

The HTML WG has decided to wait for resolution of the issue in XML before
deciding on a course of action.

Steven Pemberton
Chair, HTML WG

----- Original Message -----
From: Martin J. Duerst <duerst@w3.org>
To: James Clark <jjc@jclark.com>
Cc: <w3c-i18n-ig@w3.org>; <w3c-html-wg@w3.org>; <w3c-xml-core-wg@w3.org>;
<xml-editor@w3.org>
Sent: Monday, May 29, 2000 4:32 AM
Subject: XML production [2] vs. SGML declaration: C1 and DEL


> Hello James,
>
> Misha and I just detected a discrepancy between the SGML declaration
> in http://www.w3.org/TR/NOTE-sgml-xml-971215 and production [2] of
> XML http://www.w3.org/TR/REC-xml#NT-Char.
>
> [2]  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
>                | [#x10000-#x10FFFF]
>                /* any Unicode character, excluding the surrogate
>                   blocks, FFFE, and FFFF. */
>
>
>       CHARSET
>             BASESET
>                 "ISO Registration Number 176//CHARSET
>                 ISO/IEC 10646-1:1993 UCS-4 with implementation
>                 level 3//ESC 2/5 2/15 4/6"
>             DESCSET
>                    0       9       UNUSED
>                    9       2       9
>                    11      2       UNUSED
>                    13      1       13
>                    14      18      UNUSED
>                    32      95      32
>                    127     1       UNUSED
>                    128     32      UNUSED
>                    160     55136   160
>                    55296   2048    UNUSED  -- surrogates --
>                    57344   8190    57344
>                    65534   2       UNUSED  -- FFFE and FFFF --
>                    65536   1048576 65536
>
>
> If production [2] is followed exactly, the above would change to
>
>       CHARSET
>             BASESET
>                 "ISO Registration Number 176//CHARSET
>                 ISO/IEC 10646-1:1993 UCS-4 with implementation
>                 level 3//ESC 2/5 2/15 4/6"
>             DESCSET
>                    0       9       UNUSED
>                    9       2       9
>                    11      2       UNUSED
>                    13      1       13
>                    14      18      UNUSED
>                    32      55264   32
>                    55296   2048    UNUSED  -- surrogates --
>                    57344   8190    57344
>                    65534   2       UNUSED  -- FFFE and FFFF --
>                    65536   1048576 65536
>
> The difference is that the C1 control region and DEL are allowed.
> Why this difference? Do you think it is an error in XML? Because
> XML is normative, it probably means that the SGML declaration
> should be fixed.
>
> This mail is copied to the HTML WG because they provide an SGML
> declaration for XHTML. In their case, it could be argued that
> because HTML 4.0 is more restrictive, they do not have to change,
> but such a decision should be made explicitly and should be
> documented.
>
>
> Regards,   Martin.
>
>
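
The discrepancy described in the quoted message can be checked
mechanically. What follows is a minimal sketch, not part of either
specification: it transcribes the DESCSET entries from the SGML
declaration quoted above and the ranges of production [2], then reports
where they disagree. The names NOTE_DESCSET, XML_CHAR_RANGES,
used_ranges, contains and symmetric_difference are hypothetical and
chosen only for this illustration.

```python
# DESCSET entries transcribed from the quoted SGML declaration:
# (start, count, mapped_to), with None standing for UNUSED.
NOTE_DESCSET = [
    (0, 9, None), (9, 2, 9), (11, 2, None), (13, 1, 13), (14, 18, None),
    (32, 95, 32), (127, 1, None), (128, 32, None), (160, 55136, 160),
    (55296, 2048, None), (57344, 8190, 57344), (65534, 2, None),
    (65536, 1048576, 65536),
]

# Production [2] of XML 1.0, as inclusive (low, high) ranges.
XML_CHAR_RANGES = [
    (0x9, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
    (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]


def used_ranges(descset):
    """Return inclusive (low, high) ranges for entries that are not UNUSED."""
    return [(start, start + count - 1)
            for start, count, mapped in descset if mapped is not None]


def contains(ranges, cp):
    """True if code point cp falls inside any of the inclusive ranges."""
    return any(low <= cp <= high for low, high in ranges)


def symmetric_difference(a, b):
    """Inclusive ranges of code points that are in exactly one of a and b."""
    # Membership can only change at a range start or just past a range end.
    points = sorted({p for low, high in a + b for p in (low, high + 1)})
    result = []
    for low, nxt in zip(points, points[1:]):
        if contains(a, low) != contains(b, low):
            if result and result[-1][1] == low - 1:
                # Extend the previous range when the two are adjacent.
                result[-1] = (result[-1][0], nxt - 1)
            else:
                result.append((low, nxt - 1))
    return result


diff = symmetric_difference(XML_CHAR_RANGES, used_ranges(NOTE_DESCSET))
for low, high in diff:
    print(f"#x{low:X}-#x{high:X}")
```

Under these assumptions the only range reported is DEL together with
the C1 control region, which matches the difference named in the
quoted message.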

Received on Wednesday, 7 June 2000 09:56:27 UTC