simplifying comments in SGML '97
At SGML '96, there was a fair amount of discussion of aspects of 8879
which had forced XML into uncomfortable (or in some views unnatural)
positions on this issue or that, and some members of WG8 have
indicated an interest in issuing a technical corrigendum to the
standard, to change those aspects of SGML, at least in the cases
where it's clear and widely agreed what should change and how.
In discussing this possibility with members of WG8, the ERB has
encountered a question on which we need the guidance of the WG,
namely, what to do about the syntax of comments in SGML.
The XML spec of 14 November defines as a 'comment' what in 8879 is
called a 'comment declaration' -- to reduce confusion, in what
follows I am going to try to use the 8879 terminology, not the XML terminology.
The current version of the XML spec differs from 8879 in various ways:
(a) XML allows comments *only* in comment declarations; 8879
allows them in other markup declarations as well (though not
in all locations)
(b) XML allows exactly one comment in a comment declaration;
8879 allows zero or more
(c) XML defines two delimiters '<!--' and '-->' which bound the
construct in question; 8879 sees three delimiter roles involved
here, and allows white space in some locations: XML '<!--'
corresponds to 'mdo, com' in 8879 (this is split across
productions 91 and 92, p. 391 of the Handbook) and XML
'-->' corresponds to 'com, s*, mdc' (again, productions 91-92)
N.B. I think this list is complete but may be wrong.
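To make differences (b) and (c) concrete, here is a minimal Python sketch that encodes both rules as regular expressions. It assumes the reference concrete syntax (mdo '<!', com '--', mdc '>'), treats only space, tab, CR and LF as s, and ignores the other contexts in which 8879 allows comments; the pattern and function names are illustrative only.

```python
import re

S = r"[ \t\r\n]"  # simplification of the s separator (assumption)

# 8879 production 92 in the RCS: comment = com, SGML character*, com,
# with com = '--'. The body cannot contain '--', which would be recognized as com.
COMMENT_8879 = r"--(?:(?!--).)*--"

# 8879 production 91: comment declaration = mdo, (comment, (s | comment)*)?, mdc,
# with mdo = '<!' and mdc = '>'.
DECL_8879 = re.compile(rf"<!(?:{COMMENT_8879}(?:{S}|{COMMENT_8879})*)?>", re.DOTALL)

# XML draft: '<!--' and '-->' are single delimiters around exactly one comment.
DECL_XML = re.compile(r"<!--(?:(?!--).)*-->", re.DOTALL)


def compare(text: str) -> tuple[bool, bool]:
    """Return (valid per 8879, valid per the XML draft) for one declaration."""
    return (DECL_8879.fullmatch(text) is not None,
            DECL_XML.fullmatch(text) is not None)


for sample in ["<!-- a -->", "<!-- a -- -- b -->", "<!-- a -- >", "<!>"]:
    print(repr(sample), compare(sample))
# '<!-- a -->' (True, True)
# '<!-- a -- -- b -->' (True, False)
# '<!-- a -- >' (True, False)
# '<!>' (True, False)
```

The last three samples are legal 8879 comment declarations (two comments, white space before mdc, no comment at all) that the XML draft rejects.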
The main perceived problem with the XML spec is that the comment itself
is barred not only from containing '-->' (its closing delimiter) but
also from containing '--'. Changing the SGML com delimiter from '--' to something else
(e.g. ';;') would change the forbidden string to a less frequently
encountered one, and was considered, but it would not eliminate the
apparent irrationality and was felt in any case to be unwise.
A technical corrigendum to 8879 would ideally eliminate this problem
and make it possible for '--' to appear within XML-style comment
declarations; it might also make it possible for SGML parsers to
enforce XML's rules on (a) the number of comments within the comment
declaration and (b) the absence of white space before the closing mdc.
So far so good.
The difficulty is that there seem to be at least two ways of approaching
the problem, and it's not clear which is preferable. Your opinions are welcome.
A. The Simple Comment
This proposal adds a SIMPLEC (simple-comment) optional feature to
the SGML declaration. If SIMPLEC NO is declared, comments behave
as they do now. If SIMPLEC YES is declared, comments have an
alternative definition. The relevant clause might read like this
(thanks to Dave Peterson for the draftsmanship). I have marked
additions in <add>...</add> and substitutions in <sub>...</sub>:
10.3 Comment declaration
<add>comment declaration = normal comment declaration
| simple comment declaration</add>
<sub>[91a]</sub> <add>normal</add> comment declaration =
mdo, (comment, (s | comment)*)?, mdc
[92] comment = com, SGML character*, com
[92a] simple comment declaration =
mdo, com, SGML character*, com, mdc
No markup is recognized in a comment, other than the com
delimiter that terminates it. <add>No markup is recognized in a
simple comment declaration other than the com delimiter
immediately followed by an mdc delimiter that terminates it.</add>
1. A com delimiter not followed by an mdc delimiter will be
recognized in a comment (in a comment declaration or other
declaration) but not in a simple comment declaration.
2. The SGML declaration specifies whether normal or simple
comment declarations are used in a document. No document may use both.
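Read literally, the draft clause changes only what ends a comment. The following minimal Python sketch shows the effect; the `simplec` parameter is a hypothetical stand-in for SIMPLEC YES/NO in the SGML declaration, RCS delimiters are assumed, and s is simplified to space, tab, CR and LF. With SIMPLEC YES, '--' may occur inside the comment, and only '--' immediately followed by '>' terminates it.

```python
import re

S = r"[ \t\r\n]"  # simplified s separator (assumption)
NORMAL_COMMENT = r"--(?:(?!--).)*--"

# [91a] normal comment declaration = mdo, (comment, (s | comment)*)?, mdc
NORMAL_DECL = re.compile(
    rf"<!(?:{NORMAL_COMMENT}(?:{S}|{NORMAL_COMMENT})*)?>", re.DOTALL
)

# [92a] simple comment declaration = mdo, com, SGML character*, com, mdc
# Only com immediately followed by mdc ('-->') is recognized inside it.
SIMPLE_DECL = re.compile(r"<!--(?:(?!-->).)*-->", re.DOTALL)


def is_comment_declaration(text: str, simplec: bool) -> bool:
    """Check one comment declaration; `simplec` mirrors a hypothetical SIMPLEC YES/NO."""
    pattern = SIMPLE_DECL if simplec else NORMAL_DECL
    return pattern.fullmatch(text) is not None


for sample in ["<!-- a -- b -->", "<!-- a -- -- b -->", "<!-- a -- >", "<!>"]:
    print(repr(sample),
          is_comment_declaration(sample, simplec=False),
          is_comment_declaration(sample, simplec=True))
# '<!-- a -- b -->' False True
# '<!-- a -- -- b -->' True True
# '<!-- a -- >' True False
# '<!>' True False
```

Under SIMPLEC YES the parser itself rejects white space before mdc and empty comment declarations, which is how this approach captures XML's rules.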
Advantages: captures all of XML's rules except the prohibition on
comments in other markup declarations. Disadvantages: it is not clear
whether the precise mix of simplifications undertaken by XML is of
general enough interest / use to warrant this approach: would other
application profiles prefer to impose different rules on comments?
B. Splitting the com Delimiter
This proposal simply replaces the 8879 com delimiter with a pair of
delimiters, como and comc (comment open and comment close). Documents
using 8879:1986 syntax have como = comc = com; in the RCS that's
como = comc = '--'. XML could allow '--' within comments by setting
como to '--*' and comc to '*--'; this would retain the general look and feel
of current comments while still allowing '--' in the comment itself
(only '*--' would then be excluded).
Nested comments might also become possible in SGML (and then in XML).
Production 91 of 8879 could remain the same as it now is; 92 would
change. We might have:
91 comment declaration = mdo, (comment, (s | comment)*)?, mdc
92 comment = como, SGML character*, comc
or (to allow nested comments)
92 comment = como, (SGML character | comment)*, comc
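The effect of splitting com can be tried out with a small recursive-descent recognizer for productions 91 and 92. This is a minimal Python sketch, not a proposed implementation: the `Delimiters` class and the function names are hypothetical, s is simplified to space, tab, CR and LF, and SGML's rules for recognizing delimiters in context are ignored. With the defaults (como = comc = '--') it behaves like 8879 as it stands; with como = '--*' and comc = '*--' a comment may contain '--'; and `nested=True` follows the nested form of 92.

```python
from dataclasses import dataclass

S_CHARS = " \t\r\n"  # simplification of the s separator (assumption)


@dataclass(frozen=True)
class Delimiters:
    """Hypothetical delimiter set; defaults follow the RCS (como = comc = '--')."""
    mdo: str = "<!"
    mdc: str = ">"
    como: str = "--"
    comc: str = "--"


def parse_comment(text: str, i: int, d: Delimiters, nested: bool) -> int:
    """Consume one comment starting at text[i]; return the index after its comc.

    Flat form:   92 comment = como, SGML character*, comc
    Nested form: 92 comment = como, (SGML character | comment)*, comc
    """
    if not text.startswith(d.como, i):
        raise ValueError(f"expected como at offset {i}")
    i += len(d.como)
    while i < len(text):
        # comc is tested before como, so when como == comc (as in the RCS)
        # a nested comment can never start.
        if text.startswith(d.comc, i):
            return i + len(d.comc)
        if nested and text.startswith(d.como, i):
            i = parse_comment(text, i, d, nested)
        else:
            i += 1
    raise ValueError("unterminated comment")


def is_comment_declaration(text: str, d: Delimiters, nested: bool = False) -> bool:
    """91 comment declaration = mdo, (comment, (s | comment)*)?, mdc"""
    if not text.startswith(d.mdo):
        return False
    i = len(d.mdo)
    try:
        if text.startswith(d.como, i):
            i = parse_comment(text, i, d, nested)
            while True:
                if i < len(text) and text[i] in S_CHARS:
                    i += 1
                elif text.startswith(d.como, i):
                    i = parse_comment(text, i, d, nested)
                else:
                    break
    except ValueError:
        return False
    return text[i:] == d.mdc


rcs = Delimiters()
split = Delimiters(como="--*", comc="*--")
nested_text = "<!--* outer --* inner *-- outer again *-->"
print(is_comment_declaration("<!-- a -- -- b -->", rcs))        # True
print(is_comment_declaration("<!-- a -- b -->", rcs))           # False
print(is_comment_declaration("<!--* a -- b *-->", split))       # True
print(is_comment_declaration(nested_text, split))               # False
print(is_comment_declaration(nested_text, split, nested=True))  # True
```

Production 91 is untouched in this sketch; only the body of `parse_comment` depends on whether como and comc are distinct and whether nesting is allowed.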
Note: If this is what we propose to WG8, the XML spec should probably
change *now* to use these delimiters, replacing production 21 with
 Comment ::= '<!--*' (Char* - (Char* '*--' Char*)) '*-->'
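On the XML side, the text says the new delimiters would still allow '--' in the comment, so the only string the body must avoid is comc, '*--'. A minimal Python sketch of that rule follows; the function name is hypothetical and any Python character stands in for an XML Char. Checking the body alone is enough, because no tail of the body can combine with the start of '*-->' to spell '*--'.

```python
OPEN, CLOSE, COMC = "<!--*", "*-->", "*--"


def is_xml_comment(text: str) -> bool:
    """Hypothetical check: '<!--*', a body that never contains comc ('*--'), '*-->'."""
    if len(text) < len(OPEN) + len(CLOSE):
        return False
    if not (text.startswith(OPEN) and text.endswith(CLOSE)):
        return False
    body = text[len(OPEN):-len(CLOSE)]
    return COMC not in body


print(is_xml_comment("<!--* a -- b *-->"))   # True: '--' is now allowed
print(is_xml_comment("<!--* a *-- b *-->"))  # False: comc inside the body
print(is_xml_comment("<!-- old style -->"))  # False: old delimiters
```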
Advantages: this seems relatively simple and relatively compatible
with the look and feel of 8879 as a whole. Disadvantages: it
doesn't allow the SGML parser to enforce XML's other rules (exactly one
comment per comment declaration, no white space before the closing mdc).
What does the SGML Work Group think about this problem?
-C. M. Sperberg-McQueen