- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Wed, 11 Dec 96 13:08:27 CST
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
At SGML '96, there was a fair amount of discussion of aspects of 8879 which had forced XML into uncomfortable (or in some views unnatural) positions on this issue or that, and some members of WG8 have indicated an interest in issuing a technical corrigendum to the standard, to change those aspects of SGML, at least in the cases where it's clear and widely agreed what should change and how.

In discussing this possibility with members of WG8, the ERB has encountered a question on which we need the guidance of the WG, namely, what to do about the syntax of comments in SGML.

The XML spec of 14 November defines as a 'comment' what in 8879 is called a 'comment declaration' -- to reduce confusion, in what follows I am going to try to use the 8879 terminology, not the XML terminology.

The current version of the XML spec differs from 8879 in various ways:

(a) XML allows comments *only* in comment declarations; 8879 allows them in other markup declarations as well (though not in all locations)

(b) XML allows exactly one comment in a comment declaration; 8879 allows zero or more

(c) XML defines two delimiters '<!--' and '-->' which bound the construct in question; 8879 sees three delimiter roles involved here, and allows white space in some locations:
    XML '<!--' corresponds to 'mdo, com' in 8879 (this is split across productions 91 and 92, pp. 391 of the Handbook) and
    XML '-->' corresponds to 'com, s*, mdc' (again, productions 91-92)

N.B. I think this list is complete but may be wrong.

The main perceived problem with the XML spec is that the comment itself is barred not only from containing '-->' (its closing delimiter) but '--'. Changing the SGML com delimiter from '--' to something else (e.g. ';;') would change the forbidden string to a less frequently encountered one, and was considered, but it would not eliminate the apparent irrationality and was felt in any case to be unwise.
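To make the '--' restriction concrete, here is a minimal sketch in Python. It assumes the current XML comment production has the shape '<!--' [^-]* ('-' [^-]+)* '-->', which is not quoted in this message; the names XML_COMMENT and is_xml_comment are made up for illustration only.

```python
import re

# Assumed shape of the current XML comment production (an assumption,
# not quoted in this message):  '<!--' [^-]* ('-' [^-]+)* '-->'
XML_COMMENT = re.compile(r"<!--[^-]*(?:-[^-]+)*-->")


def is_xml_comment(text: str) -> bool:
    """Return True if the whole string is one comment under the assumed production."""
    return XML_COMMENT.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "<!-- a plain comment -->",            # accepted
        "<!-- a single - hyphen is fine -->",  # accepted: lone hyphens are allowed
        "<!-- a double -- hyphen is not -->",  # rejected: '--' inside the comment
        "<!-- space before the close -- >",    # rejected: white space before mdc
    ]
    for sample in samples:
        print(is_xml_comment(sample), sample)
```

The third sample is the case that causes complaints: the string '--' cannot appear in the comment text even though it is not followed by '>'.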
A technical corrigendum to 8879 would ideally eliminate this problem and make it possible for '--' to appear within XML-style comment declarations; it might also make it possible for SGML parsers to enforce XML's rules on (a) number of comments within the comment declaration and (b) absence of white space before the closing mdc.

So far so good. The problem is that there seem to be at least two ways of approaching the problem, and it's not clear which is preferable. Your opinions, please.

A. The Simple Comment

This proposal adds a SIMPLEC (simple-comment) optional feature to the SGML declaration. If SIMPLEC NO is declared, comments behave as they do now. If SIMPLEC YES is declared, comments have an alternative definition. The relevant clause might read like this (thanks to Dave Peterson for the draftsmanship). I have marked additions in <add>...</add> and substitutions in <sub>...</sub>:

    10.3 Comment declaration

    <add>[91] comment declaration =
        normal comment declaration | simple comment declaration
    </add>

    <sub>[91a]</sub> <add>normal</add> comment declaration =
        mdo, (comment, (s | comment)*)?, mdc

    [92] comment = com, SGML character*, com

    <add>
    [92a] simple comment declaration =
        mdo, com, SGML character*, com, mdc
    </add>

    No markup is recognized in a comment, other than the com delimiter that terminates it. <add>No markup is recognized in a simple comment declaration other than the com delimiter immediately followed by an mdc delimiter that terminates it.</add>

    <add>
    NOTES
    1. A com delimiter not followed by an mdc delimiter will be recognized in a comment (in a comment declaration or other declaration) but not in a simple comment declaration.
    2. The SGML declaration specifies whether normal or simple comment declarations are used in a document. No document may use both.
    </add>

Advantages: captures all of XML's rules except the prohibition on comments in other markup declarations.
Disadvantages: not clear whether the precise mix of simplifications undertaken by XML is of general enough interest / use to warrant this approach: would other application profiles prefer to impose different rules on comments?

B. Splitting the com delimiter.

This proposal simply replaces the 8879 com delimiter with a pair of delimiters, como and comc (comment open and comment close). Documents using 8879:1986 syntax have como = comc = com; in the RCS that's como = comc = '--'. XML could allow '--' within comments by setting como to '--*' and comc to '*--', to retain the general look and feel of current comments, and still allow '--' in the comment itself. Nested comments might also become possible, in SGML (and then in XML), or not.

Production 91 of 8879 could remain the same as it now is; 92 would change. We might have:

    91 comment declaration = mdo, (comment, (s | comment)*)?, mdc
    92 comment = como, SGML character*, comc

or (to allow nested comments)

    92 comment = como, (SGML character | comment)*, comc

Note: If this is what we propose to WG8, the XML spec should probably change *now* to use these delimiters, replacing production 21 with

    [21] Comment ::= '<!--*' [^-]* ('-' [^-]+)* '*-->'

Advantages: this seems relatively simple and relatively compatible with the look and feel of 8879 as a whole.

Disadvantages: it doesn't allow the SGML parser to enforce XML's rules.

What does the SGML Working Group think about this problem?

-C. M. Sperberg-McQueen
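
For readers who want to compare the two proposals side by side, the following is a rough sketch in Python, not a conforming SGML or XML parser. It assumes the RCS delimiters mdo = '<!' and mdc = '>', and for proposal B the suggested como = '--*' and comc = '*--'. The function names comment_body, simple_comment and split_comment are hypothetical, and the split-delimiter case models production 92 without nesting, restricted XML-style to one comment with no white space before mdc.

```python
from typing import Optional


def comment_body(text: str, opener: str, terminator: str, tail: str = "") -> Optional[str]:
    """Return the comment text if `text` is exactly opener + body + terminator + tail,
    where the body ends at the FIRST occurrence of `terminator`; otherwise None."""
    if not text.startswith(opener):
        return None
    end = text.find(terminator, len(opener))
    if end == -1:
        return None
    # Whatever follows the first terminator must be exactly `tail`.
    if text[end + len(terminator):] != tail:
        return None
    return text[len(opener):end]


def simple_comment(text: str) -> Optional[str]:
    """Proposal A sketch: only com immediately followed by mdc ('-->') is recognized."""
    return comment_body(text, opener="<!--", terminator="-->")


def split_comment(text: str) -> Optional[str]:
    """Proposal B sketch: the comment ends at the first comc ('*--'), then mdc must follow."""
    return comment_body(text, opener="<!--*", terminator="*--", tail=">")


if __name__ == "__main__":
    for sample in ["<!-- a -- b -->", "<!-- a --> b -->", "<!-- a -- >"]:
        print("A:", repr(simple_comment(sample)), sample)
    for sample in ["<!--* a -- b *-->", "<!--* a *-- b *-->"]:
        print("B:", repr(split_comment(sample)), sample)
```

In both sketches '--' may appear inside the comment text. The difference is which string ends the comment: under A it is '-->', under B it is the first '*--', after which only the mdc is accepted here.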
Received on Wednesday, 11 December 1996 14:55:44 UTC