Re: Simple(?) question on obscure comments detail

Fri, 20 Sep 1996 15:49:49 -0500

Date: Fri, 20 Sep 1996 15:49:49 -0500
To: (Arnoud "Galactus" Engelfriet)
From: (Murray Altheim)
Subject: Re: Simple(?) question on obscure comments detail
Cc:

(Arnoud "Galactus" Engelfriet) writes:
>In article <v02110102ae677bf46636@[]>,
> (Murray Altheim) wrote:
>> ><!-- hello--->
>> The latter. Check also
>Yes, I've read that. They all state "comments are surrounded by '--'
>and do not contain '--'". I can *not* find anything that explicitly
>states the last character of a comment may not be '-'.
>I suppose most parsers just strtok(NULL, "--") but is that the
>correct behaviour?

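The parser recognizes the start of the comment declaration (MDO, "<!"), then
parses comments. Each comment is a pair of comment delimiters (COM, "--")
enclosing any SGML characters, whitespace included. Between comments only
whitespace (s) may appear, and the declaration ends with the markup
declaration close (MDC, ">"). Your example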

     <!-- hello--->

parses to:

     "<!"        MDO
     "--"        COM
     " hello"    SGML character* (the space is part of the comment text)
     "--"        COM
     "->"        #### invalid: only s and comment allowed
                               in comment declaration

In your example, the closing COM is the first "--" after "hello", so the
comment ends there and "->" is left dangling.
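The parse above can be sketched as a small tokenizer. This is a minimal Python illustration, not how nsgmls is implemented: the function name `tokenize_comment_declaration` is hypothetical, the whitespace set assumes the reference concrete syntax (space, tab, CR, LF), and the error text paraphrases the annotation in the trace rather than quoting nsgmls.

```python
# Characters that count as s (separators), ASSUMING the reference concrete
# syntax: space, tab (SEPCHAR), CR (RE) and LF (RS).
WHITESPACE = " \t\r\n"
MDO, COM, MDC = "<!", "--", ">"


def tokenize_comment_declaration(text):
    """Split one comment declaration into (lexeme, role) pairs.

    Follows: comment declaration = MDO, (comment, (s | comment)*)?, MDC
             comment             = COM, SGML character*, COM
    Raises ValueError at the first character that cannot appear there.
    """
    if not text.startswith(MDO):
        raise ValueError("does not start with MDO")
    tokens = [(MDO, "MDO")]
    i = len(MDO)
    seen_comment = False  # s is only allowed after the first comment
    while i < len(text):
        if text.startswith(COM, i):
            # A comment runs to the *next* COM, whatever follows that COM.
            close = text.find(COM, i + len(COM))
            if close == -1:
                raise ValueError(f"unterminated comment at offset {i}")
            tokens += [(COM, "COM"),
                       (text[i + len(COM):close], "SGML character*"),
                       (COM, "COM")]
            i = close + len(COM)
            seen_comment = True
        elif text[i] == MDC:
            tokens.append((MDC, "MDC"))
            return tokens  # text after MDC is outside the declaration
        elif seen_comment and text[i] in WHITESPACE:
            start = i
            while i < len(text) and text[i] in WHITESPACE:
                i += 1
            tokens.append((text[start:i], "s"))
        else:
            raise ValueError(f"{text[i]!r} at offset {i} invalid: only s "
                             "and comment allowed in comment declaration")
    raise ValueError("missing MDC")


for lexeme, role in tokenize_comment_declaration("<!-- hello -- >"):
    print(f"{lexeme!r:12} {role}")

try:
    tokenize_comment_declaration("<!-- hello--->")
except ValueError as exc:
    print("error:", exc)
```

The second call stops at offset 12, the "-" immediately after the comment's closing "--", which is the same spot the trace flags. Note that the sketch rejects whitespace directly after MDO, since the production only allows s after the first comment.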

The nsgmls error message means that only s (whitespace) and comments ("--
text --" is an example of a comment) may appear in a comment declaration
before the closing MDC. You can check pages 390-391 of Goldfarb's "The SGML
Handbook" to confirm.

The parser scans forward for the next COM, not for the next "-->". In a
comment declaration "-->" has no special significance of its own: it is
simply a COM followed by an MDC (">"). That is why parsers that look for
"-->" get it wrong. It is perfectly SGML-legal to write a comment
declaration such as:

     <!-- hello --
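To see why searching for "-->" goes wrong, here is a sketch contrasting two hypothetical scanners, `naive_end` and `sgml_end` (names invented for this illustration). The input is made up: a comment declaration with a space between the closing COM and the MDC, followed by ordinary markup and a second comment. It assumes the reference concrete syntax and handles a single declaration only.

```python
WHITESPACE = " \t\r\n"  # s characters, ASSUMING the reference concrete syntax


def naive_end(doc, start):
    """Hypothetical naive scanner: treats '-->' as the terminator."""
    pos = doc.find("-->", start)
    return -1 if pos == -1 else pos + len("-->")


def sgml_end(doc, start):
    """Scan COM pairs as SGML does; return the offset just past MDC."""
    if not doc.startswith("<!", start):
        raise ValueError("no MDO at start")
    i = start + 2
    seen_comment = False
    while i < len(doc):
        if doc.startswith("--", i):
            # The comment ends at the next COM; no lookahead for '>'.
            close = doc.find("--", i + 2)
            if close == -1:
                raise ValueError("unterminated comment")
            i = close + 2
            seen_comment = True
        elif doc[i] == ">":
            return i + 1
        elif seen_comment and doc[i] in WHITESPACE:
            i += 1
        else:
            raise ValueError(f"invalid character {doc[i]!r} at offset {i}")
    raise ValueError("missing MDC")


doc = "<!-- hello -- ><p>visible</p><!-- bye -->"
print(doc[:naive_end(doc, 0)])  # the whole string: <p> swallowed as comment
print(doc[:sgml_end(doc, 0)])   # <!-- hello -- >
```

The naive scanner runs on to the "-->" of the following comment and swallows the paragraph in between; the COM-pair scanner stops at the ">" that actually closes the first declaration.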


