Re[2]: Comments in HTML (fwd)

Murray Altheim (murray@spyglass.com)
Tue, 11 Jun 1996 15:38:57 -0500


Message-Id: <v02110111ade3406286db@[140.186.34.50]>
To: kstarsin@smtpgwy.isinet.com
Cc: S.N.Brodie@ecs.soton.ac.uk, gleeson@unimelb.edu.au, www-html@w3.org

Kurt Starsinic <kstarsin@smtpgwy.isinet.com> writes:
>Martin Gleeson <gleeson@unimelb.edu.au> writes:
>>Stewart Brodie <S.N.Brodie@ecs.soton.ac.uk> writes:
>>>The problem is that browsers have to terminate comments at the first '>'
>>>because, IIRC, a very early draft of the HTML 2 documentation contained a
>>>misprint and browser authors accepted any > to terminate a comment . . .

I don't know why *anyone* would attempt to support broken behavior from a
misprint in a really outdated draft. Proposed Standard RFC 1866, which
(barring any unforeseen tornadoes) will make HTML 2.0 an IETF Internet
Standard this year, notes:

      NOTE - Some historical HTML implementations incorrectly consider
      any `>' character to be the termination of a comment.
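To make the difference concrete, here is a rough Python sketch that contrasts the historical "first '>' wins" rule with a scan following SGML's comment declaration syntax: '<!', then zero or more '--...--' comments separated by whitespace, then '>'. The function names are made up for illustration. The scanner assumes the reference concrete syntax delimiters and does no error recovery.

```python
# Hypothetical helpers; a sketch, not any browser's actual code.

def end_at_first_gt(text: str, start: int) -> int:
    """Historical (incorrect) rule: the first '>' after '<!' ends the comment."""
    return text.index(">", start + 2)


def end_of_comment_declaration(text: str, start: int) -> int:
    """Return the index of the MDC '>' closing the comment declaration at text[start].

    Simplified model: '<!', then zero or more '--...--' comments,
    with whitespace allowed only after a comment, then '>'.
    """
    if not text.startswith("<!", start):
        raise ValueError("no markup declaration open at start")
    i = start + 2
    while i < len(text):
        if text.startswith("--", i):
            close = text.find("--", i + 2)  # a comment runs to the next '--'
            if close == -1:
                raise ValueError("unterminated comment")
            i = close + 2
        elif text[i] == ">":
            return i
        elif text[i].isspace() and i > start + 2:
            i += 1  # separator between comments
        else:
            raise ValueError(f"unexpected character {text[i]!r} at offset {i}")
    raise ValueError("unterminated comment declaration")


source = "<!-- a > b -->after"
print(end_at_first_gt(source, 0))             # 7: stops inside the comment
print(end_of_comment_declaration(source, 0))  # 13: the real MDC
```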

Most current browsers (Spyglass Mosaic, MSIE, Netscape, NCSA Mosaic, etc.)
handle comments rather accurately according to this specification, with a
few exceptions.

>>SGML has always had '-->' as the comment terminator. Browser authors
>>should know enough about what they're doing to know an error as obvious
>>as that when they see it.
>
>Actually, '-->' is _not_ the SGML comment terminator.  Inside '<! ... >',
>'--' brackets a comment on either side; there can be multiple comments
>(along with other SGML) inside '<! ... >'.
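Kurt's point, that '--' delimiters bracket each comment and that one declaration can hold several of them, can be sketched in Python as a function that pulls the individual comments out of a comment declaration. The helper name is hypothetical, and it assumes its input is exactly one comment declaration.

```python
# Hypothetical helper; a sketch assuming one comment declaration per call.

def comments_in(decl: str) -> list[str]:
    """Return the text of each '--...--' comment in one comment declaration."""
    if not (decl.startswith("<!") and decl.endswith(">")):
        raise ValueError("expected '<!' ... '>'")
    body = decl[2:-1]  # strip the MDO '<!' and the MDC '>'
    comments: list[str] = []
    i = 0
    while i < len(body):
        if body.startswith("--", i):
            close = body.find("--", i + 2)
            if close == -1:
                raise ValueError("unterminated comment")
            comments.append(body[i + 2:close])
            i = close + 2
        elif body[i].isspace() and comments:
            i += 1  # whitespace is only allowed between comments
        else:
            raise ValueError(f"not a comment declaration (offset {i})")
    return comments


print(comments_in("<!-- one -- -- two > still two -->"))
# [' one ', ' two > still two ']
```

Note that the '>' inside the second comment is just comment text, which is exactly what the first-'>' shortcut gets wrong.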

Kurt,

[A little clarification, should anyone interpret "other SGML" incorrectly.]

By '<! ... >' you refer to a markup declaration, such as:

   <!ENTITY thorn  CDATA "&#254;" -- small thorn, Icelandic -->

and not to ordinary element markup (tags):

   <B>This is marked up content.</B>

The only comments

    -- this is a comment --

that can occur in a document instance are those contained within markup
declarations; a special instance of a markup declaration containing only
comments is called a comment declaration. So while
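The definition of a comment declaration (a markup declaration containing nothing but comments) is regular enough to check with a single Python regular expression. This is a sketch that assumes the reference concrete syntax. The pattern and function names are illustrative, and the '(?:(?!--).)*' construct keeps a comment from running past its closing '--'.

```python
import re

# Hypothetical names; a sketch of the comment declaration grammar only.
# One comment: '--', any text not containing '--', then '--'.
_COMMENT = r"--(?:(?!--).)*--"

# comment declaration = '<!', optionally (comment, (whitespace | comment)*), '>'
_COMMENT_DECLARATION = re.compile(
    rf"<!(?:{_COMMENT}(?:\s|{_COMMENT})*)?>", re.DOTALL
)


def is_comment_declaration(decl: str) -> bool:
    return _COMMENT_DECLARATION.fullmatch(decl) is not None


examples = [
    "<!-- this comment is perfectly legal -- -- and so is this one -->",
    "<!>",
    "<!-- a -- stray text -->",
    '<!ENTITY thorn CDATA "&#254;" -- small thorn, Icelandic -->',
]
for decl in examples:
    print(is_comment_declaration(decl), decl)
```

The ENTITY example is a perfectly good markup declaration with a comment in it, but it is not a comment declaration, because it also has a keyword and parameters.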

    <!-- this comment is perfectly legal --
      -- and can occur in either a DTD or
         a document instance, -->

you can't use comments within HTML markup in a document instance:

   <A HREF="fooo.html" -- this is an illegal comment -->

But this is a legal comment in a DOCTYPE declaration (since DOCTYPE is a markup declaration):

  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"
       -- this document conforms to the IETF HTML 2.0 DTD -->
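Because DOCTYPE and ENTITY are markup declarations, a parser has to separate their parameters from any '--' comments, and it must not look for '--' inside quoted literals. Here is a rough Python sketch of that tokenizing step. It assumes one complete declaration per call and ignores parameter entities and other SGML details. The function name is hypothetical.

```python
# Hypothetical helper; a simplified tokenizer, not a full SGML parser.

def split_declaration(decl: str) -> tuple[list[str], list[str]]:
    """Split one markup declaration into (parameters, comments)."""
    if not (decl.startswith("<!") and decl.endswith(">")):
        raise ValueError("expected '<!' ... '>'")
    body = decl[2:-1]  # drop the MDO '<!' and the MDC '>'
    params: list[str] = []
    comments: list[str] = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch.isspace():
            i += 1
        elif body.startswith("--", i):  # a comment, up to the next '--'
            close = body.find("--", i + 2)
            if close == -1:
                raise ValueError("unterminated comment")
            comments.append(body[i + 2:close])
            i = close + 2
        elif ch in "\"'":  # a literal; '--' or '>' inside it is just data
            close = body.find(ch, i + 1)
            if close == -1:
                raise ValueError("unterminated literal")
            params.append(body[i:close + 1])
            i = close + 1
        else:  # a name or other bare token
            j = i
            while (j < len(body) and not body[j].isspace()
                   and body[j] not in "\"'" and not body.startswith("--", j)):
                j += 1
            params.append(body[i:j])
            i = j
    return params, comments


doctype = ('<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"\n'
           '     -- this document conforms to the IETF HTML 2.0 DTD -->')
print(split_declaration(doctype))
```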

and even in marked section declarations in your document instance (which
are legal in HTML but not widely implemented):

  <![ IGNORE    -- INCLUDE this section for external consumption --
  [
      <STRONG>Business Confidential</STRONG>
      <P>If you shouldn't be reading this document, just stop now.</P>
  ]]>
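A Python sketch shows what a processor that does support marked sections should do with that example. It strips the comments out of the status keyword specification, then drops the content if IGNORE is present and keeps it otherwise. This is a simplified model with a hypothetical function name. It handles only INCLUDE and IGNORE and does not handle nested marked sections. It also assumes no '[' appears in the keyword comments.

```python
import re

# Hypothetical helper; a sketch covering only INCLUDE/IGNORE, no nesting.
# '<![', status keywords (possibly with comments), '[', content, ']]>'
_MARKED_SECTION = re.compile(
    r"<!\[(?P<status>[^\[]*)\[(?P<content>.*?)\]\]>", re.DOTALL
)
_COMMENT = re.compile(r"--.*?--", re.DOTALL)


def resolve_marked_sections(text: str) -> str:
    def replace(m: re.Match) -> str:
        # Comments are not keywords: '-- INCLUDE ... --' must not count.
        status = _COMMENT.sub(" ", m.group("status"))
        keywords = {word.upper() for word in status.split()}
        # IGNORE takes priority over INCLUDE; no keyword means INCLUDE.
        return "" if "IGNORE" in keywords else m.group("content")

    return _MARKED_SECTION.sub(replace, text)


doc = """<![ IGNORE    -- INCLUDE this section for external consumption --
[
    <STRONG>Business Confidential</STRONG>
]]>
<P>Public text.</P>"""
print(resolve_marked_sections(doc))
print(resolve_marked_sections(doc.replace("IGNORE", "INCLUDE", 1)))
```

Editing the single IGNORE keyword to INCLUDE is all it takes to publish the section, which is the point of the comment in the example.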

...obviously this is more about comments than any human could want to know
(as was pointed out to me in one reply).

----  AND NOW FOR THE TRULY PERVERSE:  ----

Here's a valid declaration subset, marked section, and comment torture-test
document. Problems you can expect in some or most browsers:

    1. Given that most HTML browsers don't understand declaration subsets or
       marked sections, many will display the 'Business Confidential' text and
       some stray marked-section (MS) syntax. These are legal constructs, but
       not widely supported.
    2. Few to none will handle the entity declaration correctly. I don't have
       much hope for the &smiley; entity reference being expanded, although
       technically it is allowed. It should be replaced by the IMG element.
    3. Most will falsely terminate the comment declaration at the first
       occurrence of "-->".
    4. The only content that *should* display is one paragraph, and some will
       erroneously comment it out because it contains what looks like an MDO
       ('<!') and MDC ('>'). These should only be recognized as delimiters
       when in context; otherwise, they're just data characters.
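Item 4 turns on delimiter recognition in context. In SGML's reference concrete syntax, '<!' is recognized as an MDO only when it is followed by one of four things: a name start character, a '--' comment start, a '[' (as in a marked section), or the '>' of an empty comment declaration. Otherwise it is data. Here is a minimal Python check. It assumes HTML 2.0's SGML declaration, where the name start characters are just the ASCII letters. The function name is made up.

```python
# Hypothetical helper; a sketch of the MDO contextual constraint only.

def opens_markup_declaration(text: str, i: int) -> bool:
    """True if '<!' at text[i] is a markup declaration open (MDO) in context."""
    if not text.startswith("<!", i):
        return False
    rest = text[i + 2:]
    return (
        (rest[:1].isascii() and rest[:1].isalpha())  # name start: <!DOCTYPE, <!ENTITY
        or rest.startswith("--")                     # comment declaration
        or rest.startswith("[")                      # marked section
        or rest.startswith(">")                      # empty comment declaration <!>
    )


line = "<P><! This is the only content that should be displayed: &smiley; ></P>"
print(opens_markup_declaration(line, line.index("<!")))  # False: '<! ' is data
print(opens_markup_declaration("<!DOCTYPE HTML>", 0))     # True
```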

---- cut here ----

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"
      -- this document conforms to the IETF HTML 2.0 DTD --
      [
        <!ENTITY smiley "<IMG SRC='smiley.gif'>" -- smiley -->
    ]>
    <HTML>
    <TITLE>DS, MS &amp; Comment Torture Test</TITLE>
    <BODY>
    <![ IGNORE    -- INCLUDE this section for external consumption --
    [
       <STRONG>Business Confidential</STRONG>
       <P>If you shouldn't be reading this document, stop now.</P>
    ]]>
    <!--  OK.  This is a valid comment.                     --
      --> But what if the comment contains a MDC character? --
      --  Some browsers are broken and will display
          text that they shouldn't here.                    -->
    <P><! This is the only content that should be displayed: &smiley; ></P>
    </BODY>
    </HTML>

---- cut here ----
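For item 3, here is a small Python sketch comparing the "first '-->'" shortcut with a scan that honors the '--' pairs. Both are applied to the comment declaration from the test document. Both function names are hypothetical, and the scanner is a simplified model that only handles comment declarations.

```python
# Hypothetical helpers; a sketch, not any browser's actual code.
TORTURE_COMMENT = (
    "<!--  OK.  This is a valid comment.                     --\n"
    "  --> But what if the comment contains a MDC character? --\n"
    "  --  Some browsers are broken and will display\n"
    "      text that they shouldn't here.                    -->"
)


def naive_end(text: str) -> int:
    """Shortcut many browsers use: stop right after the first '-->'."""
    return text.index("-->") + 3


def sgml_end(text: str) -> int:
    """Index just past the MDC '>' of the comment declaration at text[0]."""
    i = 2  # skip the MDO '<!'
    while i < len(text):
        if text.startswith("--", i):
            close = text.find("--", i + 2)  # the comment ends at the next '--'
            if close == -1:
                raise ValueError("unterminated comment")
            i = close + 2
        elif text[i] == ">":
            return i + 1
        elif text[i].isspace() and i > 2:
            i += 1  # separator between comments
        else:
            raise ValueError(f"unexpected {text[i]!r} at offset {i}")
    raise ValueError("unterminated comment declaration")


leaked = TORTURE_COMMENT[naive_end(TORTURE_COMMENT):]
print(repr(leaked[:40]))  # text a naive browser would display
print(sgml_end(TORTURE_COMMENT) == len(TORTURE_COMMENT))  # True: nothing leaks
```

The '-->' at the start of the second line is really a '--' that opens a new comment, followed by a '>' that is just comment text.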

Hopefully it won't be too long before this document displays correctly on
common HTML browsers. What was once perverse someday becomes habit...
[apologies to the Doobies]

Murray

```````````````````````````````````````````````````````````````````````````````
     Murray Altheim, Program Manager
     Spyglass, Inc., Cambridge, Massachusetts
     email: <mailto:murray@spyglass.com>
     http:  <http://www.stonehand.com/murray/murray.html>