Re: Double hyphens ('--') in comments invalid

From: Jan Roland Eriksson <jrexon@newsguy.com>
Date: Tue, 30 Apr 2002 20:07:03 +0200
To: www-html@w3.org
Message-ID: <f9jtcuk4q187l9k5v8e1ef4vco01ugervi@4ax.com>
On Sun, 28 Apr 2002 15:05:56 -0400, you wrote:

>I did some more research; the correct example is below (right?):
>] <!--comment 1--  --comment 2--  --comment 3-- >

Yes. And since you are so very close to the real thing, let me just
add the finishing touches here.

>Here are the rules (as I understand them):
>1. A comment block always starts with '<!'.

In SGML terminology a specific SGML comment is just a somewhat special
case of an 'SGML delimited DECLARATION'.

As per the "Reference Concrete Syntax" (RCS) of SGML, the '<!' is a
representation of the SGML abstract 'MDO' (Markup Declaration Open).

>2. Individual comment begins and end with '--'.

The RCS defines '--' to be the representation of the abstract 'COM'
(Comment start or end).

>3. There must not be any white space between the first comment and '<!'
>   ***Hence, that's why the beginning is always '<!--'.***

The rule is that 'MDO' must be immediately followed by something that
comes out as either a valid SGML keyword like e.g. 'DOCTYPE' or
'ELEMENT', or in the special case of a comment only, a 'COM'.

E.g. this would be a more general form of an SGML declaration that
still contains a comment inside itself (still assuming that the RCS
is in effect)...

  <!ELEMENT FOO (BAR)*
    -- an element FOO that takes any no of
       element BAR as its content --
    >
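
A rough Python sketch of that rule, for illustration only: it looks at
what immediately follows 'MDO' and pulls out any '--'-delimited
comments. The function names are made up for this example, the RCS
delimiters are assumed, and it ignores the case of '--' appearing
inside a quoted literal.

```python
import re

def declaration_kind(declaration):
    """Return 'comment', the keyword after '<!', or None if neither follows."""
    # MDO ('<!') must be followed immediately by COM ('--') or a keyword.
    m = re.match(r"<!(--|[A-Za-z]+)", declaration)
    if m is None:
        return None
    return "comment" if m.group(1) == "--" else m.group(1).upper()

def declaration_comments(declaration):
    """Return the text of each '--'-delimited comment in one declaration."""
    # Non-greedy match, so each comment ends at the first following '--'.
    found = re.findall(r"--(.*?)--", declaration, re.DOTALL)
    return [text.strip() for text in found]

EXAMPLE = """<!ELEMENT FOO (BAR)*
    -- an element FOO that takes any no of
       element BAR as its content --
    >"""

print(declaration_kind(EXAMPLE))
print(declaration_comments(EXAMPLE))
```

Note that '<! ELEMENT ...' (white space after 'MDO') gives None here,
which mirrors the rule above.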

>4. The end of the comment block is denoted by a '>' after the last comment.
>   *** Unlike the block's beginning, the end can be separated by white space.***

That last '>' is the RCS representation of the abstract 'MDC' (Markup
Declaration Close) and yes, it can have arbitrary white space
preceding it, provided that what came before is syntactically valid.
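
The four rules can be put into a small checker. This is a minimal
Python sketch, not a real SGML parser: it assumes the RCS delimiters,
requires at least one comment, and the name is_comment_declaration is
made up for this example.

```python
import re

# One comment: COM ('--'), any text that contains no '--', then COM again.
_COMMENT = r"--(?:(?!--).)*--"

# MDO ('<!') immediately followed by a comment, then any mix of white space
# and further comments, then MDC ('>').
_COMMENT_DECL = re.compile(rf"<!{_COMMENT}(?:\s|{_COMMENT})*>", re.DOTALL)

def is_comment_declaration(text):
    """True if the whole string is one comment declaration under the rules above."""
    return _COMMENT_DECL.fullmatch(text) is not None

print(is_comment_declaration("<!--comment 1--  --comment 2--  --comment 3-- >"))
print(is_comment_declaration("<!-- a -- b -->"))
```

The second call is rejected because the inner '--' closes the first
comment, leaving 'b' outside any comment; that is the "double hyphens"
problem from the subject line.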

>I hope I haven't spread any more misinformation...

Oh no, you are all good and safe, /very/ far ahead of lots of posts I
have seen in comp.text.xml :-)

What ISO and the SGML'ers have largely managed to "keep as a secret"
from the general public is that SGML _is_not_ a markup language per se.

SGML is officially defined as a huge list of words that represent an
abstraction of characteristics that can be used to create a specific
markup language.

All of those words in the SGML word list are normally presented in
their abbreviated form as in e.g...

  STAGO  = 'Start TAG Open'
  ETAGO  = 'End TAG Open'
  TAGC   = 'TAG Close'
  MDO    = 'Markup Declaration Open'
  MDC    = 'Markup Declaration Close'

...and the list of available abbreviated words goes on and on, all of
them just "describing" an abstraction of something one might want to
use to create an application to markup bits and pieces of a document
instance.

Somewhere along the line of the definition of SGML it was decided that
"Ok, let's create _one_ 'CONCRETE' example" of how to use these
abstracts to define one way of actually writing a marked up doc
instance, and the SGML "Reference Concrete Syntax" was born.

The "SGML DECLARATION" for the RCS is a part of the ISO standard and
it's in that one we can find the defined connection between abstracts
and reality as in e.g...

  STAGO  = '<'
  ETAGO  = '</'
  TAGC   = '>'
  MDO    = '<!'
  MDC    = '>'

...and it's from the plain existence of the RCS that we can find the
general public's view of markup as being the same as "sprinkled pointy
brackets" throughout a doc instance.

But the fact remains, SGML is an abstraction of mark-up, it is only
the existence of the RCS that has led people to think that "pointy
brackets" are somewhat 'ennobled' characters in the environment.

Naturally the 'G' in SGML stands for "Generalized", which in effect
means that anyone with sufficient energy available could sit down and
write his own 'SGML DECLARATION' where e.g...

  STAGO  = '['
  ETAGO  = '[*'
  TAGC   = ']'
  MDO    = '{%'
  MDC    = '}'

...would be exactly as valid for that particular definition.
It would still be fully valid SGML and nsgmls would be all happy to
parse and validate an instance based on that markup.
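
To make the abstract/concrete split tangible, here is a small Python
sketch that writes the same markup through either delimiter table. It
is only an illustration: the dict and function names are invented for
this example, and 'COM' is assumed to stay '--' since the alternative
table above does not redefine it.

```python
# Abstract delimiter roles mapped to the concrete strings quoted above.
RCS = {"STAGO": "<", "ETAGO": "</", "TAGC": ">", "MDO": "<!", "MDC": ">"}
CUSTOM = {"STAGO": "[", "ETAGO": "[*", "TAGC": "]", "MDO": "{%", "MDC": "}"}

def render_element(name, content, syntax):
    """Write start tag, content and end tag using the given delimiter table."""
    start = syntax["STAGO"] + name + syntax["TAGC"]
    end = syntax["ETAGO"] + name + syntax["TAGC"]
    return start + content + end

def render_comment_declaration(text, syntax, com="--"):
    """Write a comment declaration: MDO, COM, text, COM, MDC."""
    return syntax["MDO"] + com + text + com + syntax["MDC"]

print(render_element("FOO", "text", RCS))       # <FOO>text</FOO>
print(render_element("FOO", "text", CUSTOM))    # [FOO]text[*FOO]
print(render_comment_declaration(" note ", CUSTOM))  # {%-- note --}
```

The functions never mention a pointy bracket; only the table does,
which is the whole point of the abstraction.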

(compare to the definition of DSSSL which in itself is a
Turing-complete programming language defined as an application of SGML)

In fact, at least one individual has had the energy to spend on
inventing a new method of markup, not necessarily based on SGML, but
the basic thinking of an abstraction connected to an implementation in
reality is still there in all its glory.

  http://www.cs.tut.fi/~jkorpela/data/utd.html


-- 
Rex [the fox in the chicken shack]
Received on Tuesday, 30 April 2002 14:11:31 GMT
