Re: XHTML Invalidity / WML2 / New XHTML 1.1 Attribute

On Tue, 15 Aug 2000 16:34:22 +0100 (BST), Stephanos Piperoglou
<stephanos@webreference.com> wrote:

>On Tue, 15 Aug 2000, Cavre wrote:
>> Not sure I understand why this would be such a problem.  Older
>> browsers ignore Javascript markup via <!-- {ok so that is more a
>> hack than a actual implantation, but it does work} There must be
>> something here that I must learn., but it seems to me that a parser
>> would ignore any content not found outside of it's framework.
>
>AFAIK it's not a hack...
[...]
>I'm not an SGML expert but AFAIK the contents of both SCRIPT and STYLE
>elements are CDATA (as opposed to PCDATA)

Correct, if the parser understands about those elements to start with.

>which means they're not parsed for much other that "</" (which is why
>"</" should never appear in inline scripts and style sheets).

Correct for the same condition.
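
A minimal sketch of that rule, assuming the HTML 4 reading in which
CDATA content ends at the first "</" that is followed by a letter. The
function name cdata_content is made up for illustration, and real
browsers tend to look for the matching end tag instead.

```python
import re

# "</" acts as a delimiter here only when a name start character follows.
ETAGO_IN_CONTEXT = re.compile(r"</(?=[A-Za-z])")


def cdata_content(text_after_start_tag):
    """Return everything before the first '</' + letter (hypothetical helper)."""
    match = ETAGO_IN_CONTEXT.search(text_after_start_tag)
    if match is None:
        return text_after_start_tag
    return text_after_start_tag[:match.start()]


# The "</b>" inside the string literal ends the element content too early.
broken = cdata_content('document.write("<b>hi</b>");\n</SCRIPT>')

# Escaping the slash hides the "</" from the markup parser.
escaped = cdata_content('document.write("<b>hi<\\/b>");\n</SCRIPT>')
```

In the first call the content is cut off inside the string literal,
which is why "</" should be kept out of inline scripts and style sheets.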

>Hence the entire contents of the SCRIPT element are passed verbatim
>to the JavaScript (or whatever) parser, which recognizes "<!--" and
>"-->" and throws them out.

A CSS parser throws them both out, yes, but JavaScript parsers do not.

To a JS parser the string '-->' actually has a meaning ('--' is the
decrement operator and '>' is greater-than), so to avoid it being
interpreted as part of the script we need to comment it out in JS.

That's the _sole_ reason why we see this '//-->' construct at the
end of JS code, on a line of its own just before </SCRIPT>: the
'//' tells the JS engine that the rest of the line is a comment.

This has over time led to some rather "fun" actions among people who
have not understood the concept. E.g. the popular German "SelfHTML" site
is famous for a totally bogus description of SGML type comments based on
a complete misunderstanding of this pure JS thing :)

>I.e.,in HTML4-compliant browsers, the CDO and CDC delimiters...

Just to clarify: in SGML there are no 'delimiters' named CDO and CDC.
Those acronyms were invented by the CSS designers, to the best of my
knowledge.

The names chosen are also misleading at best, since they seem to
indicate that there is something that looks like this '<!--' that is
named "Comment Declaration Open" and '-->' "Comment Declaration Close".

From an SGML standpoint those names are both wrong.

An SGML comment is a _part_of_ an SGML Markup Declaration.

An SGML Markup Declaration Opens with MDO == '<!'

MDO must be directly followed by e.g...
a recognized SGML keyword as in 'ELEMENT', or...
a "Declaration Subset Open" as in DSO == '[', or...
a "Comment Start or End" as in COM == '--'

So this '<!--' is actually parsed as MDO+COM

Anything from here on is commentary data up until the next occurrence
of COM, where interpretation of data as commentary stops. Eventually,
after an arbitrary sequence of white space (which is allowed), the
Markup Declaration itself may be Closed by an MDC == '>'

So...

--comment ends here----------------------------||
--comment starts here-----vv                   vv
                        <!-- this is a comment --
                          -- and another one   --
                        >
--declaration ends here ^
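
The same reading can be sketched as a small scanner. This is only an
illustration of the MDO / COM / MDC sequence described above, not a full
SGML parser; the function name parse_comment_declaration is made up, and
anything after the closing MDC is simply ignored.

```python
MDO, COM, MDC = "<!", "--", ">"


def parse_comment_declaration(text):
    """Return the comments found in one SGML comment declaration.

    Accepts MDO, then zero or more COM...COM comments (each optionally
    followed by white space), then MDC. Raises ValueError otherwise.
    """
    if not text.startswith(MDO):
        raise ValueError("declaration must open with MDO '<!'")
    pos = len(MDO)
    comments = []
    while True:
        if text.startswith(COM, pos):
            # Commentary data runs up to the next occurrence of COM.
            end = text.find(COM, pos + len(COM))
            if end == -1:
                raise ValueError("comment is never closed by COM '--'")
            comments.append(text[pos + len(COM):end])
            pos = end + len(COM)
            # White space is allowed after a comment.
            while pos < len(text) and text[pos].isspace():
                pos += 1
        elif text.startswith(MDC, pos):
            return comments
        else:
            raise ValueError("unexpected data at offset %d" % pos)


two_comments = parse_comment_declaration(
    "<!-- this is a comment --\n  -- and another one   --\n>"
)
```

Feeding it '<!-- a -- b -->' raises ValueError, because the 'b' sits
outside any comment: the second COM already ended the commentary data.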

Now let's move on to what may happen in e.g. HTML...

This would be totally correct from an SGML point of view...

  <head>
    <title>some title</title>
    <style type="text/css">
    <!--
      /* style rules here */
      --
    >
    </style>
  </head>

An SGML compliant parser that does _not_ understand about the style
element would ignore those start and end tags for the style element and
end up with a fully valid SGML declaration that only contains commentary
data, and white space outside of that data in locations where it is
allowed.
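
A rough sketch of that case, assuming a parser that simply drops the
tags of elements it does not know. Both drop_unknown_tags and the
regular expression for a comment declaration are illustrative
assumptions, not taken from any real parser.

```python
import re


# Hypothetical helper: drop start and end tags of unknown elements.
def drop_unknown_tags(markup, unknown=("style",)):
    names = "|".join(re.escape(name) for name in unknown)
    return re.sub(r"</?(?:%s)\b[^>]*>" % names, "", markup, flags=re.IGNORECASE)


# One comment declaration: MDO, then zero or more COM...COM comments,
# each optionally followed by white space, then MDC.
COMMENT_DECLARATION = re.compile(r"<!(?:--(?:(?!--).)*--\s*)*>", re.DOTALL)

head_fragment = """<style type="text/css">
<!--
  /* style rules here */
  --
>
</style>"""

remainder = drop_unknown_tags(head_fragment).strip()
is_comment_declaration = COMMENT_DECLARATION.fullmatch(remainder) is not None
```

With the style tags gone, what remains matches the pattern for a single
declaration that contains nothing but commentary data.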

The situation gets different if we have some parsers along the line that
actually understand about the style element.

First all content of the style element, including what now only looks
like an SGML declaration (but which isn't, it's CDATA right?), will be
treated as CDATA by an SGML compliant parser, and left as is. A CSS
parser OTOH will find an SGML type COM sitting at the end of its data
stream and might barf on that.

The CSS grammar _only_ allows for the CSS defined CDO == '<!--' and
CDC == '-->' to exist in a stylesheet in this situation, and only for
the purpose of being totally ignored by a CSS parser. Anything in
between them will still be interpreted by the CSS parser as parts of
style rules being defined.
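
As a naive sketch of that behaviour: the function below drops the two
strings and keeps everything between them. It is an assumption-laden
simplification (a real CSS tokenizer would only recognize CDO and CDC
as tokens outside of strings and CSS comments), and drop_cdo_cdc is a
made-up name.

```python
CDO, CDC = "<!--", "-->"


def drop_cdo_cdc(stylesheet):
    """Remove CDO and CDC but keep whatever sits between them."""
    return stylesheet.replace(CDO, "").replace(CDC, "")


# The usual wrapper: both tokens vanish and the rule survives.
wrapped = "<!--\nh1 { color: red }\n-->"
rules = drop_cdo_cdc(wrapped).strip()

# The SGML-correct form: a lone COM and a separate MDC are left behind.
sgml_style = "<!--\nh1 { color: red }\n--\n>"
leftover = drop_cdo_cdc(sgml_style)
```

In the second case the '--' and '>' stay in the data handed to the CSS
parser, which is the part it might barf on.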

Formally CDO and CDC do not exist in SGML, but they have been defined
by the CSS designers to be char strings equal to what is used in SGML as
the most compact form of an SGML declaration that only contains a
comment. For a specific purpose, of course: a workaround for some old
browsers.

-- 
Jan Roland Eriksson <jrexon@newsguy.com>
<URL:http://member.newsguy.com/%7Ejrexon/>

Received on Tuesday, 15 August 2000 14:20:33 UTC