Re: Cougar DTD: Do not use CDATA declared content for SCRIPT

Joe English (joe@trystero.art.com)
Mon, 29 Jul 1996 19:54:38 PDT


Message-Id: <9607300254.AA22380@trystero.art.com>
To: www-html@w3.org, "Paul Prescod" <papresco@calum.csclub.uwaterloo.ca>
Subject: Re: Cougar DTD: Do not use CDATA declared content for SCRIPT 
In-Reply-To: <199607300045.RAA28976@iberia.it.earthlink.net> 
Date: Mon, 29 Jul 1996 19:54:38 PDT
From: Joe English <joe@trystero.art.com>


"David Perrell" <davidp@earthlink.net> wrote:

> I read in another message that--except in marked sections--a parser is
> expected to end an element when it encounters the corresponding ETAGO.
> That's not strictly true. With CDATA declared content, the recognition
> of ETAGO is further constrained to occur only when immediately followed
> by an SGML name start character.


Sort of, but not quite.

ETAGO is a _delimiter-in-context_, which means that it is only 
recognized *at all* when it is followed by a name start character
(or, in some cases, a GRPO delimiter).  This is true for all cases
where ETAGO is recognized, not just in CDATA declared content.
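
To make the delimiter-in-context rule concrete, here is a rough sketch in Python. It is not a parser and is not taken from any SGML tool. It assumes the reference concrete syntax (ETAGO is "</" and name start characters are the ASCII letters), and it deliberately ignores the GRPO case mentioned above. The names ETAGO, NAME_START and etago_positions are made up for illustration.

```python
import string

# Assumption: reference concrete syntax, where ETAGO is "</" and the
# name start characters are the ASCII letters.
ETAGO = "</"
NAME_START = set(string.ascii_letters)


def etago_positions(text):
    """Return the offsets at which "</" would be recognized as ETAGO.

    A "</" that is not followed by a name start character is plain data.
    The GRPO case is intentionally left out of this sketch.
    """
    hits = []
    i = text.find(ETAGO)
    while i != -1:
        following = text[i + len(ETAGO): i + len(ETAGO) + 1]
        if following in NAME_START:  # an empty string never matches
            hits.append(i)
        i = text.find(ETAGO, i + 1)
    return hits
```

With this sketch, the "</" in "a </ b" and the one in "</3>" are treated as data, while "</b>" and "</B>" are recognized, whatever kind of content they occur in.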


In _element content_ and _mixed content_, the ETAGO delimiter-in-context
signals the beginning of an end-tag (which must follow the ETAGO d-i-c).
The end-tag causes the parser to close the currently open element (or 
more than one currently open element in the case of end-tag omission).

In CDATA and RCDATA _declared content_, the same thing happens,
except here the ETAGO delimiter-in-context *also* signals the end
of the declared content.  Note that the end of the declared content
must of necessity coincide with the end of the element that
introduced it.

If this sounds screwy and irrational, that's because it is:
CDATA and RCDATA declared content are broken.

In CDATA and RCDATA marked sections, ETAGO is simply not recognized
as markup.


> Apply a little more
> constraint--substitute "SGML name start character" with "the element
> name."


That might (or might not) be a more sensible approach, 
but it's not what SGML does. 

You can't change what SGML does without changing ISO 8879, and (for
better or worse) you can't change ISO 8879 in a way that will change
the meaning or legality of any currently valid SGML document [1].
I'm afraid that's exactly what this change would do:

<!DOCTYPE TEST [
    <!ELEMENT TEST - - (A*)>
    <!ELEMENT A - - (#PCDATA|B)*>
    <!ELEMENT B - O CDATA>
]>
<TEST>
<A>This document is legal according to ISO 8879:1986,
<B>but it would be illegal under the proposed scheme.</A>
</TEST>
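
The difference between the two rules on this document can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reference concrete syntax (letters as name start characters; letters, digits, "." and "-" as name characters; general names folded to upper case), and my reading of the proposal as "only </ followed by the element's own name ends the content". The function names are hypothetical.

```python
import re


def cdata_end_sgml(text):
    """Offset where CDATA declared content ends under ISO 8879:
    the first "</" followed by a name start character."""
    m = re.search(r"</[A-Za-z]", text)
    return m.start() if m else None


def cdata_end_proposed(text, gi):
    """Offset where the content would end under the proposed rule:
    the first "</" followed by the element's own name (case-insensitive,
    and not merely a prefix of a longer name)."""
    pattern = r"</" + re.escape(gi) + r"(?![A-Za-z0-9.\-])"
    m = re.search(pattern, text, re.IGNORECASE)
    return m.start() if m else None


# Everything that follows the <B> start-tag in the example document.
after_b = "but it would be illegal under the proposed scheme.</A>\n</TEST>\n"

sgml_end = cdata_end_sgml(after_b)               # offset of "</A>"
proposed_end = cdata_end_proposed(after_b, "B")  # None: no "</B" anywhere
```

Under the ISO 8879 rule the content of B stops at "</A>", the omitted end-tag for B is implied, and A and TEST close normally. Under the proposed rule nothing ever ends B, so "</A>" and "</TEST>" become data and the required end-tags for A and TEST are missing.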


--Joe English

  joe@art.com

[1] <URL: ftp://ftp.ornl.gov/pub/sgml/wg8/document/1289.htm >