Re: Cougar DTD: Do not use CDATA declared content for SCRIPT

Arne Knudson <ack@ebt.com> wrote:
>      There was a proposal a while back to use CDATA marked sections.
> Briefly, marked sections are denoted as in the following example:
> 
> <![CDATA[this is <foo> text that &bar; should not get parsed]]>

[explication elided]

>      This solves the problem of the end-tags, because the SGML parser
> ignores everything up to the "]]>". It introduces a whole new can of worms,
> though, because we're forcing the authors to put "<![CDATA[...]]>" inside of
> every SCRIPT element to protect the content of the script, which means the
> scripting parser will have to know to discard it.

Not in a sensible implementation...

In a structure-controlled SGML implementation, the application
never sees the "<![ CDATA [" and "]]>" markup; it is swallowed
by the parser, which hands the content of the SCRIPT element
to the application unscathed. The application then passes it
to an appropriate script interpreter based on the value of the
LANGUAGE attribute.
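
To make that division of labour concrete, here is a minimal Python
sketch. It is a toy, not an SGML parser. It assumes the only markup in
the SCRIPT content is CDATA marked sections (spelled with or without the
optional spaces, as in "<![ CDATA [") and the entity references &gt;,
&lt; and &amp;. The names parse_script_content, handle_script and
INTERPRETERS, and the stand-in run_javascript "interpreter", are
hypothetical.

```python
import re

# Toy stand-in for the SGML parser: it knows only CDATA marked sections
# and three entity references. A real parser does much more.
MARKED_SECTION = re.compile(r"<!\[\s*CDATA\s*\[(.*?)\]\]>", re.DOTALL)
ENTITIES = {"&gt;": ">", "&lt;": "<", "&amp;": "&"}
ENTITY_REF = re.compile(r"&(?:gt|lt|amp);")


def resolve_entities(text: str) -> str:
    """Replace the known entity references in text outside marked sections."""
    return ENTITY_REF.sub(lambda m: ENTITIES[m.group(0)], text)


def parse_script_content(raw: str) -> str:
    """Return SCRIPT content with the marked-section markup swallowed."""
    pieces = []
    pos = 0
    for m in MARKED_SECTION.finditer(raw):
        # Character data outside any marked section: entity refs are resolved.
        pieces.append(resolve_entities(raw[pos:m.start()]))
        # Inside a CDATA marked section, everything up to the first "]]>"
        # is passed through untouched.
        pieces.append(m.group(1))
        pos = m.end()
    pieces.append(resolve_entities(raw[pos:]))
    return "".join(pieces)


# Hypothetical stand-in for a real script engine: it just tags its input.
def run_javascript(source: str) -> str:
    return "javascript:" + source.strip()


# Hypothetical registry keyed on the (case-folded) LANGUAGE attribute.
INTERPRETERS = {"javascript": run_javascript}


def handle_script(language: str, raw: str) -> str:
    """Application side: take the parsed content, pick an interpreter."""
    source = parse_script_content(raw)
    interpreter = INTERPRETERS.get(language.lower())
    if interpreter is None:
        raise ValueError(f"no interpreter for LANGUAGE={language!r}")
    return interpreter(source)
```

For instance, parse_script_content('if (a <![CDATA[<]]> b) x();')
returns 'if (a < b) x();', so the script interpreter only ever sees
plain script text, never the marked-section delimiters.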


> Furthermore, if the
> programmer should want to (God forbid) put a "]]>" into the document,
> they're just as screwed as they were back when they were trying to put
> end-tags into CDATA elements.

On the contrary, this is possible with marked sections 
(it's just awkward):


    <SCRIPT><![ CDATA [
	whatever.whichever("Here goes nothing: ]]>]]&gt;<![ CDATA [");
    ]]></SCRIPT>

which will yield:

	whatever.whichever("Here goes nothing: ]]>");
	_______________________________________^^&___

as the content of the SCRIPT element.  (The underlined
parts were parsed as part of a CDATA marked section; the
undersharkfinned parts ("]]") were entered as character data
outside any marked section; and the underampersanded
part (">") was entered as the entity reference "&gt;".)

This works because marked section boundaries are independent
of element boundaries; this is the chief advantage of CDATA
marked sections over elements with CDATA declared content.
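
The same trick can be applied mechanically to any script text. A
minimal Python sketch, assuming the "gt" entity is declared (as it is
for HTML) and using the compact "<![CDATA[" spelling; wrap_in_cdata is a
hypothetical helper name. Each "]]>" in the text becomes
"]]>]]&gt;<![CDATA[": those first three characters end the current
marked section, "]]&gt;" supplies the same characters as ordinary data
plus an entity reference, and a fresh marked section is opened for the
rest.

```python
def wrap_in_cdata(text: str) -> str:
    """Wrap text in CDATA marked sections so that any "]]>" survives.

    Assumes the "gt" entity is declared, as it is for HTML. The parser
    ends a CDATA marked section at the first "]]>", so every "]]>" in
    the text is rewritten to close the section, re-enter "]]>" outside
    it, and open a new section.
    """
    return "<![CDATA[" + text.replace("]]>", "]]>]]&gt;<![CDATA[") + "]]>"
```

For example, wrap_in_cdata('Here goes nothing: ]]>') returns
'<![CDATA[Here goes nothing: ]]>]]&gt;<![CDATA[]]>'; the trailing empty
marked section is legal and contributes nothing to the content.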

Of course in this case the author would probably just type:

    <SCRIPT>
	whatever.whichever("Here goes nothing: ]]&gt;");
    </SCRIPT>

instead, since the only thing inside the element that
might be mistaken for markup is the sequence "]]>".
Again, this presumes a sensible implementation in which
the SGML parser is decoupled from the JavaScript parser.


--Joe English

  joe@art.com

Received on Friday, 26 July 1996 17:52:40 UTC