Subject: Re: Cougar DTD: Do not use CDATA declared content for SCRIPT

From: Christopher R. Maden <crm@ebt.com>
Date: Fri, 26 Jul 1996 22:09:02 GMT
To: joe@trystero.art.com
CC: www-html@w3.org
Message-Id: <199607262209.WAA04231@phaser.EBT.COM>

Joe English:
> Not in a sensible implementation...

Ah, but that's the key, isn't it?

We *must* keep in mind (or else the work of the W3C has little
relevance to its members) that HTML must be parseable by both
SGML-based *and* heuristic parsers.

If every HTML parser were SGML-based, our problems would be trivial.
Sufficiently sophisticated users could write their own DTDs, declare
their own entities, etc.

> In a structure-controlled SGML implementation, the application never
> sees the "<![ CDATA [" and "]]>" markup; these get swallowed by the
> parser, which would hand the content of the SCRIPT element to the
> application unscathed.  The application would then pass it to an
> appropriate script interpreter based on the value of the LANGUAGE
> attribute.
[...]
>     <SCRIPT><![ CDATA [
> 	whatever.whichever("Here goes nothing: ]]>]]&gt;<![ CDATA [");
>     ]]></SCRIPT>
> 
> which will yield:
> 
> 	whatever.whichever("Here goes nothing: ]]>");
> 	_______________________________________^^&___

It's true, a heuristic parser could be trained to discard marked
section boundaries before feeding the contents to any client
processor.  But I think you know how likely that is, coming from
manufacturers that had scripts hidden in comments...
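
For what it's worth, the discarding step itself is small.  Here is a
minimal sketch in Python, assuming that only CDATA marked sections
occur in the SCRIPT content and that only the &lt;, &gt; and &amp;
references need resolving outside them.  The function and variable
names are hypothetical; this is not a full SGML parser, nor how any
shipping browser behaves.

```python
import re

# Marked-section open: "<![", optional whitespace, the CDATA keyword,
# optional whitespace, then "[".  Other status keywords are ignored here.
MS_OPEN = re.compile(r"<!\[\s*CDATA\s*\[")
MS_CLOSE = "]]>"

# Assumption: only these three references matter for the sketch.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&"}
ENTITY_REF = re.compile(r"&(lt|gt|amp);")


def resolve_entities(text):
    """Resolve references in text that lies outside any marked section."""
    return ENTITY_REF.sub(lambda m: ENTITIES[m.group(1)], text)


def strip_cdata_sections(content):
    """Drop CDATA marked-section boundaries and return the bare data."""
    out = []
    pos = 0
    while True:
        m = MS_OPEN.search(content, pos)
        if m is None:
            # No further marked sections: the rest is ordinary content.
            out.append(resolve_entities(content[pos:]))
            break
        # Text before the marked section is ordinary content.
        out.append(resolve_entities(content[pos:m.start()]))
        end = content.find(MS_CLOSE, m.end())
        if end == -1:
            # Unterminated section: pass the remainder through untouched.
            out.append(content[m.end():])
            break
        # Inside a CDATA section nothing is recognized except "]]>".
        out.append(content[m.end():end])
        pos = end + len(MS_CLOSE)
    return "".join(out)


sample = ('<![ CDATA [\n'
          '\twhatever.whichever("Here goes nothing: ]]>]]&gt;<![ CDATA [");\n'
          ']]>')
print(strip_cdata_sections(sample))
```

Run on Joe's example, this hands the script interpreter the single
whatever.whichever(...) line with a literal "]]>" inside the string:
the first two characters come through as data between the two marked
sections, and the ">" comes from the resolved &gt; reference.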

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//GCA//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//EBT//NONSGML Christopher R. Maden//EN" SYSTEM
"<URL>http://www.ebt.com <TEL>+1.401.421.9550 <FAX>+1.401.521.2030
<USMAIL>One Richmond Square, Providence, RI 02906 USA" NDATA SGML.Geek>

Received on Friday, 26 July 1996 18:16:40 UTC