Re: Tidy of Script contents desc += "</td></table>"; from Richard A. O'Keefe on 2000-09-05 (html-tidy@w3.org from July to September 2000)

From: Richard A. O'Keefe <ok@atlas.otago.ac.nz>
Date: Tue, 5 Sep 2000 16:32:30 +1200 (NZST)
To: deltab@osian.net, h.rzepa@ic.ac.uk
Cc: html-tidy@w3.org
Message-Id: <200009050432.QAA06579@atlas.otago.ac.nz>

	
	| Although the STYLE and SCRIPT elements use CDATA for their data model, for
	| these elements, CDATA must be handled differently by user agents. Markup and
	| entities must be treated as raw text and passed to the application as is. The
	| first occurrence of the character sequence "</" (end-tag open delimiter) is
	| treated as terminating the end of the element's content. In valid documents,
	| this would be the end tag for the element.
	
From the HTML4 DTD:
	<!ENTITY % Script "CDATA">
	<!ENTITY % StyleSheet "CDATA">
	<!ELEMENT STYLE  - - %StyleSheet>
	<!ELEMENT SCRIPT - - %Script>

There are two important things to know about elements with CDATA content:
(1) They are *supposed* to be terminated by *any* end-tag.
    (Not as useful as it could have been, as this example shows, but
    that _is_ what the standard says.)  If you have
    	<script><foo>xxx</foo></script>
    then the script *must* be taken as ending just before </foo>,
    and then the parser will discover that </foo> isn't actually
    allowed them.

(2) Markup and entities *cannot* occur inside them.  The point of CDATA
    is to have < (other than </) and & occurring as plain data with no
    special reading at all.

The solution for Javascript is easy.  Any time you would have had
e.g. "</td>", write "<" + "/td>" instead.    


	Tidy's behaviour is therefore correct for HTML4.01 documents. (Though I
	expect XHTML is stricter, and requires CDATA element content to use &lt;/
	instead.)
	
To start with, &lt; inside an element with CDATA content means
ampersand, leower case l, lower case t, semicolon.  It never ever
EVER means less than.

XHTML is based on XML.  XML has no CDATA element content models.
XML has CDATA sections instead.

Received on Tuesday, 5 September 2000 00:33:26 UTC