Re: HTML4 + <script><![CDATA[ </ENDTAG> ]]></script>

From: David Dorward <david@dorward.me.uk>
Date: Mon, 1 Feb 2010 07:56:15 +0000
Message-Id: <D3147AAA-50C4-46DF-8045-FAD218B1C787@dorward.me.uk>
To: www-validator@w3.org

On 1 Feb 2010, at 06:19, Leif Halvard Silli wrote:

> The validator doesn't consider the following code as valid HTML4 (HTML 
> four):
> <script type="text/javascript">//<![CDATA[
>   document.write("<aa><bb></bb></aa>");
> //]]></script>

Since <script> elements are defined as containing CDATA, I assume the <![CDATA[ marker is (supposed to be) treated as character data rather than markup. The "</" of "</bb>" is then taken as an end tag, which does not match the open <script> element.
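A minimal Python sketch of this reading, assuming a simplified model of the SGML rule for CDATA element content (content ends at the first "</" followed by a letter). It is not the validator's actual code, and `first_end_tag` is a hypothetical name used only for illustration.

```python
import re
from typing import Optional

# Hypothetical helper, a simplified model of SGML's rule for CDATA element
# content: it ends at the first "</" followed by a letter (an end-tag open
# delimiter), whatever element name follows.
ETAGO = re.compile(r"</([A-Za-z][A-Za-z0-9.-]*)")


def first_end_tag(script_body: str) -> Optional[str]:
    """Return the name of the first end tag SGML would recognise inside
    <script> content, or None if there is none."""
    match = ETAGO.search(script_body)
    return match.group(1) if match else None


raw = (
    "//<![CDATA[\n"
    '  document.write("<aa><bb></bb></aa>");\n'
    "//]]></script>"
)
# "<![CDATA[" is plain text here, so "</bb>" is reached before "</script>";
# BB does not match SCRIPT, hence the validation error.
print(first_end_tag(raw))  # bb

# Escaping the slash ("<\/bb>") hides the delimiter from the SGML parser.
escaped = raw.replace("</bb>", "<\\/bb>").replace("</aa>", "<\\/aa>")
print(first_end_tag(escaped))  # script
```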

> At the same time, this is considered valid HTML 4 (four) (but invalid 
> HTML5 (five)):
> <p><![CDATA[
> <aa><bb></bb></aa>
> ]]></p>

In HTML 4 the CDATA marked section operates as expected (except in browsers, which don't generally support it).
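For contrast, a minimal Python sketch of why the <p> example passes, assuming a simplified model of an SGML CDATA marked section in ordinary content: everything between "<![CDATA[" and the first "]]>" is character data, so only end tags outside it count as markup. The helper name is hypothetical.

```python
import re
from typing import List

# Hypothetical helper modelling an SGML CDATA marked section in ordinary
# (mixed) content such as <p>: everything between "<![CDATA[" and the
# first "]]>" is character data, so tags inside it are not parsed.
MARKED_SECTION = re.compile(r"<!\[CDATA\[.*?\]\]>", re.DOTALL)
END_TAG = re.compile(r"</([A-Za-z][A-Za-z0-9.-]*)\s*>")


def end_tags_outside_marked_sections(content: str) -> List[str]:
    """Return end tag names found after stripping CDATA marked sections
    (their contents are text, not markup)."""
    visible = MARKED_SECTION.sub("", content)
    return END_TAG.findall(visible)


fragment = "<p><![CDATA[\n<aa><bb></bb></aa>\n]]></p>"
print(end_tags_outside_marked_sections(fragment))  # ['p']
```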

Meanwhile, HTML5 has its own set of parsing rules that are distinct from those of SGML, so I'm not surprised that this isn't allowed. 

> There are 3 reasons why this bug is important to fix:
> (1) That the validator wrongly stamps the first example as invalid 
> creates the impression that it is very difficult to embed javascript in 
> a way that is valid both inside XHTML and inside HTML4. 

It is difficult. The HTML Compatibility Guidelines for XHTML (Appendix C of XHTML 1.0) recommend using external scripts whenever a script contains <, &, ]]> or --.

> (2) In addition, it is also useful within HTML4! Because: the HTML4 
> specification (as well as the validator) requires that end tags inside 
> the <script> element are escaped - in order to be valid SGML. The HTML4 
> spec gives the following example as example of _one_ way that one can 
> escape the code so that the code is valid SGML both before and after 
> script execution: "<\/b>".

> *However*, the <![CDATA[ ... ]]> syntax for 
> marking up a section where escaping is not necessary is documented in 
> the HTML4 specification as well.

But that is overruled, I believe, by the HTML 4 specification's note on CDATA: "Although the STYLE and SCRIPT elements use CDATA for their data model, for these elements, CDATA must be handled differently by user agents. Markup and entities must be treated as raw text" <http://www.w3.org/TR/html4/types.html#type-cdata>
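Because script content is raw text, the escape the HTML 4 spec itself gives ("<\/b>") is what keeps inline scripts valid. A minimal Python sketch of applying it, assuming a hypothetical build step that generates the markup; `escape_etago` and `wrap_inline_script` are illustrative names, and the sketch assumes "</" only appears inside JavaScript string or regex literals, where "\/" means the same as "/".

```python
# Hypothetical build-time helper: rewrite every "</" in inline script source
# as "<\/", the escape the HTML 4 specification gives as an example
# ("<\/b>"). Inside a JavaScript string literal "\/" is just "/", so the
# string's value is unchanged, but an SGML parser no longer sees an ETAGO.
# Assumption: "</" only appears inside string (or regex) literals; elsewhere
# the rewrite could change the script's meaning.
def escape_etago(js_source: str) -> str:
    return js_source.replace("</", "<\\/")


def wrap_inline_script(js_source: str) -> str:
    """Escape the source and wrap it in the commented CDATA markers from the
    quoted example, intended to be acceptable as both XHTML and HTML 4."""
    return (
        '<script type="text/javascript">//<![CDATA[\n'
        + escape_etago(js_source)
        + "\n//]]></script>"
    )


print(wrap_inline_script('document.write("<aa><bb></bb></aa>");'))
```

The only "</" left in the output is the one that opens the real </script> end tag.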

David Dorward
Received on Monday, 1 February 2010 07:57:07 UTC