
RE: clean XHTML : what's new?

From: Richard A. O'Keefe <ok@atlas.otago.ac.nz>
Date: Mon, 18 Dec 2000 12:01:14 +1300 (NZDT)
Message-Id: <200012172301.MAA06239@atlas.otago.ac.nz>
To: derhoermi@gmx.net, html-tidy@w3.org, philippe.barthelemy@bdesign.fr
	Examples of corrections: enclosed script content in CDATA, adding a white
	space in empty comments 
		( ie : from <----> to <-- --> ) .
Presumably	      from <!----> to <!-- -->
	
That most certainly should not be necessary.

The definition of a comment in SGML is basically
	/<!(--([^-]|-[^-])*--\s*)*>/
In particular, <!> is a comment, <!--foo-- --bar--> is a comment,
and <!----> is a comment.  XML is supposed to make precisely two
changes to this:
 (a) there must be exactly one chunk
 (b) trailing space is not allowed after that chunk.
So the definition of a comment in XML is
	/<!--([^-]|-[^-])*-->/
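
To make the difference concrete, here is a minimal sketch that transcribes the two patterns above into Python's re module and tries them on the examples just mentioned. It assumes that \s is a fair stand-in for SGML's separator characters; the names SGML_COMMENT, XML_COMMENT and classify are made up for this illustration.

```python
import re

# Direct transcriptions of the two patterns given above.
# Assumption: \s approximates the separators SGML allows after each chunk.
SGML_COMMENT = re.compile(r"<!(--([^-]|-[^-])*--\s*)*>")
XML_COMMENT = re.compile(r"<!--([^-]|-[^-])*-->")


def classify(text):
    """Return (is_sgml_comment, is_xml_comment) for the whole string."""
    return (
        SGML_COMMENT.fullmatch(text) is not None,
        XML_COMMENT.fullmatch(text) is not None,
    )


for sample in ["<!>", "<!--foo-- --bar-->", "<!---->", "<!-- -->"]:
    print(sample, classify(sample))
```

Under these patterns the empty comment <!----> is accepted by both, so no space needs to be inserted; only the zero-chunk and multi-chunk forms are SGML-only.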

	I do understand that a lot of W3C spec implementations are
	rather fuzzy.  I do not want to argue about it.  ( especially,
	because I am likely to be wrong quite often...)

"Rather fuzzy" is an extremely generous way of describing quite a
few of them; the people who worked out how to formally specify software
appear to have lived in vain.

But in the case of XML comments, there is not the slightest fuzziness.
Here is the actual rule:
[15]	Comment = '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

There is admittedly a note:
[E27] Note that the grammar does not allow a comment ending in --->.
This is *not* an additional constraint; it is supposed to be explaining
what the *grammar* already forbids.  As it stands, it is wrong, but the
associated example makes it clear that the author of the note did not
have <!----> in mind.
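
For anyone who would rather not trust a regular expression, production [15] can also be checked by hand. The following is only a sketch: the function name is_xml_comment is invented here, and it does not check that each character falls in XML's Char range, which production [15] also requires.

```python
def is_xml_comment(text):
    """Check text against production [15], ignoring the Char range limits."""
    # Need '<!--' and '-->' without the two delimiters overlapping.
    if len(text) < 7 or not text.startswith("<!--") or not text.endswith("-->"):
        return False
    body = text[4:-3]
    i = 0
    while i < len(body):
        if body[i] == "-":
            # ('-' (Char - '-')): a hyphen must be followed by a non-hyphen
            # that is still part of the body.
            if i + 1 >= len(body) or body[i + 1] == "-":
                return False
            i += 2
        else:
            # (Char - '-'): any single non-hyphen character.
            i += 1
    return True


for sample in ["<!---->", "<!-- -->", "<!-- a --->", "<!-- a--b -->"]:
    print(sample, is_xml_comment(sample))
```

The empty comment passes because the starred group may match zero times; a body that ends in a hyphen, as in <!-- a --->, is what the grammar actually rules out.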

Nor, for that matter, is there anything in XHTML that requires a
script to have its content enclosed in CDATA.
There *is* advice, appendix C.4, that says

	C.4 Embedded Style Sheets and Scripts.
	Use external style sheets if your style sheet uses < or & or ]]>
	or --. Use external scripts if your script uses < or & or ]]> or
	--. Note that XML parsers are permitted to silently remove the
	contents of comments.  Therefore, the historical practice of
	"hiding" scripts and style sheets within comments to make the
	documents backward compatible is likely to not work as expected
	in XML-based implementations.

Section 4.8 says that you *MAY* use CDATA, not that you *must*:

	4.8 Script and Style elements

	In XHTML, the script and style elements are declared as having
	#PCDATA content.  As a result, < and & will be treated as the
	start of markup, and entities such as &lt; and &amp; will be
	recognized as entity references by the XML processor to < and &
	respectively.  Wrapping the content of the script or style
	element within a CDATA marked section avoids the expansion of
	these entities.

	<script> <![CDATA[ ... unescaped script content ... ]]>
	</script>

	CDATA sections are recognized by the XML processor and appear as
	nodes in the Document Object Model, see Section 1.3 of the DOM
	Level 1 Recommendation [DOM].

	An alternative is to use external script and style documents.
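
The effect section 4.8 describes can be seen with any XML parser. Here is a small sketch using Python's standard xml.etree.ElementTree; the script text is invented for the illustration, and ElementTree is not a DOM, so it hands back the text without exposing a separate CDATA node.

```python
import xml.etree.ElementTree as ET

# Entity references in #PCDATA content are expanded by the XML processor.
expanded = ET.fromstring("<script>if (a &lt; b) f();</script>").text

# Inside a CDATA marked section the same characters are left alone.
literal = ET.fromstring("<script><![CDATA[if (a &lt; b) f();]]></script>").text

# A CDATA section also lets a bare < through as character data.
bare = ET.fromstring("<script><![CDATA[if (a < b) f();]]></script>").text

print(expanded)
print(literal)
print(bare)
```

So the CDATA wrapper changes what the script engine would see, which is why it is offered as an option and not as a correction every document needs.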

Since most browsers out there do NOT support XHTML, I wonder whether
using external scripts would not be the simplest device?

There is a real incompatibility between HTML and XHTML here.
In HTML, we have
	<!ENTITY % Script "CDATA">
	<!ELEMENT SCRIPT - - %Script;>
which says that the *only* significant markup within a <SCRIPT>
is the </ that closes it.  But in XHTML, since XML does not support
CDATA elements, we have
	<!ELEMENT script (#PCDATA)>
which says that markup *may* occur inside a <script>.
External scripts have got to be *the* simplest way to work around this!
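
The incompatibility is easy to reproduce. As a rough sketch, Python's html.parser (which treats script content as CDATA, in the HTML manner) and xml.etree.ElementTree (an XML parser) can be fed the same inline script. The sample script and the helper names ScriptText, html_script_text and xml_accepts are made up for this illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

SOURCE = "<script>if (a < b) f();</script>"


class ScriptText(HTMLParser):
    """Collect every piece of character data the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_script_text(source):
    """Return the character data seen when source is parsed as HTML."""
    parser = ScriptText()
    parser.feed(source)
    parser.close()
    return "".join(parser.chunks)


def xml_accepts(source):
    """Return True if source is well-formed XML."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


print(html_script_text(SOURCE))
print(xml_accepts(SOURCE))
```

Read as HTML, the < is just part of the script; read as XML, it starts markup and the document is rejected. An external script sidesteps the question, since the script text never goes through the document parser at all.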
Received on Sunday, 17 December 2000 18:01:37 GMT
