HTML --tidy--> XHTML

The tidy4aug00 package seems to have a problem cleaning up HTML script
tags and HTML comments when outputting to the XHTML format.

This seemed to come up on this list last year:
http://lists.w3.org/Archives/Public/html-tidy/2000OctDec/0319.html

But I still have problems with the way tidy deals with comments and script
tags when outputting XHTML from an HTML source.

Comments:
The HTML 4.01 spec on comments:
http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.4
says that two or more adjacent hyphens inside comments should be avoided.
The XML spec (on which XHTML is based):
http://www.w3.org/TR/1998/REC-xml-19980210#sec-comments
states that '... the string "--" (double-hyphen) must not occur within
comments.'

Hence <!-- dog--cat --> is an illegal comment in XHTML (and something to
be avoided in HTML).
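
As a quick check of the rule above, here is a minimal sketch that feeds
both forms of the comment to an XML parser. It uses Python's standard
xml.etree.ElementTree module rather than tidy itself, and the helper name
is_well_formed is only an illustration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# The double hyphen inside the comment makes the document ill-formed.
print(is_well_formed("<p><!-- dog--cat --></p>"))
# Separating the hyphens gives a legal comment.
print(is_well_formed("<p><!-- dog- -cat --></p>"))
```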

At present tidy lets such comments pass through unchanged when outputting
XHTML. Maybe it should clean them by replacing '--' with '- -' (or some
other suitable escape sequence) and issue a warning.
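
The clean-up suggested above could look roughly like the following. This
is only a sketch of the idea in Python, not how tidy is implemented: the
function name escape_comment_hyphens and the regular expression are my
own assumptions, and it makes no attempt to treat comments inside script
tags differently.

```python
import re

# Matches an SGML/XML comment and captures its body.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)

def escape_comment_hyphens(markup):
    """Break up runs of adjacent hyphens inside comments."""
    def fix(match):
        body = match.group(1)
        # One pass can leave "--" behind for odd-length runs, so repeat.
        while "--" in body:
            body = body.replace("--", "- -")
        # A body ending in "-" would run into the closing "-->".
        if body.endswith("-"):
            body += " "
        return "<!--" + body + "-->"
    return COMMENT_RE.sub(fix, markup)

print(escape_comment_hyphens("<!-- dog--cat -->"))
```

Text outside comments is left alone, so a "--" in ordinary content is not
touched.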

Script tags:
The XHTML comment rules outlined above can cause problems for script tags
that use <!-- *stuff* --> to hide scripts, as described in the XHTML spec:
http://www.w3.org/TR/xhtml1/#diffs

A script tag might do better in the form:
<script>
<![CDATA[
... unescaped script content (must not contain the string ]]> ) ...
]]>
</script>

Hence when tidying HTML to XHTML, tidy should probably replace:
<script><!--
... unescaped script ...
-->
</script>

with:
<script><![CDATA[
... unescaped script content (must not contain the string ]]> ) ...
]]>
</script>

This would make the tidy XHTML output more XHTML compliant!
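
To illustrate the replacement described above, here is a rough sketch in
Python. It is not tidy code: the name wrap_script_in_cdata and the
regular expression are assumptions of mine, and it only handles the
simple case where the whole script body sits inside a single comment.

```python
import re

# A script element whose entire content is hidden in one comment.
SCRIPT_RE = re.compile(
    r"(<script\b[^>]*>)\s*<!--(.*?)-->\s*(</script>)",
    re.DOTALL | re.IGNORECASE,
)

def wrap_script_in_cdata(markup):
    """Replace comment-hidden script bodies with CDATA sections."""
    def fix(match):
        open_tag, body, close_tag = match.groups()
        # "]]>" would end the CDATA section early, so leave such
        # scripts untouched rather than produce broken output.
        if "]]>" in body:
            return match.group(0)
        return open_tag + "<![CDATA[" + body + "]]>\n" + close_tag
    return SCRIPT_RE.sub(fix, markup)

source = "<script><!--\nif (a < b) go();\n-->\n</script>"
print(wrap_script_in_cdata(source))
```

A real implementation would presumably also warn when it has to skip a
script, in the same way as for the comment clean-up.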

Tom

Received on Sunday, 29 April 2001 05:42:06 UTC