- From: Tom Kelly <ctk21@cam.ac.uk>
- Date: Thu, 26 Apr 2001 06:52:18 -0400 (EDT)
- To: <html-tidy@w3.org>
The tidy4aug00 package seems to have a problem cleaning up HTML script tags and HTML comments when outputting XHTML. This came up on this list last year:

http://lists.w3.org/Archives/Public/html-tidy/2000OctDec/0319.html

but I still have problems with the way tidy deals with comments and script tags when outputting XHTML from an HTML source.

Comments:

The HTML 4.01 spec on comments:

http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.4

classes two or more adjacent hyphens inside comments as something to be avoided. The XHTML/XML spec:

http://www.w3.org/TR/1998/REC-xml-19980210#sec-comments

states that '... the string "--" (double-hyphen) must not occur within comments.' Hence

    <!-- dog--cat -->

is an illegal comment in XHTML (and something to be avoided in HTML). At present tidy just lets that pass through when outputting XHTML. Maybe it should be cleaned by replacing '--' with '- - ' (or some other suitable escape sequence), with a warning given.

Script tags:

The way comments are dealt with in XHTML, as outlined above, can cause problems for script tags which use <!-- *stuff* --> to hide scripts. As outlined in the XHTML spec:

http://www.w3.org/TR/xhtml1/#diffs

a script tag might do better in the form:

    <script>
    <![CDATA[
    ... unescaped script content (except > becomes &gt;) ...
    ]]>
    </script>

Hence when tidying HTML to XHTML, tidy should probably replace:

    <script><!--
    ... unescaped script ...
    -->
    </script>

with:

    <script><![CDATA[
    ... unescaped script content (except > becomes &gt;) ...
    ]]>
    </script>

This would make tidy's XHTML output more XHTML compliant!

Tom
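P.S. To make the two proposed rewrites concrete, here is a rough sketch in Python. It is not how tidy works internally and is only an illustration: the function names (escape_comment_hyphens, wrap_script_in_cdata, tidy_for_xhtml) are made up for this example, it uses regular expressions rather than a real parser, it escapes '--' as '- -', and it does not handle a script body that itself contains ']]>' or emit any warnings.

```python
import re

# Hypothetical helpers for illustration only; not part of tidy.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)
SCRIPT_RE = re.compile(
    r"(<script\b[^>]*>)\s*<!--(.*?)-->\s*(</script>)",
    re.DOTALL | re.IGNORECASE,
)


def escape_comment_hyphens(html):
    """Break up every '--' inside a comment body so the comment is legal XML."""

    def fix(match):
        body = match.group(1)
        # Repeat until no adjacent hyphens remain (handles runs such as '---').
        while "--" in body:
            body = body.replace("--", "- -")
        # A body ending in '-' would form '--' with the closing '-->'.
        if body.endswith("-"):
            body += " "
        return "<!--" + body + "-->"

    return COMMENT_RE.sub(fix, html)


def wrap_script_in_cdata(html):
    """Replace the <!-- ... --> script-hiding idiom with a CDATA section."""

    def fix(match):
        open_tag, body, close_tag = match.groups()
        # The body is copied unchanged; a body containing ']]>' is not handled.
        return open_tag + "<![CDATA[" + body + "]]>" + close_tag

    return SCRIPT_RE.sub(fix, html)


def tidy_for_xhtml(html):
    """Wrap scripts first so that script code such as 'i--' is left alone."""
    return escape_comment_hyphens(wrap_script_in_cdata(html))
```

With this sketch, tidy_for_xhtml('<!-- dog--cat -->') gives '<!-- dog- -cat -->', and a script hidden in a comment comes out wrapped in a CDATA section with its content untouched. The scripts are converted before the comments are cleaned so that a decrement operator inside a script is not mistaken for a double hyphen in a comment.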
Received on Sunday, 29 April 2001 05:42:06 UTC