
HTML --tidy--> XHTML

From: Tom Kelly <ctk21@cam.ac.uk>
Date: Thu, 26 Apr 2001 06:52:18 -0400 (EDT)
To: <html-tidy@w3.org>
Message-ID: <Pine.SOL.4.33.0104261126450.28701-100000@orange.csi.cam.ac.uk>

The tidy4aug00 package seems to have a problem cleaning up HTML script
tags and HTML comments when outputting to the XHTML format.

This seemed to come up on this list last year:

But I still have problems with the way tidy deals with comments and script
tags when outputting XHTML from an HTML source.

The HTML 4.01 spec on comments:
Classes two or more adjacent hyphens inside comments as something to be
avoided.

The XHTML/XML spec:
States that: '... the string "--" (double-hyphen) must not occur within
comments.'

Hence <!-- dog--cat --> is an illegal comment in XHTML (and something to
be avoided in HTML).

At present tidy just lets that pass through when outputting XHTML... Maybe
it should be cleaned by replacing '--' with '- - ' (or some other suitable
escape sequence) and a warning given.
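
The clean-up suggested above could look roughly like the following. This is
only a sketch in Python, not tidy's own code: the function name
escape_comment_hyphens is made up for illustration, and a regular expression
stands in for a real HTML lexer, so it will also touch comment-like text
inside script elements. It puts a space between adjacent hyphens in each
comment and counts the comments it changed, so that a caller could issue the
warning.

```python
import re
from typing import Tuple

# Matches one comment; the body is captured non-greedily.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def escape_comment_hyphens(markup: str) -> Tuple[str, int]:
    """Break up runs of adjacent hyphens inside comments.

    Returns the cleaned markup and the number of comments that were changed.
    (Hypothetical helper, not part of tidy.)
    """
    changed = 0

    def fix(match):
        nonlocal changed
        body = match.group(1)
        # Put a space after every hyphen that is directly followed by
        # another hyphen: "--" becomes "- -", "---" becomes "- - -".
        cleaned = re.sub(r"-(?=-)", "- ", body)
        # A body ending in "-" would produce "--->", which XML also rejects.
        if cleaned.endswith("-"):
            cleaned += " "
        if cleaned != body:
            changed += 1
        return "<!--" + cleaned + "-->"

    return COMMENT_RE.sub(fix, markup), changed


if __name__ == "__main__":
    text, count = escape_comment_hyphens("<!-- dog--cat -->")
    print(text)
    if count:
        print("warning: adjacent hyphens in %d comment(s) were escaped" % count)
```

Text outside comments is left alone, so something like <p>a--b</p> passes
through unchanged.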

Script tags:
Because of the way comments are dealt with in XHTML, as outlined above,
script tags which use <!-- *stuff* --> to hide scripts can run into
problems. As outlined in the XHTML spec:

A script tag might do better in the form:
... unescaped script content (except > becomes &gt; ) ...

Hence when tidying HTML to XHTML, tidy should probably replace:
... unescaped script ...

with:
... unescaped script content (except > becomes &gt; ) ...
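
A rough sketch of that script rewrite, again in Python and again not tidy's
own code. It assumes the target form is the CDATA-wrapped script element
that the XHTML 1.0 spec describes; the function name script_to_cdata and
both regular expressions are invented for illustration. One deliberate
difference from the text above: entity references are not expanded inside a
CDATA section, so instead of turning > into &gt; the sketch splits the
section wherever the script itself contains "]]>".

```python
import re

# One script element: attributes and body are captured separately.
SCRIPT_RE = re.compile(r"<script\b([^>]*)>(.*?)</script\s*>",
                       re.DOTALL | re.IGNORECASE)
# The "<!-- ... -->" wrapper used to hide scripts, with an optional "//"
# in front of the closing "-->".
HIDE_RE = re.compile(r"^\s*<!--(.*?)(?://\s*)?-->\s*$", re.DOTALL)


def script_to_cdata(markup):
    """Rewrite comment-hidden scripts as CDATA sections (hypothetical helper)."""

    def fix(match):
        attrs, body = match.group(1), match.group(2)
        # Leave empty/external scripts and already-wrapped scripts alone.
        if not body.strip() or body.lstrip().startswith("<![CDATA["):
            return match.group(0)
        hidden = HIDE_RE.match(body)
        if hidden:
            body = hidden.group(1)  # drop the comment wrapper
        # "]]>" would end the section early, so close and reopen around it.
        body = body.replace("]]>", "]]]]><![CDATA[>")
        return "<script" + attrs + "><![CDATA[" + body + "]]></script>"

    return SCRIPT_RE.sub(fix, markup)


if __name__ == "__main__":
    print(script_to_cdata("<script><!--\nif (a > b) go();\n// --></script>"))
```

Browsers that parse the output as plain HTML may not understand the CDATA
markers, so this only makes sense for output that really is meant to be
read as XHTML.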

This would make the tidy XHTML output more XHTML compliant!

Received on Sunday, 29 April 2001 05:42:06 UTC
