Re: HTML-Tidy and JavaScript issues

"Michael Sorens" <wong.sorens@mindspring.com> wrote:

>I just discovered HTML-Tidy--what a tremendous tool! Particularly for the
>capability to upgrade to style sheets which is originally why I looked at
>it. Thank you for creating this!
>
>Below are the warnings generated by HTML-Tidy for a rather small HTML
>file. I suspect most of these are valid warnings due to my lack of
>knowledge, particularly since I ran the file through the WDG validator
>which reported the same issues.
>
>Valid HTML inside a writeln() reported as an error
>=====================================
>Items 1 and 2 are complaining about a piece of valid HTML inside a
>JavaScript function:
>"", "1","index.html","11","46","'<' + '/' + letter not allowed here"
>"", "2","index.html","11","51","'<' + '/' + letter not allowed here"
>Here's the code fragment it refers to, i.e. the </TD> and </TR> inside the
>writeln():
>
><script type="text/javascript">
><!--
>function mainSection () {
>document.writeln('<TR><TD COLSPAN=3 BGCOLOR="#000000" HEIGHT=10>' +
>'<IMG SRC="graphics/onepixel.gif"></TD></TR>'); 
>...

You can't have an ETAGO (End TAG Open) sequence inside a script.  If there
is any instance of "</" followed by a letter (an ETAGO sequence), that
letter MUST be "S", followed by "CRIPT>".  Simply put, the only closing tag
you can have in a SCRIPT is "</SCRIPT>", and it will end the script there.

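The solution is to break up the ETAGO sequence in the script.  In JavaScript
this is usually done by escaping the slash with a backslash, as in
"<\/TD><\/TR>"; inside a string literal "\/" is just "/", so the text the
script writes out is still "</TD></TR>".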
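A minimal sketch of the same fix applied mechanically, assuming the JavaScript source is produced or preprocessed by some other program before it is placed inside a SCRIPT element. `escapeEtago` is a hypothetical helper, not part of HTML Tidy, and it assumes "</" occurs only inside string literals, since rewriting it elsewhere in the code could break the script.

```javascript
// Hypothetical helper (not part of HTML Tidy): rewrite every ETAGO sequence,
// i.e. "</" followed by a letter, in JavaScript *source text* as "<\/".
// Assumes "</" only occurs inside string literals in that source.
function escapeEtago(scriptSource) {
  return scriptSource.replace(/<\/(?=[A-Za-z])/g, "<\\/");
}

// The second line of the writeln() call from the question, as source text.
const before = `'<IMG SRC="graphics/onepixel.gif"></TD></TR>');`;
const after = escapeEtago(before);
console.log(after);
// '<IMG SRC="graphics/onepixel.gif"><\/TD><\/TR>');

// Inside a string literal "\/" is just "/", so the written HTML is unchanged.
console.log("</TD></TR>" === "<\/TD><\/TR>"); // true
```

It also escapes a literal "</SCRIPT>" inside a string, which is what you want there, so it should be run on script content only, never on the surrounding page.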

Many of your other errors probably follow from this one: the </TD>, being
an ETAGO sequence, has already terminated the CDATA content of the SCRIPT,
so as far as the DTD is concerned the SCRIPT has been (improperly) closed at
that point.

Perhaps the error message can be made clearer, saying:

	Warning: Illegal end tag in SCRIPT which is not </SCRIPT>: </TD>

with a fuller explanation at the end of the error report giving the reason
and suggesting a fix.  Or maybe:

	Warning: Missing backslash before slash in </TD> inside SCRIPT

and hope that every client-side scripting language used in web pages
escapes characters with a backslash.
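A rough sketch of a check that would produce the first suggested message, written as a standalone function rather than as a change to HTML Tidy itself; the name `findIllegalEndTags` is hypothetical. Unlike an SGML parser, which stops at the first ETAGO, it reports every offending end tag in the SCRIPT content, which seems more useful in a report.

```javascript
// Hypothetical lint check (not HTML Tidy code): report each end tag in
// SCRIPT content other than </SCRIPT>, using the wording suggested above.
function findIllegalEndTags(scriptContent) {
  const warnings = [];
  const etago = /<\/([A-Za-z][A-Za-z0-9]*)/g; // "</" followed by a name
  let match;
  while ((match = etago.exec(scriptContent)) !== null) {
    const name = match[1];
    // Element names are case-insensitive in HTML: </script> counts as </SCRIPT>.
    if (name.toUpperCase() !== "SCRIPT") {
      warnings.push(
        "Warning: Illegal end tag in SCRIPT which is not </SCRIPT>: </" + name + ">"
      );
    }
  }
  return warnings;
}

const content = "document.writeln('<TR><TD></TD></TR>');";
console.log(findIllegalEndTags(content).join("\n"));
// Warning: Illegal end tag in SCRIPT which is not </SCRIPT>: </TD>
// Warning: Illegal end tag in SCRIPT which is not </SCRIPT>: </TR>
```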

>Item reported missing, but buried in JavaScript
>===================================
>Item 3 is wrong, in that the 'title' element is generated inside the
>support.js file.
>Of course, HTML-Tidy has no way of knowing that, which leads me to believe
>that there are certain things that must be in-line code, not buried within
>JavaScript, e.g. <TITLE>, <LINK>, and others. If I am correct, where would
>this be documented?

TITLE is special in that every HTML document is required to have one: the
DTD's content model for HEAD requires exactly one TITLE element.  If the
TITLE is generated by a script, then as far as the DTD is concerned the
document has no TITLE; a parser sees only the SCRIPT.

It is not reasonable to expect HTML Tidy to execute your JavaScript to see
whether it generates any tags.  Your document must contain a literal TITLE
element in its HEAD.

>Possible improper placement of <script> tag
>=================================
>The next 9 warnings are all the same issue: I am using a JavaScript
>"subroutine" inside a table;

TABLE cannot contain SCRIPT.  It can only contain (directly or indirectly
due to omissibility) CAPTION, COLGROUP, COL, THEAD, TFOOT, TBODY, TR, INS,
DEL, or comments.

If you're going to use a SCRIPT to generate table rows, the SCRIPT will
have to generate the entire TABLE.
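A sketch of that approach, reusing the row from the question: the function builds the whole TABLE as a string, and the page calls it from a SCRIPT placed where the TABLE would otherwise be. `mainTable` is a hypothetical name, the ALT attribute is added because HTML 4.0 requires it on IMG, and further rows are elided.

```javascript
// Hypothetical replacement for mainSection(): return markup for the whole
// TABLE, so no SCRIPT element has to sit directly inside a TABLE.
// "<\/" keeps ETAGO sequences out of the script source.
function mainTable() {
  return '<TABLE>' +
    '<TR><TD COLSPAN="3" BGCOLOR="#000000" HEIGHT="10">' +
    // ALT is required on IMG in HTML 4.0.
    '<IMG SRC="graphics/onepixel.gif" ALT=""><\/TD><\/TR>' +
    // ...further rows elided...
    '<\/TABLE>';
}

// In the page, the call replaces the TABLE markup entirely:
//   <script type="text/javascript">
//   document.writeln(mainTable());
//   </script>
```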

>I gather that this is not a valid thing to do, but I looked all over the
>W3C site and could not find it documented:

Try the DTD itself.  <http://www.w3.org/TR/html40/loose.dtd> is as
forgiving a DTD as you'll find (though HTML Tidy still doesn't support that
particular DTD).

>Possible invalid expression
>====================
>This last set involves the notation "&{JavaScript-expr};" as an attribute
>value, which should be valid.
>"", "7","index.html","61","18","unescaped & which should be written as &amp;"
>"", "8","index.html","75","18","unescaped & which should be written as &amp;"
>"", "12","index.html","148","18","unescaped & which should be written as
>&amp;"
>
><td><FONT SIZE=&{fontSize};>
>
>where "fontSize" is a valid JavaScript variable I've defined. The
>expression works in Netscape 4 but it does not work in Internet Explorer 4
>or 5.

An ampersand ("&") that does not begin a character or entity reference such
as "&amp;" must be written as "&amp;" in attribute values.  That value must
also be enclosed in quotation marks, since it contains characters other than
letters, digits, hyphens, and periods.
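A minimal sketch of both rules, assuming the value is built as a JavaScript string; the helper `attrValue` is hypothetical. It treats "&" as the start of a reference only when it is followed by a name or a numeric reference and a ";", without checking names against the DTD's entity list, and it also escapes double quotes so the quoted value cannot be cut short. This only makes the markup valid; it does not make "&{...};" work in browsers that lack Netscape's JavaScript entities.

```javascript
// Hypothetical helper: escape ampersands that do not begin a character or
// entity reference, escape double quotes, then wrap the value in quotes.
// Named references are recognized by shape only, not checked against the DTD.
function attrValue(value) {
  const escaped = value
    .replace(/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/g, "&amp;")
    .replace(/"/g, "&quot;");
  return '"' + escaped + '"';
}

console.log("<FONT SIZE=" + attrValue("&{fontSize};") + ">");
// <FONT SIZE="&amp;{fontSize};">
console.log(attrValue("a&amp;b & c"));
// "a&amp;b &amp; c"
```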

>Missed catching one error
>=============================
>The WDG HTML validator found one more which HTML-Tidy did not!
>
>Line 41, character 15: <body BGCOLOR=#FFFFFF>
>Error: an attribute value must be quoted if it contains any character
>other than letters (A-Za-z), digits, hyphens, and periods; use quotes if
>in doubt

HTML Tidy quietly fixes these for you.  (It quotes every attribute value, on
the reasoning that XML requires all attribute values to be quoted regardless
of their content.)

I have a patch that allows the quotes to be omitted where HTML permits it.
I just want to make it complete first, so that one can also tell it to
always quote attributes containing CDATA even where no quotes are required
(that is, leaving quotes off only attributes with keyword values and numeric
values that are safe unquoted).  The difficulty is that nothing in HTML
Tidy's attribute dictionary identifies the type of content each attribute
takes.
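A minimal sketch of the quoting decision such a patch needs, following the conservative rule in the WDG message quoted above; `needsQuotes` and `formatAttr` are hypothetical names, and the `alwaysQuote` flag stands in for an option to always quote (as XML requires). It assumes ampersands in the value have already been escaped.

```javascript
// Hypothetical helper mirroring the WDG rule quoted above: a value may be
// left unquoted only if it is non-empty and consists solely of letters
// (A-Za-z), digits, hyphens, and periods.
function needsQuotes(value) {
  return !/^[A-Za-z0-9.-]+$/.test(value);
}

// Emit name=value, quoting only where required unless alwaysQuote is set.
// Assumes ampersands in value are already escaped.
function formatAttr(name, value, alwaysQuote) {
  if (alwaysQuote || needsQuotes(value)) {
    return name + '="' + value.replace(/"/g, "&quot;") + '"';
  }
  return name + "=" + value;
}

console.log("<body " + formatAttr("BGCOLOR", "#FFFFFF", false) + ">");
// <body BGCOLOR="#FFFFFF">
console.log("<td " + formatAttr("HEIGHT", "10", false) + ">");
// <td HEIGHT=10>
```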
-- 
         ,=<#)-=#  <http://www.war-of-the-worlds.org/>
    ,_--//--_,
 _-~_-(####)-_~-_  "Did you see that Parkins boy's body in the tunnels?" "Just
(#>_--'~--~`--_<#)  the photos.  Worst thing I've ever seen; kid had no face."

Received on Thursday, 23 December 1999 15:12:01 UTC