odd behaviour with </script>

Hi,

I came across an odd behaviour of tidy which might qualify as a bug.
The following HTML file looks unusual, but is valid HTML in my
understanding (ok, it lacks the type attribute, but that's not the point
here):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html
><head
><title
>Test</title
><script language="JScript"
></script
></head
><body
></body
></html>

If I run this through tidy, I see a boatload of warnings:

Tidy (vers 4th August 2000) Parsing "test.html"
line 6 column 2 - Warning: <script> lacks "type" attribute
line 8 column -1 - Warning: '<' + '/' + letter not allowed here
line 8 column 1 - Warning: '<' + '/' + letter not allowed here
line 9 column -1 - Warning: '<' + '/' + letter not allowed here
line 10 column 1 - Warning: '<' + '/' + letter not allowed here
line 11 column -1 - Warning: '<' + '/' + letter not allowed here
line 11 column 1 - Warning: '<' + '/' + letter not allowed here
line 11 column 6 - Warning: '<' + '/' + letter not allowed here
line 11 column 6 - Warning: missing </script>

If I shift one chevron up one line (the line "></script" becomes
"></script>"):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html
><head
><title
>Test</title
><script language="JScript"
></script>
</head
><body
></body
></html>

then all those warnings are gone (except the type attribute).

I have seen some comments in the archive about the special treatment of the
</script> end tag, but this should not apply here. To me it looks like tidy
accepts the script CDATA as correctly terminated when it finds "</script>".
I think it should accept it when it finds "</script" (zero or more
whitespace) ">". This would be consistent with the termination of all other
elements.

The background of this somewhat odd question is (Open)Jade. This tool can
generate HTML output from SGML sources. It creates linebreaks like in the
examples above, as these are the only locations for linebreaks where Jade
can be sure not to screw up the contents. I am currently writing DSSSL
stylesheets that insert JavaScript/JScript stuff into the generated HTML
pages. Both Lynx and tidy cannot handle the automatically generated HTML
properly, while IE and Netscape are fine.

Please let me know if I'm off track here.

regards,
Markus
hoenicka_markus@compuserve.com

Received on Saturday, 21 October 2000 01:36:41 UTC