- From: <html-tidy@war-of-the-worlds.org>
- Date: Sun, 25 Jun 2000 21:15:27 -0500
- To: html-tidy@w3.org
"Thomas Appel" <thomas.appel@arcormail.de> wrote: >Dear Sir, > >would you please tell me, why tidy (version of 13th january 2000) changes >the following java script line > >parent.FRAME2.document.write("</HEAD><BODY><H1>" + Titel + >"</H1></BODY></HTML>"); > >to > >parent.FRAME2.document.write("<\/HEAD><BODY><H1>" + Titel + >"<\/H1><\/BODY><\/HTML>"); > >adding a backslash in front of every slash in the html-tags. I could not >find this syntax rule in any of my html-books. Here we go again. SCRIPT content is described as CDATA. The parsing of CDATA content is terminated by an ETAGO (End Tag Open) sequence ("<" + "/" + alphabetic character) as a general SGML rule, otherwise how would you know in a general sense where the CDATA ends and you should look for an end tag? SCRIPTs must only be terminated with </SCRIPT> (starts with an ETAGO, thus the need for general recognition of ETAGO for any CDATA tag, including both SCRIPT and STYLE), so the presence of any other ETAGO sequences is by definition an error. The easiest way to eliminate premature ETAGO sequences is to escape an interior character of the sequence. The slash is a natural choice. Thus the introduction of the backslashes before the slash. Consider what would happen if you had: document.write("</SCRIPT>"); Just as a script would be terminated by the instance of </SCRIPT>, so it would be terminated in a general sense by any ETAGO sequence due to the general SGML rule. A general parser doesn't know anything about the syntax of scripting languages, so won't care that it is in quotes. Tidy's changes are necessary for compliance, and function is not altered.
Received on Sunday, 25 June 2000 22:15:47 UTC