javascript with jtidy - XML

Hello,

I'm trying to use jtidy (r7) to generate XML files
from HTML files.

Currently, I have a problem with some JavaScript in an HTML file:
the XML output file is not well-formed (some attributes are not quoted).

1) I don't need the JavaScript parts, so I wonder whether it is possible
to drop them (with an option like Tidy.setDropScript(true))?
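
If no such option exists, one possible workaround is to remove the SCRIPT
blocks from the HTML text before handing it to Tidy. The following is only
a minimal sketch under that assumption: the class name ScriptStripper and
the method stripScripts are hypothetical, it uses plain java.util.regex
rather than any JTidy API, and it would stop too early if a script
contained a literal closing SCRIPT tag inside a string.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of JTidy): strips SCRIPT elements from raw HTML text.
final class ScriptStripper {
    // Matches an opening SCRIPT tag, its content, and the closing tag,
    // case-insensitively and across line breaks (lazy match on the content).
    private static final Pattern SCRIPT_BLOCK = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the HTML text with every SCRIPT element removed. */
    public static String stripScripts(String html) {
        return SCRIPT_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<BODY><P>kept</P>\n"
                + "<SCRIPT type=\"text/javascript\" language='JavaScript'>\n"
                + "document.write('<SCR'+'IPT WIDTH=1>');\n"
                + "document.write('<\\/SCR'+'IPT>');\n"
                + "</SCRIPT>\n"
                + "</BODY>";
        // Prints the document without the SCRIPT element.
        System.out.println(stripScripts(html));
    }
}
```

The cleaned string could then be wrapped in a stream and passed to Tidy in
place of the original file.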

2) Is this a bug, or is it the expected behavior? Details:

a) The HTML source example is:
<HTML>
  <BODY topmargin="0" marginheight="0" leftmargin="0" marginwidth="0">
      <TABLE border="0" cellpadding="0" cellspacing="0" width="100%">
        <TR>
          <TD>
            <A href="truc.html"><IMG src="truc.gif"></A>
          </TD>
        </TR>
      </TABLE>
<SCRIPT type="text/javascript" language='JavaScript'>
document.write('<IFRAME WIDTH=1 HEIGHT=1 MARGINWIDTH=0 MARGINHEIGHT=0
HSPACE=0 VSPACE=0 ');
document.write('FRAMEBORDER=0 SCROLLING=no BORDERCOLOR="#000000" ');
document.write('SRC="http://www.mydomainbidon.com">');
document.write('<SCR'+'IPT WIDTH=1 HEIGHT=1 LANGUAGE="JavaScript1.1">');
document.write('BLA BLA BLA');
document.write('<\/SCR'+'IPT>');
document.write("<\/IFRAME>");
</SCRIPT>
</BODY>
</HTML>

b) the options I currently use are:
        tidy.setXmlOut(true);
        tidy.setIndentContent(true);
        tidy.setTidyMark(false);
        tidy.setXmlPi(true);
        tidy.setUpperCaseTags(true);
        tidy.setLiteralAttribs(false);
        tidy.setDropFontTags(true);
        tidy.setDropEmptyParas(true);
        // tidy.setShowWarnings(false);
        tidy.setCharEncoding(Configuration.LATIN1);
        tidy.setRawOut(true);
        tidy.setQuoteNbsp(false);
        tidy.setQuoteAmpersand(false);
        tidy.setWrapScriptlets(true);
        tidy.setWord2000(true);


Please find attached (I don't know whether the list accepts attachments)
the HTML source file (a), 'test.html',
and the generated files:
test7.xml (not well-formed, which is my problem: some attributes are not quoted)
and test7.txt (the error file).
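
For anyone who wants to reproduce the symptom, the well-formedness of the
output can be checked with a standard XML parser. The sketch below is only
illustrative: the class name WellFormedCheck and the method isWellFormed
are hypothetical, and it relies on the JAXP parser from the Java standard
library, not on JTidy itself.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a piece of text parses as well-formed XML.
final class WellFormedCheck {
    /** Returns true if the text is well-formed XML, false if the parser rejects it. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from
            // printing its own messages to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Includes SAXParseException, e.g. for an unquoted attribute value.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Quoted attributes, as expected in well-formed output.
        System.out.println(isWellFormed("<A href=\"truc.html\"><IMG src=\"truc.gif\"/></A>"));
        // Unquoted attributes, as in the problematic output.
        System.out.println(isWellFormed("<IFRAME WIDTH=1 HEIGHT=1></IFRAME>"));
    }
}
```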


Thanks.

Thierry G.

Received on Thursday, 20 September 2001 13:06:26 UTC