- From: Thierry GAGNAIRE <thgalist@natapassion.com>
- Date: Thu, 20 Sep 2001 19:03:00 +0200
- To: <html-tidy@w3.org>
- Message-ID: <007801c141f6$59365de0$380b4b0a@5531d28>
Hello,

I'm trying to use JTidy (r7) to generate XML files from HTML files. I have a problem with some JavaScript in an HTML file: the XML output file is not well-formed (some attributes are not quoted).

1) I don't need the JavaScript parts, so I wonder whether it is possible to drop them (with an option like Tidy.setDropScript(true))?

2) Is this a bug, or is it normal behaviour?

Details:

a) The HTML source example:

    <HTML>
    <BODY topmargin="0" marginheight="0" leftmargin="0" marginwidth="0">
    <TABLE border="0" cellpadding="0" cellspacing="0" width="100%">
      <TR>
        <TD>
          <A href="truc.html"><IMG src="truc.gif"></A>
        </TD>
      </TR>
    </TABLE>
    <SCRIPT type="text/javascript" language='JavaScript'>
    document.write('<IFRAME WIDTH=1 HEIGHT=1 MARGINWIDTH=0 MARGINHEIGHT=0 HSPACE=0 VSPACE=0 ');
    document.write('FRAMEBORDER=0 SCROLLING=no BORDERCOLOR="#000000" ');
    document.write('SRC="http://www.mydomainbidon.com">');
    document.write('<SCR'+'IPT WIDTH=1 HEIGHT=1 LANGUAGE="JavaScript1.1">');
    document.write('BLA BLA BLA');
    document.write('<\/SCR'+'IPT>');
    document.write("<\/IFRAME>");
    </SCRIPT>
    </BODY>
    </HTML>

b) The options I currently use are:

    tidy.setXmlOut(true);
    tidy.setIndentContent(true);
    tidy.setTidyMark(false);
    tidy.setXmlPi(true);
    tidy.setUpperCaseTags(true);
    tidy.setLiteralAttribs(false);
    tidy.setDropFontTags(true);
    tidy.setDropEmptyParas(true);
    // tidy.setShowWarnings(false);
    tidy.setCharEncoding(Configuration.LATIN1);
    tidy.setRawOut(true);
    tidy.setQuoteNbsp(false);
    tidy.setQuoteAmpersand(false);
    tidy.setWrapScriptlets(true);
    tidy.setWord2000(true);

Attached (I don't know whether the list accepts attachments) are the HTML source file (a), 'test.html', and the generated files: test7.xml (not well-formed, which is my problem: the attributes are not quoted) and test7.txt (the error file).

Thanks.
Thierry G.
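For question 1, one possible workaround is to remove the SCRIPT elements from the HTML before it ever reaches JTidy. This is a minimal sketch under that assumption. It is a plain regular-expression pre-processing step, not a JTidy option, and the names `ScriptStripper` and `dropScripts` are hypothetical. The resulting string could then be wrapped in an input stream and passed to `tidy.parse(...)` as usual.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: removes SCRIPT elements from HTML before it is handed to Tidy. */
class ScriptStripper {

    // (?i) = case-insensitive, (?s) = '.' also matches newlines.
    // The non-greedy body stops at the first real closing tag. The escaped
    // "<\/SCR'+'IPT>" strings inside document.write(...) do not match "</script",
    // so they do not end the element early.
    private static final Pattern SCRIPT =
            Pattern.compile("(?is)<script\\b[^>]*>.*?</script\\s*>");

    /** Returns the HTML with every <SCRIPT ...>...</SCRIPT> element removed. */
    static String dropScripts(String html) {
        return SCRIPT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<BODY><TABLE></TABLE>"
                + "<SCRIPT type=\"text/javascript\" language='JavaScript'>"
                + "document.write('<IFRAME WIDTH=1 HEIGHT=1>');"
                + "document.write('<\\/SCR'+'IPT>');"
                + "</SCRIPT></BODY>";
        // Prints: <BODY><TABLE></TABLE></BODY>
        System.out.println(dropScripts(html));
    }
}
```

The sketch assumes that no script body contains a literal, unescaped `</script>` and that scripts inside HTML comments may also be removed; a real HTML parser would be more robust than a regex. Dropping the scripts before Tidy runs means their contents never reach the XML serializer at all.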
Received on Thursday, 20 September 2001 13:06:26 UTC