
Re: JavaScript and Java HTML Tidy

From: Dave Raggett <dsr@w3.org>
Date: Thu, 20 Apr 2000 19:21:41 +0100 (GMT Daylight Time)
To: jgutierrez@siconet.es
cc: html-tidy@w3.org, gerald@w3.org
Message-ID: <Pine.WNT.4.10.10004201849460.-616105@hazel.hpl.hp.com>

On Thu, 13 Apr 2000, jorge gutierrez wrote:

> We are building an applet that needs to parse some HTML response
> pages from a server. These pages are dirty, so we are planning to
> use Java Tidy to clean up the code the applet receives (we can't
> modify the application that generates these pages).
> 
> There is lots of JavaScript code inside comments in these pages
> (with plenty of errors), and Tidy seems to inspect the comments, so
> it stops parsing and doesn't even generate the DOM tree.
> 
> Is there any way to disable parsing inside comments/scripts?

The ANSI C version of Tidy treats the contents of comments
and scripts as CDATA and preserves the text as is. Comments are
treated as special nodes in the tree, while script and style are
treated as regular CDATA elements.
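
As a rough illustration of that behaviour, here is a minimal Python
sketch. It is not Tidy's lexer, and the names RAW_SPAN and
split_raw_spans are made up for this example. It only shows the idea
of treating comments and script/style bodies as opaque spans that are
copied through unchanged, however malformed their contents are.

```python
import re

# Hypothetical illustration only; this is not how Tidy is implemented.
# Comments and <script>/<style> elements are matched as opaque spans,
# so nothing inside them is ever interpreted as markup.
RAW_SPAN = re.compile(
    r"<!--.*?-->"                              # comment
    r"|<(script|style)\b[^>]*>.*?</\1\s*>",    # CDATA element
    re.IGNORECASE | re.DOTALL,
)


def split_raw_spans(html):
    """Return (kind, text) pairs; kind is 'markup', 'comment' or 'cdata'."""
    pieces = []
    pos = 0
    for match in RAW_SPAN.finditer(html):
        # Anything before the opaque span is ordinary markup.
        if match.start() > pos:
            pieces.append(("markup", html[pos:match.start()]))
        kind = "comment" if match.group(0).startswith("<!--") else "cdata"
        # The span itself is kept exactly as it appeared in the input.
        pieces.append((kind, match.group(0)))
        pos = match.end()
    if pos < len(html):
        pieces.append(("markup", html[pos:]))
    return pieces
```

Joining the text of all the pieces gives back the input unchanged,
so broken JavaScript inside a comment or script never reaches the
part of a parser that deals with tags.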

I guess I am misunderstanding what is going wrong for you. Perhaps
you could send me some examples to make it clearer. Note that I
only maintain the C version and not the Java version of Tidy.


Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
tel/fax: +44 122 578 3011 (or 2521) +44 778 532 0444 (mobile)
World Wide Web Consortium (on assignment from HP Labs)
Received on Thursday, 20 April 2000 14:21:53 GMT
