
Can JTidy really clean up html?

From: R Villafuerte <bong1@xtra.co.nz>
Date: Mon, 22 Sep 2003 08:39:38 +1200
To: <html-tidy@w3.org>
Message-ID: <000501c38080$7a1d0fc0$0100a8c0@dmonx>
Hi All,
 
I have recently obtained a copy of JTidy, a Java version of HTML Tidy
that can also be used as an API for parsing XML documents.  My main
objective is to use it to clean up HTML pages and transform them into
well-formed XML documents.  The problem is that every time I try this
on a slightly complex web page, I get error messages like the ones
below.  I have not given up hope yet: either I have not configured
JTidy correctly, or it really cannot handle all web pages, only basic
structures.  I am not asking anyone to spend time tutoring me on how
to use this API; I just want to know whether my approach is sound or
whether I should find a different solution to my problem.
 
Cheers,
Rico
 
line 12 column 1 - Warning: unexpected </head> in <meta>
line 12 column 239 - Warning: unexpected </td> in <img>
line 12 column 312 - Warning: unexpected </td> in <img>
line 12 column 385 - Warning: unexpected </td> in <img>
line 12 column 390 - Warning: unexpected </tr> in <img>
line 12 column 511 - Warning: unexpected </td> in <img>
line 12 column 610 - Warning: unexpected </tr> in <img>
line 12 column 961 - Warning: unescaped & or unknown entity "&tab"
line 12 column 968 - Warning: unescaped & or unknown entity "&ie"
line 12 column 1,173 - Warning: unescaped & or unknown entity "&tab"
line 12 column 1,180 - Warning: unescaped & or unknown entity "&ie"
line 12 column 1,385 - Warning: unescaped & or unknown entity "&tab"
line 12 column 1,392 - Warning: unescaped & or unknown entity "&ie"
line 12 column 1,600 - Warning: unescaped & or unknown entity "&tab"
line 12 column 1,607 - Warning: unescaped & or unknown entity "&ie"
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.
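
The "&tab" and "&ie" warnings above look like bare ampersands inside URL query strings, although that is an assumption, since the page source is not included here. As a rough sketch of what escaping them before parsing could look like, using only the Java standard library (the class and method names are hypothetical and not part of JTidy):

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JTidy: escapes '&' characters that do
// not begin something shaped like an entity reference.
final class AmpersandEscaper {
    // '&' NOT followed by "name;", "#123;" or "#x1F;".
    // This checks the shape only; it does not verify that the entity
    // name is one that HTML actually defines.
    private static final Pattern BARE_AMP = Pattern.compile(
        "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    static String escapeBareAmpersands(String html) {
        return BARE_AMP.matcher(html).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // Assumed example input; the real page is not shown in this message.
        String link = "<a href=\"search?q=x&tab=1&ie=UTF-8\">next</a>";
        System.out.println(escapeBareAmpersands(link));
        // prints: <a href="search?q=x&amp;tab=1&amp;ie=UTF-8">next</a>
    }
}
```

This only addresses the ampersand warnings. The "unexpected </td> in <img>" lines and the final error message are a separate matter that this sketch does not touch.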
 
 
Received on Sunday, 21 September 2003 16:40:34 UTC
