Re: Can JTidy really clean up html?

* R Villafuerte wrote:
>I have just recently obtained a copy of JTidy. JTidy is a Java version
>of HTML Tidy; it can be used as an API for parsing XML documents.  My
>main objective is really to use it to clean up HTML pages in order to
>transform them into well-formed XML documents.  The only problem is
>that every time I try to do this on a slightly complex web page, I
>always get these types of error messages.

It seems that you are specifying JTidy's equivalent of the -xml
command-line option, which tells Tidy to treat the input as well-formed
XML. You should not specify XML input if your input is HTML/XHTML.
Otherwise I have no explanation for the "unexpected </td> in <img>"
warnings.

>line 12 column 961 - Warning: unescaped & or unknown entity "&tab"
>line 12 column 968 - Warning: unescaped & or unknown entity "&ie"
>line 12 column 1,173 - Warning: unescaped & or unknown entity "&tab"
>line 12 column 1,180 - Warning: unescaped & or unknown entity "&ie"
>line 12 column 1,385 - Warning: unescaped & or unknown entity "&tab"
>line 12 column 1,392 - Warning: unescaped & or unknown entity "&ie"
>line 12 column 1,600 - Warning: unescaped & or unknown entity "&tab"
>line 12 column 1,607 - Warning: unescaped & or unknown entity "&ie"

These are real errors in the document, and Tidy fixes them by escaping
the ampersand, so "&tab" becomes "&amp;tab", and so on.
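
To illustrate the kind of repair described above, here is a minimal
sketch in plain Java. It is not Tidy's or JTidy's actual code; the class
name AmpersandEscaper, the method escapeBareAmpersands, and the tiny
KNOWN entity list are all hypothetical and exist only for this example.
It assumes the "&tab" and "&ie" in your page come from URL query strings
written without escaping.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a rough imitation of the repair, not JTidy's implementation.
final class AmpersandEscaper {
    // Assumption: a deliberately tiny list of entity names treated as "known".
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("amp", "lt", "gt", "quot", "nbsp"));

    // Matches '&', optionally followed by a complete numeric or named reference.
    private static final Pattern AMP = Pattern.compile(
            "&(#[0-9]+;|#[xX][0-9A-Fa-f]+;|([A-Za-z][A-Za-z0-9]*);)?");

    static String escapeBareAmpersands(String html) {
        Matcher m = AMP.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String whole = m.group();
            String name = m.group(2);                       // set only for named references
            boolean numeric = m.group(1) != null && name == null;
            boolean known = name != null && KNOWN.contains(name);
            // Keep valid references; otherwise escape just the leading '&'.
            String replacement = (numeric || known)
                    ? whole
                    : "&amp;" + whole.substring(1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: q?a=1&amp;tab=w&amp;ie=UTF-8
        System.out.println(escapeBareAmpersands("q?a=1&tab=w&ie=UTF-8"));
    }
}
```

References that are already valid, such as "&amp;" or "&#169;", pass
through unchanged, so running the method twice on the same text gives
the same result as running it once.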

Received on Sunday, 21 September 2003 17:44:07 UTC