Re: Can JTidy really clean up html?

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sun, 21 Sep 2003 23:43:55 +0200
To: "R Villafuerte" <bong1@xtra.co.nz>
Cc: <html-tidy@w3.org>
Message-ID: <3f701b1c.14829884@smtp.bjoern.hoehrmann.de>

* R Villafuerte wrote:
>I have recently obtained a copy of JTidy. JTidy is a Java version of
>HTML Tidy, and it can be used as an API for parsing XML documents. My
>main objective is to use it to clean up HTML pages in order to
>transform them into well-formed XML documents. The only problem is
>that every time I try this on a slightly complex web page, I
>always get these types of error messages:

It seems that you are specifying JTidy's equivalent of the -xml command
line option, which tells Tidy that its input is well-formed XML. You
should not specify XML input if your input is HTML or XHTML. Otherwise
I have no explanation for the "unexpected </td> in <img>" warnings.

>line 12 column 961 - Warning: unescaped & or unknown entity "&tab"
>line 12 column 968 - Warning: unescaped & or unknown entity "&ie"
>line 12 column 1,173 - Warning: unescaped & or unknown entity "&tab"
>line 12 column 1,180 - Warning: unescaped & or unknown entity "&ie"
>line 12 column 1,385 - Warning: unescaped & or unknown entity "&tab"
>line 12 column 1,392 - Warning: unescaped & or unknown entity "&ie"
>line 12 column 1,600 - Warning: unescaped & or unknown entity "&tab"
>line 12 column 1,607 - Warning: unescaped & or unknown entity "&ie"

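These are real errors in the document: the ampersands are not escaped.
Tidy fixes them by escaping each bare & as &amp;, so "&tab" becomes
"&amp;tab", "&ie" becomes "&amp;ie", and so on.
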
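Tidy's fix amounts to escaping every ampersand that does not begin a
recognised character reference. The Java sketch below illustrates the
idea only; it is not JTidy's implementation. The class name, the
escape() method and the deliberately tiny KNOWN_ENTITIES set are
hypothetical (real HTML defines many more entity names), and the sketch
makes no attempt at Tidy's other entity diagnostics.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JTidy.
class AmpersandEscaper {
    // Illustrative subset of entity names; "tab" and "ie" are not among them.
    private static final Set<String> KNOWN_ENTITIES =
            Set.of("amp", "lt", "gt", "quot", "nbsp");

    // An '&', optionally followed by a complete character reference:
    // decimal (&#169;), hexadecimal (&#xA9;) or named (&lt;).
    // Group 1 is the whole reference after '&'; group 2 is the entity name.
    private static final Pattern AMPERSAND = Pattern.compile(
            "&(#[0-9]+;|#[xX][0-9a-fA-F]+;|([A-Za-z][A-Za-z0-9]*);)?");

    /** Rewrites every '&' that does not start a recognised reference as "&amp;". */
    static String escape(String html) {
        Matcher m = AMPERSAND.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String reference = m.group(1); // null for a bare '&', e.g. "&tab="
            String name = m.group(2);      // null for numeric references
            boolean keep = reference != null
                    && (name == null || KNOWN_ENTITIES.contains(name));
            // Copy a valid reference unchanged; otherwise escape the '&'
            // and keep whatever text followed it.
            String replacement = keep ? m.group() : "&amp;" + m.group().substring(1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For a hypothetical input, AmpersandEscaper.escape("page?x=1&tab=2&ie=3")
returns "page?x=1&amp;tab=2&amp;ie=3", while "&lt;", "&amp;" and
"&#169;" pass through untouched.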
Received on Sunday, 21 September 2003 17:44:07 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:54 GMT