
using JTidy with all character sets

From: David Rennie Hinshelwood <hinsheld@crl.nmsu.edu>
Date: Tue, 18 Jul 2000 13:32:24 -0600
To: "'Html-Tidy" <html-tidy@w3.org>
Message-ID: <NEBBJDDLAMECEDNEDDLKAEAICBAA.hinsheld@crl.nmsu.edu>
Hi,
I'm using JTidy to parse web pages in any language and character set, but
I have run into problems. When I run it on http://www.number.ne.jp/ I get
warnings like:
line 177 column 167 - Warning: unescaped & or unknown entity "&#36628"
line 177 column 207 - Warning: unescaped & or unknown entity "&#34892"
line 177 column 223 - Warning: unescaped & or unknown entity "&#33258"
line 178 column 147 - Warning: unescaped & or unknown entity "&#36914"
line 178 column 163 - Warning: unescaped & or unknown entity "&#35542"
line 178 column 193 - Warning: unescaped & or unknown entity "&#34276"
line 178 column 209 - Warning: unescaped & or unknown entity "&#65295"
line 178 column 249 - Warning: unescaped & or unknown entity "&#65295"
line 178 column 281 - Warning: unescaped & or unknown entity "&#38742"

These are real Japanese characters. How can I configure JTidy to ignore all
content except the HTML/XHTML tags?
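
To check that the numbers in the warnings really are characters, a small
standalone sketch like the one below can decode them. It does not use JTidy
at all; the class name NumericRefDecoder and its decode method are made up
for illustration. Since the references appear in the warnings without a
trailing semicolon, the pattern treats the semicolon as optional (an
assumption about how they look in the page source).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JTidy: decodes decimal numeric
// character references such as "&#34892" or "&#34892;".
public class NumericRefDecoder {
    // Trailing semicolon is optional, matching the form quoted in the warnings.
    private static final Pattern NCR = Pattern.compile("&#(\\d{1,7});?");

    static String decode(String text) {
        Matcher m = NCR.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Leave out-of-range references untouched.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String[] refs = {"&#36628", "&#34892", "&#33258", "&#65295"};
        for (String ref : refs) {
            // Print the code point in hex to avoid console encoding issues.
            System.out.printf("%s -> U+%04X%n", ref, decode(ref).codePointAt(0));
        }
    }
}
```

For the four references above this prints U+8F14, U+884C, U+81EA and U+FF0F,
which are CJK ideographs and a fullwidth slash, so the references are
well-formed character data rather than stray ampersands.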

David Hinshelwood
CRL NMSU
Tel: (505) 646 3342 (office)
       (505) 645 5537 (home)
Received on Tuesday, 18 July 2000 15:29:03 GMT
