- From: David Rennie Hinshelwood <hinsheld@crl.nmsu.edu>
- Date: Tue, 18 Jul 2000 13:32:24 -0600
- To: "'Html-Tidy" <html-tidy@w3.org>
Hi,

I'm using JTidy to parse web pages in any language and character set, but I have run into problems. When I run it on http://www.number.ne.jp/ I get warnings like:

line 177 column 167 - Warning: unescaped & or unknown entity "輔"
line 177 column 207 - Warning: unescaped & or unknown entity "行"
line 177 column 223 - Warning: unescaped & or unknown entity "自"
line 178 column 147 - Warning: unescaped & or unknown entity "進"
line 178 column 163 - Warning: unescaped & or unknown entity "論"
line 178 column 193 - Warning: unescaped & or unknown entity "藤"
line 178 column 209 - Warning: unescaped & or unknown entity "/"
line 178 column 249 - Warning: unescaped & or unknown entity "/"
line 178 column 281 - Warning: unescaped & or unknown entity "靖"

These are actual Japanese characters. How do I set JTidy to ignore all content except HTML/XHTML tags?

David Hinshelwood
CRL NMSU
Tel: (505) 646 3342 (office)
     (505) 645 5537 (home)
Received on Tuesday, 18 July 2000 15:29:03 UTC