RE: JTidy: XML Output

From: Randy Waki <rwaki@flipdog.com>
Date: Mon, 14 Aug 2000 16:27:58 -0600
To: <html-tidy@w3.org>
Message-ID: <000101c0063e$e621d5d0$51eee13f@rwaki>

Nayan Hajratwala wrote:
> I am using JTidy to take an HTML file and output it as XML.
> It seems to work fine until I try to run the resulting file through
> Sun's JAXP parser.  In HTML documents that contain '&copy' for the
> copyright symbol, JAXP says: 'Reference to undefined entity "&copy;'.
> Is '&copy' not a valid XML entry?  If so, is there a way to ensure that
> JTidy does not output this?
> I also initially had the same problem with '&nbsp;', but calling
> Tidy.setQuoteNbsp(false) fixed that.

Try calling Tidy.setNumEntities(true).  That tells Tidy to restrict
itself to the five entities predefined by XML (&lt; &gt; &quot;
&apos; and &amp;).  All other entities are output as numeric character
references, so &copy; becomes &#169;.  You could then probably drop the call to
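
To illustrate the effect described above, here is a minimal stand-alone sketch of the transformation: named entities outside the five XML-predefined ones are rewritten as numeric character references. This is not JTidy's implementation. The class name NumericEntities, the method toNumeric, and the three-entry entity table are assumptions made up for the example; a real table would cover the full HTML entity set.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of JTidy.
public class NumericEntities {
    // The five entities predefined by XML; these are left untouched.
    private static final Set<String> XML_PREDEFINED =
            Set.of("lt", "gt", "quot", "apos", "amp");

    // A tiny illustrative subset of HTML named entities and their code points.
    private static final Map<String, Integer> HTML_ENTITIES =
            Map.of("copy", 169, "nbsp", 160, "reg", 174);

    // Matches a complete named entity reference such as "&copy;".
    private static final Pattern NAMED =
            Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Rewrites known HTML-only entities as numeric character references.
    // XML-predefined entities and names missing from the table pass through unchanged.
    public static String toNumeric(String text) {
        Matcher m = NAMED.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            Integer codePoint = HTML_ENTITIES.get(name);
            String replacement = (XML_PREDEFINED.contains(name) || codePoint == null)
                    ? m.group()
                    : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "&#169; 2000 A &amp; B"
        System.out.println(toNumeric("&copy; 2000 A &amp; B"));
    }
}
```

Output in this form should be acceptable to an XML parser that has no DTD declaring the HTML entity names, since numeric character references need no declaration.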

Received on Monday, 14 August 2000 18:32:34 UTC