RE: JTidy: XML Output

Nayan Hajratwala wrote:
> 
> I am using JTidy to take an HTML file and output it as XML.
> 
> It seems to work fine until I try to run the resulting file through
> Sun's JAXP parser.  In HTML documents that contain '&copy' for the
> copyright symbol, JAXP says: 'Reference to undefined entity "&copy"'.
> 
> Is '&copy' not a valid XML entity?  If so, is there a way to ensure that
> JTidy does not output this?
> 
> I also initially had the same problem with '&nbsp;', but calling
> Tidy.setQuoteNbsp(false) fixed that.

Try calling Tidy.setNumEntities(true).  That tells Tidy to restrict
itself to the 5 entities guaranteed to be defined in XML (&lt; &gt;
&quot; &apos; and &amp;).  All the rest are output as numeric escapes,
so &copy; becomes &#169;.  You could then probably drop the call to
setQuoteNbsp().
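
For anyone who wants to see the difference for themselves, here is a
rough sketch that uses only the JAXP classes shipped with the JDK (no
JTidy involved).  The class name EntityCheck and the helper parses()
are made up for illustration.  It feeds three tiny documents with no
DTD to the parser: one with the named entity &copy;, one with the
numeric escape &#169;, and one with the five predefined entities.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not part of JTidy or JAXP.
class EntityCheck {

    // Returns true if the JAXP parser accepts the string as well-formed XML.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow parse errors instead of letting the default handler
            // print them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Includes "undeclared entity" well-formedness errors.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Named HTML entity with no DTD to declare it: rejected.
        System.out.println(parses("<p>&copy; 2000</p>"));
        // Numeric character reference: accepted.
        System.out.println(parses("<p>&#169; 2000</p>"));
        // The five entities predefined by XML itself: accepted.
        System.out.println(parses("<p>&lt; &gt; &quot; &apos; &amp;</p>"));
    }
}
```

The same check should apply to &nbsp;, which is also an HTML-only name
and so would need a DTD that declares it.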

--Randy

Received on Monday, 14 August 2000 18:32:34 UTC