- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Thu, 10 Oct 2002 00:53:58 +0200
- To: "Christian Peter" <cpeter@rostock.igd.fhg.de>
- Cc: html-tidy@w3.org
* Christian Peter wrote:
>And here's the things confusing me:
>
>First, the generated files start with
>
>  <html>
>  <head>
>  <meta name="generator" content="HTML Tidy, see www.w3.org" />
>
>rather than with
>
>  <?xml version="1.0" encoding="us-ascii"?>
>  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>
>Why's that? This looks to me as if the output isn't set to XML at
>all.

Sure it is, but JTidy has for some reason omitted the document type
declaration and has not inserted an XML declaration. You can enforce
this using the appropriate configuration options. Does the document
you tried to tidy contain proprietary markup?

>Second, with quite a lot of sites (e.g. www.nasa.gov) I get a
>parsing error when reading the generated file (with IE or Netscape):
>
>  XML Parsing Error: undefined entity
>  Location: file:///C:/prog/3DWS/JTidy/files/www.nasa.gov.xml
>  Line Number 208, Column 22:size="2">NASA en
>  Español</font></a></td>
>
>Question: which settings are necessary to get this handled properly?

Let Tidy output either numeric character references or a document
type declaration pointing at a DTD that defines those entities.

Btw., there is a JTidy forum at
http://sourceforge.net/forum/forum.php?forum_id=41436 where you are
more likely to find people who can help you. This mailing list is
- strictly speaking - only for discussion of the C version command
line tool.
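
As a rough sketch of the configuration options referred to above, a
Tidy configuration file along the following lines should request
XHTML output with an XML declaration, a transitional document type
declaration, and numeric character references. The option names are
those of the C version of HTML Tidy; that JTidy accepts exactly the
same names is an assumption here, and older builds may spell the
XML declaration option add-xml-pi instead.

```
# Sketch of a Tidy configuration file; option names are taken from
# HTML Tidy and are assumed, not verified, to match your JTidy build.

# Write well-formed XHTML rather than HTML
output-xhtml: yes

# Emit the <?xml version="1.0" ...?> declaration at the top
add-xml-decl: yes

# Force a transitional ("loose") document type declaration
doctype: loose

# Write numeric character references instead of named entities
numeric-entities: yes
```

With numeric-entities enabled, a named entity that an XML parser
cannot resolve without the DTD should be written as a numeric
character reference instead, which needs no DTD at all.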
Received on Wednesday, 9 October 2002 18:53:24 UTC