Re: JTidy - Beginner's question

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 10 Oct 2002 00:53:58 +0200
To: "Christian Peter" <cpeter@rostock.igd.fhg.de>
Cc: html-tidy@w3.org
Message-ID: <3db7b286.7917905@smtp.bjoern.hoehrmann.de>

* Christian Peter wrote:
>And here's the things confusing me:
>First, the generated files start with
>   <html>
>   <head>
>   <meta name="generator" content="HTML Tidy, see www.w3.org" />
>rather than with
>   <?xml version="1.0" encoding="us-ascii"?>
>   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>Why's that? This looks to me as if the output isn't set to XML at 

Sure it is, but JTidy has for some reason omitted the document type
declaration and has not inserted an XML declaration. You can force
both using the appropriate configuration options. Does the document
you tried to tidy contain proprietary markup?
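
As a rough sketch of what such a configuration might look like, here is a
config file fragment using option names as documented for the C version of
HTML Tidy. That your JTidy release accepts exactly these names is an
assumption; older releases may spell the XML declaration option add-xml-pi
instead of add-xml-decl.

```
# Sketch only: option names follow the C version of HTML Tidy and are
# assumed, not confirmed, to be accepted by your JTidy release.
output-xhtml: yes
add-xml-decl: yes
doctype: loose
```

Here "loose" asks for the transitional document type, which matches the
DOCTYPE you expected to see at the top of the output.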

>Second, with quite a lot of sites (e.g. www.nasa.gov) I get a 
>parsing error when reading the generated file (with IE or Netscape):
>   XML Parsing Error: undefined entity
>   Location: file:///C:/prog/3DWS/JTidy/files/www.nasa.gov.xml
>   Line Number 208, Column 22:size="2">NASA en
>   Espa&ntilde;ol</font></a></td>
>Question: which settings are necessary to get this handled properly?

Let Tidy output either numeric character references or a document type
declaration pointing at a DTD that defines those entities.
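
In Tidy itself the first alternative is normally a configuration setting
(numeric-entities: yes in the C version; assumed to be the same in JTidy).
To illustrate what the conversion amounts to, here is a small standalone
Java sketch that does not use JTidy at all. The class name, the method name
and the tiny entity table are made up for the example; a real table would
cover every entity defined by the HTML DTDs.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntities {
    // Hypothetical, deliberately tiny table: entity name -> Unicode code point.
    private static final Map<String, Integer> CODE_POINTS = Map.of(
            "nbsp", 160,
            "copy", 169,
            "eacute", 233,
            "ntilde", 241);

    // Matches named entity references such as &ntilde;
    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Replaces known named entities with numeric character references.
    // Names missing from the table (including &amp; and &lt;, which XML
    // predefines) are left untouched.
    public static String toNumeric(String markup) {
        Matcher m = NAMED.matcher(markup);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer codePoint = CODE_POINTS.get(m.group(1));
            String replacement = (codePoint != null) ? "&#" + codePoint + ";" : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: NASA en Espa&#241;ol
        System.out.println(toNumeric("NASA en Espa&ntilde;ol"));
    }
}
```

An XML parser accepts &#241; without consulting any DTD, which is why the
"undefined entity" error goes away once the output uses numeric references.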

Btw., there is a JTidy forum at


where you are more likely to find people who can help you. Strictly
speaking, this mailing list is only for discussion of the C
command-line tool.
Received on Wednesday, 9 October 2002 18:53:24 UTC
