Converting HTML fragments to XML

Here's what I want to do:

I have a block of text containing HTML markup.  It may not be strictly
valid HTML, because special characters such as <, >, and & are not always
escaped.  I would like to turn it into well-formed XML.  For example, I have
the following:

Looking for a 1976 Chevy convertible < $2000, with power windows &
AC.<br>Please <a href="mailto:myaddress@mydomain.com">e-mail me</a>.

and would like it converted to:

Looking for a 1976 Chevy convertible &lt; $2000, with power windows &amp;
AC.<br />Please <a href="mailto:myaddress@mydomain.com">e-mail me</a>.
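To make the intended transformation concrete, here is a rough Java sketch of
it.  It does not use Tidy or JTidy at all; it is only a regex-based heuristic,
and the class and method names (FragmentEscaper, toWellFormed) are made up for
illustration.  It assumes that anything shaped like <name ...> or </name> is a
real tag and that everything else is text, so it will misjudge input such as
"a <b and c > d", and it does not balance unclosed tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tidy/JTidy: a heuristic sketch only.
final class FragmentEscaper {
    // Anything that looks like an opening or closing tag: <name ...> or </name>.
    private static final Pattern TAG = Pattern.compile("</?[A-Za-z][^<>]*>");

    // An ampersand that does not start a named or numeric character reference.
    private static final Pattern BARE_AMP = Pattern.compile(
        "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // A few empty HTML elements that XML requires to be self-closed.
    private static final Pattern VOID_TAG = Pattern.compile(
        "<(br|hr|img|input|meta|link)\\b([^<>]*?)\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    // Escape &, < and > in a run of text that lies between tags.
    static String escapeText(String text) {
        String s = BARE_AMP.matcher(text).replaceAll("&amp;");
        return s.replace("<", "&lt;").replace(">", "&gt;");
    }

    // Escape bare ampersands inside a tag and self-close empty elements.
    static String fixTag(String tag) {
        String s = BARE_AMP.matcher(tag).replaceAll("&amp;");
        Matcher m = VOID_TAG.matcher(s);
        return m.matches() ? "<" + m.group(1) + m.group(2) + " />" : s;
    }

    // Walk the fragment, treating tag-shaped spans as markup and the rest as text.
    static String toWellFormed(String fragment) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(fragment);
        int pos = 0;
        while (m.find()) {
            out.append(escapeText(fragment.substring(pos, m.start())));
            out.append(fixTag(m.group()));
            pos = m.end();
        }
        out.append(escapeText(fragment.substring(pos)));
        return out.toString();
    }
}
```

Calling FragmentEscaper.toWellFormed(...) on the first fragment above should
produce the second one.  I would still prefer to have Tidy do this, since it
copes with far messier markup than a few regular expressions can.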

I realize that Tidy can translate an HTML page into well-formed XML with
the -asxml flag, but it also adds the other tags needed to make a
"complete" HTML page, such as <html>, <head>, <body>, etc.  I do not want
these tags, because I am inserting the fragment into an XML document after
processing.

My question: is there a simple way, either from the command line or within
a configuration file, to tell Tidy *not* to insert the extra tags?  Or do I
need to modify the source code to accomplish this?

BTW, I'm using JTidy.


Thanks,

William.

Received on Tuesday, 1 May 2001 15:25:13 UTC