Converting HTML fragments to XML

From: William Bagby <williamb@adone.com>
Date: Tue, 1 May 2001 15:18:34 -0400
Message-ID: <F991D4265D6AD4119A1900508BC98E572FD60C@NTEXCL01>
To: "Tidy Mailing List (E-mail)" <html-tidy@w3c.org>

Here's what I want to do:

I have a block of text that contains HTML markup.  It may not be strictly
valid HTML because of unescaped special characters such as <, >, &, etc.
I would like to make it well-formed XML.  For example, I have the
following:

Looking for a 1976 Chevy convertible < $2000, with power windows &
AC.<br>Please <a href="mailto:myaddress@mydomain.com">e-mail me</a>.

and would like it converted to:

Looking for a 1976 Chevy convertible &lt; $2000, with power windows &amp;
AC.<br />Please <a href="mailto:myaddress@mydomain.com">e-mail me</a>.
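
To make the target concrete, here is a rough sketch in plain Java of the
transformation described above.  It is not JTidy code and uses no JTidy API;
the class name FragmentEscaper and its methods are hypothetical names made up
for illustration.  It assumes that anything shaped like <name ...> or </name>
is a real tag, that attribute values contain no ">", and that only br, hr,
and img need self-closing.  It does not repair unbalanced or badly nested
tags, so it is an illustration of the desired output, not a substitute for
Tidy.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of JTidy.
final class FragmentEscaper {
    // Anything shaped like a start or end tag: <a href="...">, </a>, <br>, <br/>.
    // "< $2000" does not match because the "<" is not followed by a letter.
    private static final Pattern TAG =
        Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)(\\s[^<>]*)?/?>");

    // An ampersand that does not already start an entity such as &amp; or &#38;.
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?!#?[A-Za-z0-9]+;)");

    // Assumption: the only empty elements we care about in these fragments.
    private static final Set<String> EMPTY_ELEMENTS = Set.of("br", "hr", "img");

    // Escape stray &, < and > in a run of text that lies between tags.
    static String escapeText(String text) {
        String escaped = BARE_AMP.matcher(text).replaceAll("&amp;");
        return escaped.replace("<", "&lt;").replace(">", "&gt;");
    }

    // Copy tags through (self-closing empty elements) and escape the text between them.
    static String toWellFormed(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            out.append(escapeText(html.substring(last, m.start())));
            String tag = m.group();
            boolean isEndTag = tag.startsWith("</");
            boolean isEmptyElement = EMPTY_ELEMENTS.contains(m.group(1).toLowerCase());
            if (!isEndTag && isEmptyElement && !tag.endsWith("/>")) {
                // Turn <br> into <br />.
                tag = tag.substring(0, tag.length() - 1) + " />";
            }
            out.append(tag);
            last = m.end();
        }
        out.append(escapeText(html.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        String fragment = "Looking for a 1976 Chevy convertible < $2000, with power windows &\n"
            + "AC.<br>Please <a href=\"mailto:myaddress@mydomain.com\">e-mail me</a>.";
        System.out.println(toWellFormed(fragment));
    }
}
```

Run on the fragment above, this prints the converted text shown just before
the sketch.  The obvious weakness is text such as "a<b and c>d", which this
approach would mistake for a tag; that is why I would rather have Tidy do
the parsing.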

I realize that Tidy can translate an HTML page into well-formed XML with
the -asxml flag.  However, it also adds the other tags needed to make a
"complete" HTML page, such as <html>, <head>, and <body>.  I do not want
these tags, because I am inserting the fragment into an XML page after
processing.

My question: is there a simple way, either from the command line or in a
configuration file, to tell Tidy *not* to insert the extra tags?  Or do I
need to modify the source code to accomplish this?

BTW, I'm using JTidy.


Thanks,

William.
Received on Tuesday, 1 May 2001 15:25:13 GMT