Re: Converting HTML fragments to XML

From: Klaus Johannes Rusch <KlausRusch@atmedia.net>
Date: Sun, 6 May 2001 13:42:30 CET
Message-Id: <200105061155.HAA27849@tux.w3.org>
To: "Tidy Mailing List" <html-tidy@w3c.org>
In <F991D4265D6AD4119A1900508BC98E572FD60C@NTEXCL01>, William Bagby <williamb@adone.com> writes:
> I have a block of text which has HTML markup in it.  It is possible that it
> is not strictly valid HTML due to non-escaped special characters such as <,
> >, &, etc.  I would like to make it well-formed XML.
> ...
> 
> Question is, is there a simple way, either from the command-line or within a
> configuration file, to tell Tidy *not* to insert the extra tags?  Or do I
> need to modify the source code to accomplish this?

The easiest way is probably to run the markup through Tidy, then strip
everything up to and including the <body> tag, and everything from the
</body> tag onward.
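
As a rough illustration of that stripping step, here is a minimal Python
sketch. It assumes Tidy's output has already been captured in a string; the
function name extract_body_content is made up for this example and is not
part of Tidy.

```python
import re

def extract_body_content(tidied: str) -> str:
    """Return the markup found between <body ...> and </body>.

    `tidied` is assumed to be the complete document that Tidy produced.
    """
    # Greedy (.*) with DOTALL spans newlines and stops at the last </body>.
    match = re.search(
        r"<body\b[^>]*>(.*)</body\s*>",
        tidied,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        raise ValueError("no <body>...</body> pair found in the input")
    return match.group(1).strip()
```

For example, given a document whose body holds only <p>a &lt; b</p>, the
function returns that paragraph without the surrounding html, head, and body
tags.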

Note that this will still give you a <p> tag. Depending on your fragments, you
may be able to simply discard it, or place some "marker" tag to denote the
start of your content.
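
Discarding the <p> could look something like the Python sketch below. It is
only a sketch under stated assumptions: the input is the body content already
cut out of Tidy's output, the helper name unwrap_single_p is hypothetical, and
the wrapper is removed only when the whole fragment is one single paragraph.

```python
import re

def unwrap_single_p(content: str) -> str:
    """Drop one enclosing <p>...</p> pair if it wraps the entire fragment.

    Fragments holding several paragraphs, or anything outside the
    paragraph, are returned unchanged.
    """
    match = re.fullmatch(
        r"\s*<p\b[^>]*>(.*)</p\s*>\s*",
        content,
        flags=re.IGNORECASE | re.DOTALL,
    )
    # A second <p inside the match means there is more than one paragraph.
    if match and not re.search(r"<p\b", match.group(1), flags=re.IGNORECASE):
        return match.group(1).strip()
    return content
```

Leaving multi-paragraph fragments untouched is a deliberate choice here: in
that case the <p> tags are probably real structure, not something Tidy added
around loose text.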

-- 
Klaus Johannes Rusch
KlausRusch@atmedia.net
http://www.atmedia.net/KlausRusch/
Received on Sunday, 6 May 2001 07:55:27 GMT
