- From: Klaus Johannes Rusch <KlausRusch@atmedia.net>
- Date: Wed, 29 Aug 2001 19:39:01 CET
- To: <html-tidy@w3.org>
In <003501c1305a$da510fe0$6703a8c0@nb100>, "Matt G" <mattg@vguild.com> writes:
> Yes, but XML isn't XHTML. Understand?
>
> The following is not valid XHTML. It *is* valid XML.
>
> <input><form /><foobar /><tr /></input>
>
> I need to turn really bad HTML into parse-able XML at any cost; that the
> result may be complete gibberish with respect to the XHTML DTD's is of no
> concern.
Try the HTML::TreeBuilder Perl module. It reads an HTML page, builds a tree representation of it,
and outputs HTML (the as_HTML method) or XML (as_XML, which the documentation describes as experimental).
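  use strict;
  use warnings;
  use HTML::TreeBuilder;

  my $tree = HTML::TreeBuilder->new;
  {
      local $/;                  # slurp mode: read the whole input in one go
      $tree->parse(scalar <>);   # parse HTML from STDIN or the file named on the command line
  }
  $tree->eof;                    # signal end of input so the tree is finalized
  print $tree->as_XML, "\n";     # serialize the tree as XML
  $tree->delete;                 # free the tree (HTML::Element trees are self-referential)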
The output should be parseable XML.
--
Klaus Johannes Rusch
KlausRusch@atmedia.net
http://www.atmedia.net/KlausRusch/
Received on Wednesday, 29 August 2001 14:38:11 UTC