Re: to XML, not XHTML

In <003501c1305a$da510fe0$6703a8c0@nb100>, "Matt G" <mattg@vguild.com> writes:
> Yes, but XML isn't XHTML. Understand?
> 
> The following is not valid XHTML. It *is* valid XML.
> 
> <input><form /><foobar /><tr /></input>
> 
> I need to turn really bad HTML into parse-able XML at any cost; that the
> result may be complete gibberish with respect to the XHTML DTD's is of no
> concern.

Try the HTML::TreeBuilder Perl module, this will read an HTML page, build a tree representation
and output HTML (as_HTML method) or XML (as_XML, experimental according to the documentation).


use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new;
do {
    local $/ = undef;
    $tree->parse(<>);
};
$tree->eof;
print $tree->as_XML, "\n";

The output should be parseable XML.

-- 
Klaus Johannes Rusch
KlausRusch@atmedia.net
http://www.atmedia.net/KlausRusch/

Received on Wednesday, 29 August 2001 14:38:11 UTC