- From: Klaus Johannes Rusch <KlausRusch@atmedia.net>
- Date: Wed, 29 Aug 2001 19:39:01 CET
- To: <html-tidy@w3.org>
In <003501c1305a$da510fe0$6703a8c0@nb100>, "Matt G" <mattg@vguild.com> writes:

> Yes, but XML isn't XHTML. Understand?
>
> The following is not valid XHTML. It *is* valid XML.
>
> <input><form /><foobar /><tr /></input>
>
> I need to turn really bad HTML into parse-able XML at any cost; that the
> result may be complete gibberish with respect to the XHTML DTD's is of no
> concern.

Try the HTML::TreeBuilder Perl module. It reads an HTML page, builds a tree
representation, and outputs HTML (as_HTML method) or XML (as_XML method,
experimental according to the documentation).

    use HTML::TreeBuilder;

    my $tree = HTML::TreeBuilder->new;
    do {
        local $/ = undef;    # slurp mode: read the input in one go
        $tree->parse(<>);
    };
    $tree->eof;              # signal end of input so the tree is finalized
    print $tree->as_XML, "\n";

The output should be parseable XML.

-- 
Klaus Johannes Rusch
KlausRusch@atmedia.net
http://www.atmedia.net/KlausRusch/
Received on Wednesday, 29 August 2001 14:38:11 UTC