perl, HTML::Tidy, clean ...

Hi,

I'm trying to clean up Apple wiki markup.

I'm using the following options with HTML::Tidy:
       tidy_mark => 0,
       output_encoding => 'utf8',
       input_encoding => 'utf8',
       drop_proprietary_attributes => 1,
       output_xhtml => 1,
       clean => 1,
       hide_endtags => 1,

For some wiki entries I get:
HTML parser error : Unexpected end tag : p
Lorem ipsum.</li> </ul></p>

Imagine the output from the XML::RPC call to look like:
<div>
<p class="MsoNormal">
<ul>
<li>Lorem.</li>
<li>ipsum.</li>
</ul>
</p>
</div>
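
To make this easier to reproduce, here is a minimal sketch of how the options
and the sample above might be fed to HTML::Tidy. It assumes the constructor
accepts a hash ref of tidy options and that clean() and messages() behave as
described in the module's documentation. The variable names and the guarded
require are made up for this mail and are not the actual code:

```perl
use strict;
use warnings;

# Options as listed earlier in this mail.
my %tidy_options = (
    tidy_mark                   => 0,
    output_encoding             => 'utf8',
    input_encoding              => 'utf8',
    drop_proprietary_attributes => 1,
    output_xhtml                => 1,
    clean                       => 1,
    hide_endtags                => 1,
);

# Sample input mirroring the structure shown above (a <ul> nested in a <p>).
my $html = <<'HTML';
<div>
<p class="MsoNormal">
<ul>
<li>Lorem.</li>
<li>ipsum.</li>
</ul>
</p>
</div>
HTML

# Load HTML::Tidy at run time so a missing module gives a clear message
# instead of a compile-time failure.
if ( eval { require HTML::Tidy; 1 } ) {
    my $tidy    = HTML::Tidy->new( \%tidy_options );
    my $cleaned = $tidy->clean($html);

    # Show whatever tidy reported while cleaning, then the result.
    print $_->as_string, "\n" for $tidy->messages;
    print $cleaned if defined $cleaned;
}
else {
    warn "HTML::Tidy is not installed: $@";
}
```

Running this should show what tidy itself reports for the nested list, which
helps tell apart messages coming from tidy and errors raised by a later
parsing step.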


What option do I need to set to make tidy carry on with processing (stripping
the erroneous <p> tags) rather than erroring out?


Cheers,
Andrej

Received on Monday, 11 June 2012 00:16:42 UTC