Re: Problem with Clean on Word 97 file

At 1:53 PM +0300 9/18/99, Jussi Vestman wrote:
>On Fri, 17 Sep 1999, Jim Mundy wrote:
>
>> Tried to run HTML-Tidy with the clean (-c) option on the attached page
>> (test.htm), which was saved from Word 97 as an HTML document.  Also
>> attached is the first part of the TidyOut.log file generated (enough
>> to show the problem).  It's obvious that Tidy got stuck in a loop
>> somewhere on line 17 of the input HTML file.  When I originally ran
>> Tidy (from HomeSite 4.5) it eventually locked up my machine after the
>> TidyOut.log file grew to over 500MB.
>> I think Tidy is great, and was looking forward to using it to clean up
>> Word's poor excuse for HTML rather than doing it by hand.  This test,
>> however, gives me pause.  I don't know whether this problem has been
>> reported previously (I didn't see it in a brief perusal of the Release
>> Notes), but I hope it can be fixed or worked around.
>
>I have reduced your bug to the following test case:
>
><HTML>
><HEAD>
><TITLE>Some title</TITLE>
></HEAD>
><BODY>
><ul>
></B>
></ul>
></BODY>
></HTML>
>
>I found that the ParseList function in parser.c goes into an infinite
>loop when it encounters the </B> tag. I don't know exactly where the bug
>is, but I will continue my efforts.

While I don't doubt either of your findings, as an FYI, I ran the Word 97
file through the Mac OS version of Tidy (based on the 26 Jul 99 sources,
with the fixes suggested by Dave on 15 Aug 99), and it ran without
problems. I didn't check whether the output was correct, but I can provide
the output files if you are interested. Some of the bugs reported against
the 26 Jul 99 version of Tidy (and some of the 15 Aug 99 fixes) pertained
to infinite loops, so you might want to check older mail to see whether
this is a known bug.
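
For anyone following along, here is one way a list-parsing loop like the
one Jussi describes could spin on a stray end tag. This is only a sketch
under my own assumptions, not Tidy's actual code: the token kinds, the
Lexer struct, and the functions next_token, unget_token and parse_list
are all hypothetical names. The idea is that if the loop pushes an
unexpected end tag (such as </B> inside <ul>) back onto the input and
nothing else ever consumes it, the same token is read forever; discarding
the stray tag lets the loop reach </ul>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; these are NOT Tidy's real lexer types. */
typedef enum {
    TOK_START_LI, TOK_END_LI, TOK_END_UL, TOK_END_OTHER, TOK_EOF
} TokKind;

/* Hypothetical lexer: a cursor over an array that ends with TOK_EOF. */
typedef struct {
    const TokKind *toks;
    size_t pos;
} Lexer;

static TokKind next_token(Lexer *lx)
{
    TokKind t = lx->toks[lx->pos];
    if (t != TOK_EOF)
        lx->pos++;              /* never step past the EOF marker */
    return t;
}

static void unget_token(Lexer *lx)
{
    if (lx->pos > 0)
        lx->pos--;              /* make the last token current again */
}

/*
 * Parse the contents of a list after its start tag has been consumed.
 * Returns the number of loop iterations used, or -1 if max_steps
 * iterations passed without reaching the end of the list (the
 * "stuck in a loop" case).
 *
 * discard_stray == 0: an unexpected end tag is pushed back and the loop
 *                     continues, so the same token is read again.
 * discard_stray != 0: an unexpected end tag is dropped.
 */
int parse_list(Lexer *lx, int discard_stray, int max_steps)
{
    int steps = 0;

    assert(lx != NULL && lx->toks != NULL);

    for (;;) {
        TokKind t;

        if (steps++ >= max_steps)
            return -1;          /* step budget exhausted: no progress */

        t = next_token(lx);

        if (t == TOK_END_UL || t == TOK_EOF)
            return steps;       /* end of the list */

        if (t == TOK_END_OTHER && !discard_stray) {
            unget_token(lx);    /* pushed back, but nobody consumes it */
            continue;
        }
        /* list items and discarded stray end tags are simply consumed */
    }
}
```

Fed the tokens that follow <ul> in the reduced test case (a stray end
tag, then </ul>), parse_list gives up with -1 when discard_stray is 0 and
finishes normally when it is nonzero. The max_steps guard is only there so
the sketch can report the loop instead of hanging.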

Regards, Terry

Received on Sunday, 19 September 1999 02:41:30 UTC