Re: Problem with Clean on Word 97 file

On Fri, 17 Sep 1999, Jim Mundy wrote:

> Tried to run HTML-Tidy with the clean (-c) option on the attached page
> (test.htm), which was saved from Word 97 as an HTML document.  Also
> attached is the first part of the TidyOut.log file generated (enough
> to show the problem).  It's obvious that Tidy got stuck in a loop
> somewhere on line 17 of the input HTML file.  When I originally ran
> Tidy (from HomeSite 4.5) it eventually locked up my machine after the
> TidyOut.log file grew to over 500MB.
> I think Tidy is great, and was looking forward to using it to clean up
> Word's poor excuse for HTML rather than doing it by hand.  This test,
> however, gives me pause.  I don't know whether this problem has been
> reported previously (I didn't see it in a brief perusal of the Release
> Notes), but I hope it can be fixed or worked around.
 
I have reduced your bug to the following test case:

<HTML>
<HEAD>
<TITLE>Some title</TITLE>
</HEAD>
<BODY>
<ul>
</B>
</ul>
</BODY>
</HTML>

I found that the ParseList function in parser.c goes into an infinite
loop when it encounters the </B> tag. I don't know exactly where the bug
is yet, but I will keep looking.
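
To illustrate the kind of failure I suspect, here is a minimal sketch of
how a list-parsing loop can spin forever on a stray end tag. This is NOT
Tidy's actual code: the token kinds, the Lexer struct, and the
next_token, unget_token and parse_list functions below are all
hypothetical, and the real cause in parser.c may well be different. The
assumption is that an unexpected end tag gets pushed back onto the input
without anything consuming it, so the loop reads the same token again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; these are not Tidy's real data structures. */
typedef enum { TOK_START_LI, TOK_END_LI, TOK_END_B, TOK_END_UL, TOK_EOF } TokKind;

/* A toy lexer over a fixed token array that must end with TOK_EOF. */
typedef struct {
    const TokKind *toks;
    size_t pos;
} Lexer;

/* Return the current token and advance past it. */
TokKind next_token(Lexer *lx) {
    return lx->toks[lx->pos++];
}

/* Push the most recently read token back onto the input. */
void unget_token(Lexer *lx) {
    if (lx->pos > 0)
        lx->pos--;
}

/*
 * Parse the contents of a <ul>. Returns the number of loop iterations
 * used, or -1 if max_steps was reached without finishing (which stands
 * in for "would loop forever").
 *
 * discard_stray == 0: assumed buggy behaviour, a stray end tag is
 *                     pushed back and then read again by the same loop.
 * discard_stray != 0: the stray end tag is dropped, so the loop always
 *                     makes progress.
 */
int parse_list(Lexer *lx, int discard_stray, int max_steps) {
    int steps = 0;

    assert(lx != NULL && lx->toks != NULL);

    for (;;) {
        if (steps++ >= max_steps)
            return -1;                /* step budget exhausted */

        TokKind t = next_token(lx);

        if (t == TOK_END_UL || t == TOK_EOF)
            return steps;             /* list finished */

        if (t == TOK_START_LI || t == TOK_END_LI)
            continue;                 /* ordinary list content */

        /* Anything else here is a stray end tag such as </B>. */
        if (!discard_stray)
            unget_token(lx);          /* same token comes back next time */
    }
}
```

With the token sequence for the test case above (</B>, </ul>, end of
input), the push-back variant never gets past </B> and only stops
because of the step limit, while the discarding variant finishes after
two iterations. A guard of this kind would also have kept the log file
from growing without bound.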

---
Jussi Vestman 
IT student at Lappeenranta University of Technology, Finland
jussi.vestman@lut.fi
http://www.lut.fi/~vestman/

Received on Saturday, 18 September 1999 06:53:54 UTC