
Re: Problem with Clean on Word 97 file

From: Jussi Vestman <vestman@lut.fi>
Date: Sat, 18 Sep 1999 13:53:34 +0300 (EEST)
To: Jim Mundy <Mundy.Jim@metnet.navy.mil>
cc: html-tidy@w3.org
Message-ID: <Pine.LNX.4.10.9909181340350.23998-100000@kuha.cc.lut.fi>

On Fri, 17 Sep 1999, Jim Mundy wrote:

> Tried to run HTML-Tidy with the clean (-c) option on the attached page
> (test.htm), which was saved from Word 97 as an HTML document.  Also
> attached is the first part of the TidyOut.log file generated (enough
> to show the problem).  It's obvious that Tidy got stuck in a loop
> somewhere on line 17 of the input HTML file.  When I originally ran
> Tidy (from HomeSite 4.5) it eventually locked up my machine after the
> TidyOut.log file grew to over 500MB.
> I think Tidy is great, and was looking forward to using it to clean up
> Word's poor excuse for HTML rather than doing it by hand.  This test,
> however, gives me pause.  I don't know whether this problem has been
> reported previously (I didn't see it in a brief perusal of the Release
> Notes), but I hope it can be fixed or worked around.
 
I have reduced your bug to the following test case:

<HTML>
<HEAD>
<TITLE>Some title</TITLE>
</HEAD>
<BODY>
<ul>
</B>
</ul>
</BODY>
</HTML>

I found that the ParseList function in parser.c goes into an infinite
loop when it encounters the </B> tag. I don't know exactly where the bug
is yet, but I will keep looking.
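
To illustrate the kind of failure I suspect, here is a minimal sketch of
one common way a list-parsing loop can fail to terminate: it pushes an
unexpected end tag back onto the input and then reads the same token
again. This is NOT Tidy's actual code, and whether parser.c fails this
way is only an assumption. The types and the names next_token,
unget_token, parse_list and run_demo are invented for the example, and
the max_iter guard exists only so the demonstration itself stops.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token model for illustration; not Tidy's real types. */
typedef enum { TOK_START, TOK_END, TOK_EOF } TokKind;
typedef struct { TokKind kind; const char *name; } Token;
typedef struct { const Token *toks; int pos; } Lexer;

/* Return the current token and advance, but never move past EOF. */
static const Token *next_token(Lexer *lx)
{
    const Token *t = &lx->toks[lx->pos];
    if (t->kind != TOK_EOF)
        lx->pos++;
    return t;
}

/* Push the most recently read token back onto the input. */
static void unget_token(Lexer *lx)
{
    lx->pos--;
}

/*
 * Parse the contents of a <ul>. Returns the number of loop iterations
 * used, or -1 if max_iter was reached, which stands in for "would have
 * looped forever".
 */
static int parse_list(Lexer *lx, int discard_stray_end_tags, int max_iter)
{
    assert(max_iter > 0);
    for (int i = 1; i <= max_iter; i++) {
        const Token *t = next_token(lx);

        if (t->kind == TOK_EOF)
            return i;
        if (t->kind == TOK_END && strcmp(t->name, "ul") == 0)
            return i;                /* proper end of the list */

        if (t->kind == TOK_END) {
            /* Stray end tag such as </B> inside the list. */
            if (!discard_stray_end_tags)
                unget_token(lx);     /* no progress: same token next time */
            continue;
        }
        /* Start tags such as <li> would be handled here. */
    }
    return -1;
}

/* Feed the tokens that follow <ul> in the reduced test case. */
static int run_demo(int discard_stray_end_tags)
{
    static const Token toks[] = {
        { TOK_END, "b"  },           /* </B>  */
        { TOK_END, "ul" },           /* </ul> */
        { TOK_EOF, ""   },
    };
    Lexer lx = { toks, 0 };
    return parse_list(&lx, discard_stray_end_tags, 1000);
}
```

In this sketch run_demo(0) hits the guard and returns -1, while
run_demo(1), which simply discards the stray end tag, finishes in two
iterations. The point is only that every pass through such a loop has
to consume at least one token.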

---
Jussi Vestman 
IT student at Lappeenranta University of Technology, Finland
jussi.vestman@lut.fi
http://www.lut.fi/~vestman/
Received on Saturday, 18 September 1999 06:53:54 GMT