W3C home > Mailing lists > Public > html-tidy@w3.org > July to September 1999

Re: Problem with Clean on Word 97 file

From: Terry Teague <teague@mailandnews.com>
Date: Sat, 18 Sep 1999 23:38:33 -0700
Message-Id: <l03130300b40a34221a5c@[17.219.105.209]>
To: html-tidy@w3.org
At 1:53 PM +0300 9/18/99, Jussi Vestman wrote:
>On Fri, 17 Sep 1999, Jim Mundy wrote:
>
>> Tried to run HTML-Tidy with the clean (-c) option on the attached page
>> (test.htm), which was saved from Word 97 as an HTML document.  Also
>> attached is the first part of the TidyOut.log file generated (enough
>> to show the problem).  It's obvious that Tidy got stuck in a loop
>> somewhere on line 17 of the input HTML file.  When I originally ran
>> Tidy (from HomeSite 4.5) it eventually locked up my machine after the
>> TidyOut.log file grew to over 500MB.
>> I think Tidy is great, and was looking forward to using it to clean up
>> Word's poor excuse for HTML rather than doing it by hand.  This test,
>> however, gives me pause.  I don't know whether this problem has been
>>  reported previously (I didn't see it in a brief perusal of the Release
>> Notes), but I hope it can be fixed or worked around.
>
>I have reduced your bug to following test case:
>
><HTML>
><HEAD>
><TITLE>Some title</TITLE>
></HEAD>
><BODY>
><ul>
></B>
></ul>
></BODY>
></HTML>
>
>I found out that ParseList function in parser.c goes to infinite loop,
>when it encounters </B> tag. I don't know where exactly the bug is, but I
>will continue my efforts.

While I don't doubt both your findings, as an FYI, I ran the Word97 file
through the Mac OS version of Tidy (based on the 26 Jul 99 sources, with
the fixes suggested by Dave on 15 Aug 99), and it worked without problem (I
didn't check to see if the correct results were produced - I can provide
the output files if you are interested). Some of the bugs reported in the
26 Jul 99 version of Tidy (and some of the fixes on 15 Aug 99) pertained to
infinite loops. You might want to check older mail to see if this is a
known bug.

Regards, Terry
Received on Sunday, 19 September 1999 02:41:30 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:42 GMT