Re: Tidy problems

From: Dave Raggett <dsr@w3.org>
Date: Fri, 2 Jun 2000 17:31:26 +0100 (GMT Daylight Time)
To: Daniel Persson <danpe271@student.liu.se>
cc: html-tidy@w3.org
Message-ID: <Pine.WNT.4.10.10006021722350.-522327@hazel.hpl.hp.com>

On Mon, 8 May 2000, Daniel Persson wrote:

> Hello,
> 
> Great program!
> 
> I have encountered a few problems in Tidy. I haven't seen them reported before, but I didn't check all the archived messages either.
> I use Tidy from another program, running the parser several times on different HTML files without quitting Tidy, and this causes some memory problems:
> 
> 1) In parser.c, CoerceNode(...):
>    The following line: MemFree(tmp->element);
>    should be inserted before the line: MemFree(tmp);
>    As it is now, this sometimes causes memory leaks for me.

Thanks.
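To see why the extra MemFree matters, here is a minimal sketch assuming a simplified node whose tag name is a separately allocated string. Node, new_node, free_node and coerce_node are hypothetical names, much simpler than Tidy's real structures, and plain malloc/free stand in for Tidy's own allocation wrappers. Freeing only the node would leak the name string every time a tag is coerced.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified node; Tidy's real Node has many more fields. */
typedef struct Node {
    char *element;   /* heap-allocated tag name, owned by the node */
} Node;

static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

static Node *new_node(const char *name)
{
    Node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->element = dup_string(name);
    if (node->element == NULL) {
        free(node);
        return NULL;
    }
    return node;
}

/* Free the tag name first, then the node; freeing only the node leaks the name. */
static void free_node(Node *node)
{
    if (node == NULL)
        return;
    free(node->element);
    free(node);
}

/* Give `node` a new tag name via a temporary node, then discard the temporary. */
static int coerce_node(Node *node, const char *name)
{
    Node *tmp;
    char *renamed;

    assert(node != NULL && name != NULL);
    tmp = new_node(name);
    if (tmp == NULL)
        return 0;
    renamed = dup_string(tmp->element);
    if (renamed != NULL) {
        free(node->element);
        node->element = renamed;
    }
    free_node(tmp);   /* same as MemFree(tmp->element); MemFree(tmp); */
    return renamed != NULL;
}
```

Running this under a leak checker with free_node replaced by a bare free(tmp) should show one lost string per coercion.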

> 
> 2) In pprint.cpp, FreePrintBuf(...):
>    The following lines should be added:
>     linebuf = NULL;
>     lbufsize = 0;
>    Without these lines, memory problems occur when filtering many
>    files at once.

Thanks.
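A sketch of the failure mode, assuming (as the patch implies) that linebuf and lbufsize are file-scope variables that survive from one document to the next. The names ensure_capacity and free_print_buf are hypothetical stand-ins, not Tidy's code. If the buffer is freed but the pointer and size are left alone, the next document believes it still owns a buffer of lbufsize characters and writes into freed memory.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the pretty printer's file-scope line buffer. */
static unsigned int *linebuf = NULL;
static unsigned int lbufsize = 0;

/* Make sure linebuf can hold at least index + 1 characters. */
static int ensure_capacity(unsigned int index)
{
    unsigned int newsize;
    unsigned int *p;

    assert(linebuf != NULL || lbufsize == 0);  /* pointer and size must agree */
    if (index < lbufsize)
        return 1;                    /* already big enough */
    newsize = lbufsize ? lbufsize : 256;
    while (index >= newsize) {
        if (newsize > UINT_MAX / 2)
            return 0;                /* doubling would overflow */
        newsize *= 2;
    }
    p = realloc(linebuf, newsize * sizeof *linebuf);
    if (p == NULL)
        return 0;
    linebuf = p;
    lbufsize = newsize;
    return 1;
}

/* Release the buffer and reset the bookkeeping for the next document. */
static void free_print_buf(void)
{
    free(linebuf);
    linebuf = NULL;  /* else the next run reallocs or writes a dangling pointer */
    lbufsize = 0;    /* else the next run trusts a capacity that no longer exists */
}
```

Resetting both variables makes the "first document" and "nth document" cases identical, which is what a library caller running Tidy repeatedly needs.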

> 
> Some other problems:
> 
> 3) A line like (which of course is not correct HTML, but anyway):
>   <b><font>bold</b><br>plain<br></font>
>   Gives the result:
>   <b><font>bold</font><br>plain<br></b>
>   instead of interpreting it the way Netscape does, something like:
>   <font><b>bold</b><br>plain<br></font>

When I tested <b><font color=red>bold</b><br>plain<br></font>

I got "bold" in red, and "plain" in black with both "bold" and
"plain" in bold, so I think Tidy is correctly reproducing Netscape
4's behavior.

> 4) Unfinished tags cause the next tag to be interpreted as text,
>    instead of the tag being corrected as Netscape does. An example:
>    <img src="link"<br>
>    Gives the result:
>    <img src="link">br&gt;
>    Instead of, as interpreted by Netscape:
>    <img src="link"><br>

With <img src="link"<br>fred

in Navigator 4, I see a broken image with "fred" immediately to its
right, i.e. no line break. Tidy correctly reports a missing ">".

> Some functionality that I would like to see in Tidy:
> 
> * An "ugly print" option, skipping all line breaks and blanks to
>   make the resulting file as compact as possible.

Thanks for the suggestion.
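One naive way such an "ugly print" mode could work is to collapse every run of whitespace to a single space, which browsers mostly do when rendering anyway. collapse_whitespace below is a hypothetical helper, not a Tidy option. It works on raw text and ignores <pre>, <textarea>, <script> and attribute values, where whitespace matters, so a real version would belong in the pretty printer, which knows the element context.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Collapse every run of whitespace in s into a single space, in place,
   dropping leading and trailing whitespace. Returns the new length.
   Naive sketch: it does not parse markup, so whitespace inside <pre>,
   <textarea>, scripts and attribute values is collapsed too. */
static size_t collapse_whitespace(char *s)
{
    size_t in, out = 0;
    int pending_space = 0;

    assert(s != NULL);
    for (in = 0; s[in] != '\0'; in++) {
        unsigned char c = (unsigned char)s[in];
        if (isspace(c)) {
            pending_space = (out > 0);   /* never emit leading whitespace */
        } else {
            if (pending_space)
                s[out++] = ' ';          /* one space stands for the whole run */
            pending_space = 0;
            s[out++] = (char)c;
        }
    }
    s[out] = '\0';                       /* a trailing run is simply dropped */
    return out;
}
```

Keeping one space rather than deleting the run entirely matters: removing the space between two inline elements such as "<b>a</b> <i>b</i>" would change how the text renders.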

> * A way to specify replacement tags for unsupported tags, for
> example to transform into a subset of HTML. It would be good if this
> could be done during parsing, instead of afterwards as I do now.

What makes this hard is that the parse tree may change during
the course of parsing, so it is better to do such replacements
after parsing is done. Tidy already does this when cleaning
presentation markup. You could try applying XSLT to Tidy's XHTML output.
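The post-parse approach can be sketched as a depth-first walk over the finished tree that renames elements from a lookup table. TreeNode, the replacements table and replace_tags are hypothetical and much simpler than Tidy's own structures; the sketch assumes tag names are already lowercase. Plain renaming also drops whatever the old element meant (a <font color=red> becomes a bare <span>), which is why cleaning presentation markup properly means rewriting it as style rules rather than just swapping names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tree node; names point at static strings for simplicity. */
typedef struct TreeNode {
    const char *name;            /* element name, or NULL for a text node */
    struct TreeNode *content;    /* first child */
    struct TreeNode *next;       /* next sibling */
} TreeNode;

/* Hypothetical replacement table: unsupported tag -> substitute. */
static const struct { const char *from, *to; } replacements[] = {
    { "center", "div"  },
    { "font",   "span" },
    { "blink",  "span" },
};

static const char *replacement_for(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof replacements / sizeof replacements[0]; i++)
        if (strcmp(name, replacements[i].from) == 0)
            return replacements[i].to;
    return NULL;
}

/* Walk the finished tree depth-first, renaming elements in place.
   Returns the number of elements renamed. */
static int replace_tags(TreeNode *node)
{
    int count = 0;

    for (; node != NULL; node = node->next) {
        if (node->name != NULL) {
            const char *to = replacement_for(node->name);
            if (to != NULL) {
                node->name = to;
                count++;
            }
        }
        count += replace_tags(node->content);   /* then the children */
    }
    return count;
}
```

Because the walk only runs once parsing has finished, it never sees a node that the parser will later move or discard, which is the difficulty with doing it during parsing.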

Regards,

-- Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
tel/fax: +44 122 578 3011 (or 2521) +44 778 532 0444 (mobile)
World Wide Web Consortium (on assignment from HP Labs)

Received on Friday, 2 June 2000 12:31:34 UTC