Re: Trimming spaces and dropping empty paragraphs.

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 12 Apr 2002 09:51:54 +0200
To: Lee Passey <lee@dysfunctionals.org>
Cc: tidy-develop@lists.sourceforge.net, html-tidy <html-tidy@w3.org>
Message-ID: <op6bbuo6cfumob260pv8r7rgd8o2rajfru@4ax.com>

* Lee Passey wrote:
>Bjoern Hoehrmann wrote:
>
>> This might be considered a bug, Tidy should produce a canonical version
>> of the document (equal settings => equal result, no matter how often you
>> apply these rules) and here it doesn't. I vote for fixing it, your
>> example and the result after cleaning it two times render the same in
>> current browsers.
>
>After pursuing several false paths, I think I have come up with a very small
>change which will solve most, if not all, of these problems.  The apparent
>theory of operation is that if spaces in a text node are trimmed to the point
>where the node no longer contains any text, the node should be removed from
>the tree.  This removal occurs in parser.c, in the function
>TrimTrailingSpace().  However, the test of whether to remove the node is
>inside the conditional (last->end > last->start), so if the node enters the
>function already empty it will not be removed.  This situation can occur, for
>example, when you have a single space, bracketed by inline tags such as <em>,
>inside a block such as a paragraph, e.g.:
>
><p><em> </em></p>
>
>In this case TrimInitialSpace() has incremented node->start before
>TrimTrailingSpace() is called, so the node is now empty, but has not been
>removed.  When the resulting text is printed it appears as:
>
><p><em></em></p>
>
>the space has been removed, but the tags are intact.  Running this through
>tidy a second time causes the (now) empty paragraph to be removed.
>
>The simple fix to this is to split the conditional statement into a test for
>a text node and a test for content, and then to place the test for removing
>the node inside the first block but outside the second.  Here are the diffs
>to implement the fix:
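
To illustrate the restructuring described above, here is a minimal sketch.
It is not the diff from the original message and not the code in parser.c:
the Node type, the function names and the "return true when the caller
should remove the node" convention are all simplified assumptions made for
this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a Tidy parse-tree node. */
typedef enum { TEXT_NODE, ELEMENT_NODE } NodeType;

typedef struct {
    NodeType type;
    size_t start;   /* offset of the first character in the buffer */
    size_t end;     /* offset one past the last character */
} Node;

/* Old shape: the emptiness test sits inside the (end > start) branch,
 * so a node that is already empty on entry is never reported.
 * Returns true when the caller should remove the node from the tree. */
static bool trim_trailing_space_buggy(const char *buf, Node *last)
{
    if (last != NULL && last->type == TEXT_NODE && last->end > last->start) {
        if (buf[last->end - 1] == ' ')
            last->end -= 1;
        if (last->start == last->end)
            return true;
    }
    return false;
}

/* New shape: one test for "is a text node", a nested test for "has
 * content", and the removal test inside the first but outside the second. */
static bool trim_trailing_space_fixed(const char *buf, Node *last)
{
    if (last != NULL && last->type == TEXT_NODE) {
        assert(last->start <= last->end);
        if (last->end > last->start) {
            if (buf[last->end - 1] == ' ')
                last->end -= 1;
        }
        /* Reached even when the node was already empty on entry. */
        if (last->start == last->end)
            return true;
    }
    return false;
}
```

With a node whose start has already been advanced past its only space
(start == end on entry), the first version reports nothing to remove while
the second reports the node as empty, which matches the <p><em> </em></p>
case above.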

Has this been implemented? Tidy no longer shows this behaviour.
However, while trying to fix this bug, I noticed that in parser.c the
TrimSpaces() calls are typically followed by a TrimEmptyElement() call,
but in some cases they are not. This looks like a bug; it would be a
good idea to move the TrimEmptyElement() call into TrimSpaces() and
delete all the separate calls.
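
A rough sketch of what I have in mind follows. The Node layout and the
lower-case helper names are simplified stand-ins invented for this example,
not the real declarations in parser.c, and the real functions take
additional arguments and handle more cases than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified tree node (not Tidy's real Node). */
typedef struct Node {
    struct Node *parent;
    struct Node *content;  /* first child; NULL for text nodes */
    struct Node *next;     /* next sibling */
    const char *text;      /* non-NULL for text nodes only */
    size_t start, end;     /* text occupies text[start..end) */
} Node;

static bool is_empty(const Node *n)
{
    return n->text ? n->start == n->end : n->content == NULL;
}

/* Unlink a node from its parent's child list. */
static void discard(Node *n)
{
    Node **link;
    assert(n->parent != NULL);
    for (link = &n->parent->content; *link != NULL; link = &(*link)->next) {
        if (*link == n) {
            *link = n->next;
            break;
        }
    }
    n->parent = n->next = NULL;
}

/* Stand-in for TrimEmptyElement(): drop an element with no content. */
static void trim_empty_element(Node *element)
{
    if (element->parent != NULL && is_empty(element))
        discard(element);
}

/* Stand-in for TrimSpaces() with the cleanup folded in, so that
 * callers can no longer forget the follow-up call. */
static void trim_spaces(Node *element)
{
    Node *first = element->content;
    Node *last;

    if (first != NULL && first->text != NULL) {
        while (first->start < first->end && first->text[first->start] == ' ')
            first->start++;
        if (is_empty(first))
            discard(first);
    }

    /* Re-read the child list: the first child may just have been removed. */
    for (last = element->content; last != NULL && last->next != NULL; )
        last = last->next;
    if (last != NULL && last->text != NULL) {
        while (last->end > last->start && last->text[last->end - 1] == ' ')
            last->end--;
        if (is_empty(last))
            discard(last);
    }

    trim_empty_element(element);  /* always runs, on every call path */
}
```

The point of the design is only that the empty-element check becomes part
of trimming itself, so the two can no longer drift apart at individual
call sites.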

Comments?

Regards.
Received on Friday, 12 April 2002 03:52:46 GMT
