- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Fri, 12 Apr 2002 09:51:54 +0200
- To: Lee Passey <lee@dysfunctionals.org>
- Cc: tidy-develop@lists.sourceforge.net, html-tidy <html-tidy@w3.org>
* Lee Passey wrote:
>Bjoern Hoehrmann wrote:
>
>> This might be considered a bug, Tidy should produce a canonical version
>> of the document (equal settings => equal result, no matter how often you
>> apply these rules) and here it doesn't. I vote for fixing it, your
>> example and the result after cleaning it two times render the same in
>> current browsers.
>
>After pursuing several false paths, I think I have come up with a very small
>change which will solve most, if not all, of these problems. The apparent
>theory of operation is that if spaces in a text node are trimmed to the point
>where the node no longer contains any text, the node should be removed from
>the tree. This removal occurs in the parser.c in the function
>TrimTrailingSpace(). However, the test of whether to remove the node is
>inside the conditional (last->end > last->start), so if the node enters the
>function already empty it will not be removed. This situation can occur, for
>example, when you have an empty space, bracketed by inline tags such as <em>,
>inside a block, such as paragraphs, e.g:
>
><p><em> </em></p>
>
>In this case TrimInitialSpace() has incremented node->start before
>TrimTrailingSpace is called, so the node is now empty, but has not been
>removed. When the resulting text is printed it appears as:
>
><p><em></em></p>
>
>the space has been removed, but the tags are intact. Running this through
>tidy a second time causes the (now) empty paragraph to be removed.
>
>The simple fix to this is to split the conditional statement into a test for
>a text node and a test for content, and then placing the test for removing
>the node inside the first block but outside the second. Here are the diffs
>to implement the fix:

Has this been implemented? Tidy doesn't show this behaviour any longer.
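
For readers without the diffs at hand, the split conditional described in the quoted message could look roughly like the sketch below. This is not Tidy's actual code: the `TextNode` struct, the function name `trim_trailing_space`, and the boolean return value are simplifications assumed for illustration only. The point it shows is that the emptiness test sits inside the "is a text node" block but outside the "has content" block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a Tidy text node: the text is
   the half-open range [start, end) of a shared buffer. */
typedef struct {
    bool   is_text;
    size_t start;
    size_t end;
} TextNode;

/* Trims one trailing space from the node, if there is one.
   Returns true when the node is an empty text node and should be
   removed from the tree by the caller. */
static bool trim_trailing_space(const char *buf, TextNode *last)
{
    assert(buf != NULL && last != NULL);

    /* First test: is this a text node at all? */
    if (!last->is_text)
        return false;

    /* Second test: only look at the buffer if some text is left. */
    if (last->end > last->start && buf[last->end - 1] == ' ')
        last->end--;

    /* The removal test is outside the content test, so a node that
       was already emptied (e.g. by an earlier initial-space trim)
       is still reported as empty. */
    return last->start == last->end;
}
```

With the original single conditional, a node arriving with `start == end` would skip the removal test entirely; here it is reported as empty on the first pass.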
However, while trying to fix this bug, I noticed that in parser.c a call to TrimSpaces() is typically followed by a call to TrimEmptyElement(), but there are cases where this does not happen. This seems to be a bug. It would be a good idea to move the TrimEmptyElement() call into TrimSpaces() and delete all the separate calls. Comments?

Regards.
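
To make the proposal above concrete, here is a rough sketch of the refactoring, assuming a deliberately flattened model. The `Element` struct and the lower-case function names are hypothetical and do not match the real signatures in parser.c; the sketch only shows the trimming step and the empty-element check being bundled so that no call site can forget the second one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flat model of an element: its text content and a flag
   standing in for "removed from the parse tree". */
typedef struct {
    char text[32];
    bool removed;
} Element;

/* Strips leading and trailing spaces from the element's text. */
static void trim_spaces_only(Element *e)
{
    size_t len, lead = 0;

    assert(e != NULL);
    len = strlen(e->text);

    /* Drop trailing spaces. */
    while (len > 0 && e->text[len - 1] == ' ')
        e->text[--len] = '\0';

    /* Skip leading spaces, then shift the rest (and the NUL) down. */
    while (e->text[lead] == ' ')
        lead++;
    memmove(e->text, e->text + lead, len - lead + 1);
}

/* Marks the element as removed if nothing is left in it. */
static void trim_empty_element(Element *e)
{
    assert(e != NULL);
    if (e->text[0] == '\0')
        e->removed = true;
}

/* Proposed shape: trimming always ends with the empty-element check,
   so callers no longer need a separate call. */
static void trim_spaces(Element *e)
{
    trim_spaces_only(e);
    trim_empty_element(e);
}
```

Whether every existing TrimSpaces() call site can tolerate the element being removed as a side effect would still need checking in the real code.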
Received on Friday, 12 April 2002 03:52:46 UTC