Trimming spaces and dropping empty paragraphs.

From: Lee Passey <lee@www.dysfunctionals.org>
Date: Wed, 17 Oct 2001 14:53:51 -0600
Message-ID: <3BCDEFDE.EE9729F6@dysfunctionals.org>
To: html-tidy <html-tidy@w3.org>

OK, I'm still having some heartburn over this whole space-trimming and
empty paragraph thing.  I have two problems.  One is that I would expect
paragraphs with only blanks to be dropped or converted just like empty
paragraphs.  Two is that I would expect non-breaking spaces to be
treated just like normal spaces in this instance.

So, are my expectations (a) reasonable and (b) consistent with the HTML
spec?

Consider the following five-line example:

<p></p>
<p> </p>
<p><b></b></p>
<p><b> </b></p>
<p>&nbsp;</p>

If run through the current (CVS) version of Tidy, they become:

<p></p>
<p><b></b></p>
<p>&nbsp;</p>

In other words, the first and third lines get trimmed into oblivion (or
replaced by <br /><br /> if DropEmptyParagraphs is set to 'no'), but the
second and fourth lines are simply converted from blank paragraphs to
empty paragraphs.  As you might expect, if this output is run through
Tidy again with the same options, only the paragraph with the
non-breaking space remains.

Now it seems to me that once text has been put through Tidy, it ought to
be possible to "re-Tidy" it without obtaining any further changes; any
other behavior is a shortcoming, if not an actual bug.  (I personally view
a bug as "doesn't work like it's supposed to" rather than "doesn't handle
all input the way I would like."  The latter may not make me happy, but
it's not a bug.)
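
To make the "re-Tidy" point concrete, here is a rough Python sketch.  It
is a toy model, not Tidy's actual code: toy_pass and is_idempotent are
names made up for illustration, and the pass only mimics the behavior
described above, where elements that are already empty get dropped but
blank ones are merely trimmed.

```python
import re

# Hypothetical sample input: the five lines from the example, one per entry.
SAMPLE = [
    "<p></p>",
    "<p> </p>",
    "<p><b></b></p>",
    "<p><b> </b></p>",
    "<p>&nbsp;</p>",
]


def toy_pass(lines):
    """One pass of a toy cleaner (an assumption, NOT Tidy's real logic)."""
    out = []
    for line in lines:
        # Step 1: drop paragraphs that are already empty on entry,
        # including ones that hold only an empty <b></b>.
        if line.replace("<b></b>", "") == "<p></p>":
            continue
        # Step 2: trim whitespace-only text between tags.  This happens
        # too late for the newly emptied element to be dropped in this pass.
        out.append(re.sub(r">[ \t]+<", "><", line))
    return out


def is_idempotent(transform, doc):
    """True if applying transform a second time changes nothing."""
    once = transform(doc)
    return transform(once) == once


first = toy_pass(SAMPLE)
second = toy_pass(first)
print(first)
print(second)
print(is_idempotent(toy_pass, SAMPLE))
```

In this model the first pass leaves three paragraphs and the second
leaves only the non-breaking-space one, so the idempotence check fails,
which is exactly the complaint.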

In the case of the non-breaking space, I can see no value in treating
<p>&nbsp;</p> any differently than <p> </p>.  What am I missing here?
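
For illustration, the rule I have in mind could be sketched in Python
roughly as follows.  This is not the change I made to Tidy;
is_blank_paragraph and its nbsp_is_space flag are hypothetical names,
and stripping tags with a regular expression is a simplification that
only suits tiny fragments like the ones above.

```python
import html
import re

# Crude tag stripper; adequate only for small, well-formed fragments.
TAG = re.compile(r"<[^>]*>")


def is_blank_paragraph(fragment, nbsp_is_space=True):
    """Return True if the fragment holds no text other than blanks.

    With nbsp_is_space=True, a non-breaking space (U+00A0, written
    &nbsp; in HTML) counts as an ordinary space.
    """
    # Remove markup, then decode entities such as &nbsp; into characters.
    text = html.unescape(TAG.sub("", fragment))
    if nbsp_is_space:
        text = text.replace("\u00a0", " ")
    # An empty string also counts as blank (all() of nothing is True).
    return all(ch in " \t\r\n" for ch in text)


for p in ["<p></p>", "<p> </p>", "<p><b></b></p>",
          "<p><b> </b></p>", "<p>&nbsp;</p>"]:
    print(p, is_blank_paragraph(p), is_blank_paragraph(p, nbsp_is_space=False))
```

With the flag on, all five example paragraphs are classified the same
way; with it off, only the &nbsp; paragraph is treated as having
content.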

I have implemented some changes that fix these perceived shortcomings.
Should I post them?

TIA for your guidance.
Received on Wednesday, 17 October 2001 17:02:27 GMT