Re: Trimming spaces and dropping empty paragraphs.

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 18 Oct 2001 00:08:46 +0200
To: Lee Passey <lee@www.dysfunctionals.org>
Cc: html-tidy <html-tidy@w3.org>
Message-ID: <hnvrsto5ec55am1p6cl332nr84f6asntes@4ax.com>
* Lee Passey wrote:
>OK, I'm still having some heartburn over this whole space-trimming and
>empty paragraph thing.  I have two problems.  One is that I would expect
>paragraphs with only blanks to be dropped or converted just like empty
>paragraphs.

Containing only _whitespace characters_, yes.

>Two is that I would expect non-breaking spaces to be
>treated just like normal spaces in this instance.

No. Whitespace gets stripped on parsing in HTML (in XHTML it is stripped
in the First Edition; sometimes, using special script-dependent rules,
it is preserved but not rendered in most cases), but &nbsp; is not a
whitespace character in terms of HTML/XML/XHTML.
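
To make the distinction concrete, here is a minimal sketch in Python. It
assumes the XML 1.0 "S" production (space, tab, CR, LF) as the set of
markup white space; HTML's own list differs slightly. The names
MARKUP_WHITESPACE and is_blank are made up for illustration and are not
part of Tidy.

```python
# White space per the XML 1.0 "S" production: space, tab, CR, LF.
# (Assumption: this set stands in for "whitespace characters" here.)
MARKUP_WHITESPACE = " \t\r\n"

NBSP = "\u00a0"  # the character that &nbsp; refers to


def is_blank(text: str) -> bool:
    """Return True if text is empty or holds only markup white space."""
    return text.strip(MARKUP_WHITESPACE) == ""


print(is_blank(""))       # empty paragraph content
print(is_blank(" \t\n"))  # only markup white space
print(is_blank(NBSP))     # a no-break space is content, not white space
```

Note that Python's own str.isspace() and argument-less str.strip() do
treat U+00A0 as white space, which is why the sketch passes an explicit
character set instead.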

>Consider the following five line example:
>
><p></p>
><p> </p>
><p><b></b></p>
><p><b> </b></p>
><p>&nbsp;<p>
>
>If run through the current (CVS) version of Tidy, they become:
>
><p></p>
><p><b></b></p>
><p>&nbsp;<p>

And running it again you get

  <p>&nbsp;</p>

This might be considered a bug: Tidy should produce a canonical version
of the document (equal settings => equal result, no matter how often you
apply these rules), and here it doesn't. I vote for fixing it; your
example and the result after cleaning it twice render the same in
current browsers.
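
The "equal settings => equal result" property is idempotence, and it can
be checked mechanically. The following is only a toy sketch in Python:
clean_once is a hypothetical regex-based pass that deletes empty <p> and
<b> elements, not Tidy's actual algorithm or behaviour. It shows how a
single pass can fail the check while repeating the pass until nothing
changes passes it.

```python
import re

# Hypothetical toy rule: a <p> or <b> holding only markup white space.
EMPTY_ELEMENT = re.compile(r"<(p|b)>[ \t\r\n]*</\1>")


def clean_once(html: str) -> str:
    """One pass: delete elements that are empty at the start of the pass."""
    return EMPTY_ELEMENT.sub("", html)


def clean_to_fixed_point(html: str) -> str:
    """Repeat clean_once until the output stops changing."""
    while True:
        cleaned = clean_once(html)
        if cleaned == html:
            return html
        html = cleaned


def is_idempotent(clean, html: str) -> bool:
    """True if applying clean a second time changes nothing."""
    once = clean(html)
    return clean(once) == once


doc = "<p><b> </b></p><p>&nbsp;</p>"
print(is_idempotent(clean_once, doc))            # single pass
print(is_idempotent(clean_to_fixed_point, doc))  # repeated passes
```

In this sketch the first pass removes the empty <b>, which leaves a <p>
that only becomes empty afterwards, so a second pass still finds work to
do. The paragraph containing &nbsp; is left alone in both variants.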

>In the case of the non-breaking space, I can see no value in treating
><p>&nbsp;</p> any differently than <p> </p>.  What am I missing here?

The &nbsp; affects rendering, so it must be preserved (for this reason
and according to the rules of HTML/XHTML/XML).
-- 
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/
Received on Wednesday, 17 October 2001 18:09:52 GMT