Re: Ignoring empty paragraphs

On Sun, 9 Apr 2000, Braden N. McDaniel wrote:
> On Sun, 9 Apr 2000, Jan Roland Eriksson wrote:
> > On Tue, 4 Apr 2000 03:09:26 -0400 (EDT), "Braden N. McDaniel"
> > <> wrote:
> > 
> > >On Tue, 4 Apr 2000, Jan Roland Eriksson wrote:
> > >> On Mon, 3 Apr 2000 19:53:18 -0400 (EDT), "L. David Baron"
> > >> <> wrote:
> > >> > 1) An empty P element should be ignored at the parsing stage, and
> > >> >    therefore should not appear in the DOM and should not be affected
> > >> >    by style sheets.
> > >> 
> > >> This is the correct interpretation.
> > 
> > [...]
> > 
> > >> If there's nothing to mark-up, there's no motivation for markup either.
> > 
> > >Indeed, but it is *not* the parser's job to fix errant document structure! 
> > >It is the parser's job to read the markup that's there. And as long as
> > >it's valid, the DOM tree should have a direct correspondence to the
> > >plaintext representation.
> > 
> > Fair enough. But...
> > 
> > What about "styling" of non-existent content?
> > Leave that no-content element dangling in the DOM tree and we need to
> > move the decision not to style it to the CSS renderer instead.
> > 
> > If not, we will not have a way to discourage the use of successive P's
> > for vertical spacing, and that is what I think David's question was all
> > about.
> Hm. That's a good point. I think the bottom line here is that the rule in
> the HTML spec is Stupid: if the spec authors wanted to discourage empty P
> elements, they should have made them altogether illegal.
> But I've come around to agreeing with you on this. The HTML spec appears to
> make it the job of the parser to fix bad markup. The wording is, "User
> agents should ignore empty P elements," not, "User agents should hide
> empty P elements."

I think the wording in the HTML spec should not be relied on -- it is simply
too vague.  The intention, if I remember correctly, was that consecutive
empty <p>'s would, when rendered, collapse to the same vertical spacing as
a single <p>, or to nothing at all.  The problem is that the HTML spec
doesn't say how this should be done, since it is a markup spec and not a
formatting or parsing specification.

For someone writing DOM code that accesses a document, it is unacceptable
that the parser/processor can arbitrarily decide to modify the data
structures by removing data from the document it receives.  For example, I
(or some auto-generation tool pumping out valid HTML) could produce a
document containing something like:

<p id="para1"> </p>
<p id="para2"> </p>

and then later use script code to appropriately fill the <p>'s. Obviously
the code will fail if the parser/processor has decided to prune these
empty but needed elements from the tree.
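To make the failure concrete, here is a minimal Python sketch, not modelled on any real browser. A toy parser built on the standard-library `html.parser` records every `<p>` that has an `id`, and a hypothetical `prune_empty` switch imitates a user agent that drops empty paragraphs at parse time. A stand-in for the later script then tries to fill placeholders like the ones above by id. The names `ParagraphIndex`, `prune_empty` and `fill` are illustrative assumptions.

```python
from html.parser import HTMLParser


class ParagraphIndex(HTMLParser):
    """Toy parser: maps each <p id=...> to its text content.

    prune_empty=True imitates a hypothetical user agent that drops
    P elements whose content is empty or whitespace-only.
    """

    def __init__(self, prune_empty=False):
        super().__init__()
        self.prune_empty = prune_empty
        self.paragraphs = {}      # id -> text content
        self._current_id = None   # id of the <p> being parsed, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current_id = dict(attrs).get("id")
            self._buffer = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._current_id is not None:
            text = "".join(self._buffer)
            # The hypothetical pruning step: silently discard "empty" P's.
            if not (self.prune_empty and text.strip() == ""):
                self.paragraphs[self._current_id] = text
            self._current_id = None


def fill(paragraphs, contents):
    """Stand-in for the later script: fill each placeholder by id."""
    for pid, text in contents.items():
        if pid not in paragraphs:
            raise KeyError(f"placeholder {pid!r} is missing from the tree")
        paragraphs[pid] = text


DOC = '<p id="para1"> </p>\n<p id="para2"> </p>'
CONTENTS = {"para1": "First part.", "para2": "Second part."}

keeping = ParagraphIndex()
keeping.feed(DOC)
keeping.close()
fill(keeping.paragraphs, CONTENTS)   # succeeds: both placeholders exist

pruning = ParagraphIndex(prune_empty=True)
pruning.feed(DOC)
pruning.close()
try:
    fill(pruning.paragraphs, CONTENTS)
except KeyError as err:
    print("script failed:", err)
```

The only difference between the two runs is the pruning step, and that alone is enough to break code that was written against valid markup.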

Moreover, with XML this would simply be illegal -- an XML parser can
_never_ modify the incoming data, as Tantek pointed out. All it can do is
tell the XML application whether or not white space is significant in
certain contexts. It does not make sense at this point to let HTML
applications do things that XHTML ones cannot.

I think Jason's idea of an :empty pseudo-class is the most appropriate way
of handling the rendering issue. It gives much finer control over the
formatting process, in a way that applies to other elements as well.
For example, you could have rules such as:

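p:empty + p:empty     { display: none }  /* empty P right after an empty P */
div:empty + div:empty { display: none }  /* likewise for empty DIVs */

to drop all but the first of each run of consecutive empty paragraphs, and
likewise for empty divs, from the rendering process.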
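Matching *consecutive* empty paragraphs calls for the adjacent-sibling combinator (`p:empty + p:empty`); a plain space between the two selectors is a descendant combinator, which can never match here because an empty element has no descendants. Below is a minimal Python sketch of the sibling-matching logic over a flat list of sibling elements. The `Element` type and `hidden_by_rule` function are hypothetical, and treating whitespace-only content as empty is an assumption the proposal leaves open (under the later CSS3 Selectors definition of `:empty`, a `<p> </p>` containing a space would not match).

```python
from dataclasses import dataclass


@dataclass
class Element:
    tag: str
    text: str = ""   # text content; child elements are ignored in this sketch


def is_empty(el):
    # Assumption: whitespace-only content counts as empty.
    return el.text.strip() == ""


def hidden_by_rule(siblings, tag):
    """Return the indices hidden by `tag:empty + tag:empty { display: none }`.

    `siblings` are element siblings in document order. An element matches
    when it is an empty `tag` and its immediately preceding sibling is
    also an empty `tag`.
    """
    hidden = []
    for i in range(1, len(siblings)):
        prev, cur = siblings[i - 1], siblings[i]
        if (cur.tag == tag and is_empty(cur)
                and prev.tag == tag and is_empty(prev)):
            hidden.append(i)
    return hidden


body = [
    Element("p", "Intro."),
    Element("p"),          # first empty P of the run: still rendered
    Element("p", " "),     # hidden
    Element("p"),          # hidden
    Element("div", "Aside."),
    Element("p"),          # follows a DIV, so still rendered
]
print(hidden_by_rule(body, "p"))   # → [2, 3]
```

Keeping the first empty paragraph of each run, rather than hiding them all, matches the "collapse to a single <p>" reading of the spec; hiding every empty P would be the simpler rule `p:empty { display: none }`.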

Regardless, it would seem useful to change the wording of the HTML
specification (Section 9.3.1) to say more precisely what this "really"
means. Something like:

  We discourage authors from using empty P elements. User agents should
  not render empty P elements. However, style sheet instructions should be
  able to control whether or not empty P (or other) elements are included 
  in the rendering process.

might be better. 

Ian Graham ......................... Centre for Academic Technology
i a n   d o t   g r a h a m    a t    u t o r o n t o   d o t   c a
..................... .................

Received on Monday, 10 April 2000 10:13:55 UTC