Inline elements with %block as content vs. PRE

E. Stephen Mack (estephen@emf.net)
Sun, 24 Aug 1997 04:02:52 -0700


Message-Id: <3.0.3.32.19970824040252.011a6de4@emf.net>
To: www-html@w3.org

This e-mail expands on a problem with the content of the PRE
element.  (I recently sent a letter to www-html-editor about
this problem, and this post has much more information as
well as a request for comments.)

I pointed out on 21-Jul-97 that OBJECT is an inline element
that can contain %block.  IFRAME and BUTTON are also inline
elements that can contain %block (which is defined as allowing
both inline and block-level elements).  This isn't necessarily
a problem...

...*But*...this can lead to strange things, such as:

<PRE>
<OBJECT DATA="foo.gif" TYPE="image/gif" HEIGHT=20 WIDTH=40>
<P>Hello!
<P>Goodbye!
</OBJECT>
</PRE>

(Validated under the Cougar draft at WebTechs.)  How the paragraphs
should be treated in the context of the PRE element is unclear to me.
Navigator 4.02 seems to treat each <P> as a line break (not
a paragraph break).

A paragraph would not normally be allowed in a PRE element.

Other abuses may be possible too: anywhere the DTD allows %inline,
block-level elements could be included indirectly by wrapping
them in an OBJECT, BUTTON or IFRAME.

(Note: APPLET is similar to OBJECT, but only allows %inline as its
content, instead of %block.  Perhaps OBJECT should be defined
similarly?)
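
To make that suggestion concrete, here is a rough sketch of such a
declaration.  It is hypothetical: I'm assuming OBJECT keeps PARAM in
its content model the way APPLET does, and the draft's actual
declaration may differ in its details.

<!-- hypothetical: OBJECT limited to inline content, like APPLET -->
<!ELEMENT OBJECT - - (PARAM | %inline)*>

With a content model like this, a validator would reject the <P>
elements in the example above wherever the OBJECT appears, not just
inside PRE.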

The HTML 4.0 draft's "Definition of Block level and Inline elements"
section [1] says:

> Certain HTML elements are said to be "block level" while others are
> "inline" (also known as "text level"). The distinction is founded on
> several notions: 
> Content model 
>      Generally, block level elements may contain inline elements and
> other block level elements. Generally, inline elements may generally
> contain only data and other inline elements. Inherent in this structural
> distinction is the idea that block elements create "larger" structures
> than inline elements. [...]

Now, I can't miss that "generally" appears three times in that
content-model paragraph (perhaps a *little* redundantly), so I don't
think it's an accident that OBJECT, BUTTON and IFRAME are the three
inline elements that are exceptions to this rule.

(I should also point out that historically the popular browsers
don't care about this content rule at all; Navigator and IE don't mind
a bit if you do <FONT SIZE="5"><P>Foo<P>Bar</FONT>, but a validator
will point out that the P elements can't go there.  "Why?" is a
frequently asked question on comp.infosystems.www.authoring.html.
Currently, the error message from WebTechs is:
  >  nsgmls:<OSFD>0:25:2:E: document type does not allow element "P"
  >  here; missing one of "OBJECT", "IFRAME", "BUTTON" start-tag
which is correct but very confusing.)

And I understand why OBJECT, BUTTON, and IFRAME should be considered
inline elements (for the same reason that IMG is an inline element).
But the consequences need more thought.  For now, we can plug
the particular PRE hole above by widening the definition of what's
excluded from the PRE element.  From the draft HTML 4.0 DTD [2]:

<!-- excludes markup for images and changes in font size -->
<!ENTITY % pre.exclusion "IMG|BIG|SMALL|SUB|SUP|FONT">
<!ELEMENT PRE - - (%inline)* -(%pre.exclusion)>

This definition is the same as that used in HTML 3.2.  It allows
BASEFONT, IFRAME, APPLET, OBJECT, and BDO to be contained in a
PRE element, all of which strike me as being equally problematic.
(Perhaps Q should be excluded as well?  It depends on whether quote
marks are added to the Q element by a browser or not.)
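
As a rough sketch only (this is not text from the draft), a widened
exclusion covering the elements named above, plus BUTTON since it
also accepts %block, might read:

<!-- hypothetical widening of the exclusion; not from the draft DTD -->
<!ENTITY % pre.exclusion
   "IMG|OBJECT|APPLET|IFRAME|BUTTON|BASEFONT|BDO|BIG|SMALL|SUB|SUP|FONT">
<!ELEMENT PRE - - (%inline)* -(%pre.exclusion)>

Excluding OBJECT, IFRAME and BUTTON themselves is enough to close the
hole, since block-level content can only get into PRE by way of them.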

Some Web authors might want to put form controls inside a PRE to
make them line up more easily.  BUTTON, INPUT, LABEL, SELECT, and
TEXTAREA are all valid contents of PRE, but they too will throw
preformatted text out of alignment.

Also, using SPAN with style sheets on certain parts of a PRE
element will misalign the preformatted text:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML LANG="EN">
<HEAD>
<TITLE>PRE test</TITLE>
<STYLE TYPE="text/css">
<!--
SPAN { font-size: larger; color: red;}
-->
</STYLE>
</HEAD>
<BODY>

<H1>This is a <SPAN>test</SPAN></H1>

<PRE>
What is the effect of it?
What is the <SPAN>effect</SPAN> of it?
These 3 lines won't align
</PRE>

That was a <SPAN>test</SPAN>.
</BODY>

This is a valid HTML 4.0 document, but the second line in the
PRE element is wider than the other two.  I could do the same
thing by styling STRONG or any other element that is valid
PRE content.  So exclusions in the DTD are not going to be able
to combat the use of style sheets to change the size of portions
of preformatted text.

At this point, the draft should either exclude all of the elements I
listed earlier that could potentially misalign PRE contents, or else
just give up and allow any %block element inside a PRE element.  I've
long wondered why images and font changes weren't allowed in a PRE
element anyway.  An author who puts them there does so intentionally,
and user agents know how to treat them; the only thing that breaks is
that the ends of lines in preformatted text are no longer as predictable.

One way or another, HTML 4.0 has to make a decision about pre.exclusion
and also confirm that it's okay for inline elements to contain %block.

[1] http://www.w3.org/TR/WD-html40/intro/sgmltut.html#h-2.3.3.1
[2] http://www.w3.org/TR/WD-html40/sgml/dtd.html#pre.exclusion