Re: [CSS21] display:run-in clarifications

On Tuesday 01 September 2009, Boris Zbarsky wrote:
> Bert Bos wrote:
> >> Since this is all being defined on _elements_, not boxes, the
> >> <object> in fact has a display:block child and so would inhibit
> >> its parent from being run in per the existing rule c, even if the
> >> <object> is display:inline.  That seems wrong to me...
> >
> > If the OBJECT has a child then it is not replaced, and conversely
> > when it is replaced it has no child.
>
> Sorry, that's the case for the _box_ tree, but not the DOM tree.  The
> definition we're working with here is done on the DOM tree (or at
> least that's where all the reference links point to); in the DOM the
> <object> always has children.

Don't confuse CSS's document tree with the DOM. (And the boxes don't 
necessarily form a tree, see below.)

The document tree that CSS works on is a very simple tree that is based 
on SGML's ESIS, but omits lots of things, such as PIs, all information 
about entities, comments, character encoding, and everything related to 
DTDs. Basically, the tree consists of just elements, attributes and text 
strings, augmented with pseudo-classes and replaced elements.

Both its simplicity and its difference from the ESIS are deliberate: 
already in 1994 we expected that there would someday be an 
"SGML-light" (which became XML), and we also wanted CSS to be 
applicable, if needed, to other tree-structured documents, such as C 
programs or, indeed, CSS itself.

Elements are nodes in the tree. They have a name, a parent, and zero or 
more IDs. Unlike SGML & XML, CSS allows there to be more than one ID 
per element. Also, the names aren't restricted to SGML name tokens. 
Elements come in two types: replaced or not. If elements are replaced, 
they may have an intrinsic width, height and/or aspect ratio; if they 
are not, they have zero or more children. Elements also have 
pseudo-classes (':visited', ':active', etc.).

(Pseudo-elements aren't elements and don't fit in the tree, although 
some of their aspects are defined by acting as if, for the purpose of 
those aspects, they modified the tree in some way.)

Text strings are nodes in the tree, but always leaf nodes. They have 
nothing but a string of text, without any information about how that 
string was represented in the source. I.e., SGML entities have been 
expanded, white space and record ends have been normalized, and text 
consists of Unicode characters, without any encoding.

Elements have a set (i.e., unordered) of zero or more attributes. Each 
attribute has a name and a value. Each name only occurs once in the 
set. Each value is a text string. The text is already normalized and 
has no type, even if in SGML/XML they were numbers, IDs, IDREFs, etc. 
(An attribute of type ID in SGML/XML is thus represented twice in the 
CSS document tree: its value is an ID for the element and its name and 
value are also in the set of attributes.)
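
As a rough illustration of the data model described above, here is a 
minimal Python sketch. The class, field and helper names are invented 
for this example and do not come from any CSS specification; the 
intrinsic dimensions of replaced elements are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Union


@dataclass
class Text:
    """Leaf node: normalized Unicode text, with no trace of the source form."""
    value: str


@dataclass
class Element:
    name: str                                   # not restricted to SGML name tokens
    ids: Set[str] = field(default_factory=set)  # zero or more IDs
    attributes: Dict[str, str] = field(default_factory=dict)  # untyped strings
    pseudo_classes: Set[str] = field(default_factory=set)     # e.g. {"visited"}
    replaced: bool = False
    children: List[Union["Element", Text]] = field(default_factory=list)
    parent: Optional["Element"] = field(default=None, repr=False, compare=False)

    def append(self, node):
        """Add a child node; a replaced element has no children."""
        if self.replaced:
            raise ValueError("a replaced element has no children")
        if isinstance(node, Element):
            node.parent = self
        self.children.append(node)
        return node


def element_from_source(name, attrs, id_attr_names=()):
    """Build an element from source attributes (hypothetical helper).

    id_attr_names lists the attributes that the source format declares
    as being of type ID.
    """
    ids = {attrs[a] for a in id_attr_names if a in attrs}
    return Element(name=name, ids=ids, attributes=dict(attrs))


p = element_from_source("p", {"id": "intro", "class": "note"},
                        id_attr_names=("id",))
p.append(Text("Hello"))
```

Here the value "intro" ends up both in p.ids and in p.attributes, which 
is the double representation mentioned in the parenthesis above.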

Fortunately, the XML infoset did prove sufficiently similar to SGML's 
ESIS that no change to CSS's document tree was needed when XML was 
created.

CSS doesn't completely define how a source document is transformed to a 
document tree, but for SGML and XML, this is done in the obvious way 
and then we assume some oracle (in fact: context and format-specific 
information) to provide the pseudo-classes and information about 
replaced elements.

The document tree is not related to the DOM, except that the DOM also 
resembles the ESIS and infoset, and resembles them 
sufficiently that UAs that implement both CSS and the DOM can often use 
the DOM (after enhancing it with the equivalents of pseudo-classes and 
pseudo-elements) as a superset of the tree that CSS needs.

The replaced element is a concept inspired by Ted Nelson's transclusion, 
but it is specific to CSS. We introduced that term on purpose to be different 
from any existing terms, in particular in SGML and HTML. (HTML talks 
about IMG and OBJECT in terms of embedding and including, not 
replacing, and that is exactly right, because CSS's replaced element is 
an abstract concept and does not map 1-to-1 to these elements.)

A replaced element doesn't have children. (If it did, they would 
generate boxes, like all other elements.) In fact, a replaced element 
doesn't have content at all, or at least not any content that CSS knows 
about. The spec says the content is "outside the scope." At most the 
replaced element has an intrinsic width, height, or aspect ratio, and 
that is all CSS can know about it.
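
To make the OBJECT case concrete, here is a small Python sketch of one 
way a UA might derive the CSS document tree from a DOM-like structure. 
The dict layout, the function name to_css_tree and the is_replaced 
callback (standing in for the "oracle" mentioned earlier) are all 
assumptions made for this illustration, not anything CSS defines.

```python
def to_css_tree(node, is_replaced):
    """Map a DOM-like node to a node of the CSS document tree.

    node is either a str (text) or a dict with the keys "name",
    "attrs" and "children". is_replaced is a callback that decides,
    from context, whether an element is replaced.
    """
    if isinstance(node, str):
        return node
    replaced = is_replaced(node)
    # A replaced element has no children that CSS knows about, even
    # if the DOM-like source still carries fallback content.
    children = [] if replaced else [
        to_css_tree(child, is_replaced) for child in node.get("children", [])
    ]
    return {
        "name": node["name"],
        "attributes": dict(node.get("attrs", {})),
        "replaced": replaced,
        "children": children,
    }


dom_object = {
    "name": "object",
    "attrs": {"data": "chart.svg"},
    "children": [{"name": "div", "attrs": {}, "children": ["fallback text"]}],
}

# Pretend the resource loaded, so the OBJECT is replaced.
loaded = to_css_tree(dom_object, lambda n: n["name"] == "object")
# Pretend it failed to load, so the fallback DIV is shown.
fallback = to_css_tree(dom_object, lambda n: False)
```

In the first tree the OBJECT has no children; in the second it has the 
DIV as a child, although the DOM-like input is the same in both cases.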

>
> > In this case the author used DIV as fallback, not SPAN, which
> > rather suggests that he wants the fallback to be a paragraph or
> > more. That doesn't seem wrong at all. Especially as he also had the
> > option to use 'inline-block'. Clearly, he wants the OBJECT to be
> > inline *only* when it is actually replaced, not when the fallback
> > text is displayed instead.
>
> I'm talking about the case when the <object>'s fallback is not shown.
> When its fallback is shown there's no problem here.
>
> > On the other hand, what to do with :before/:after on replaced
> > elements is trickier. It's not for nothing that the WG decided to
> > postpone the issue. :-)
>
> Agreed.
>
> > We need to define it for CSS3, but I'd rather not hold up CSS 2.1.
> > Maybe somebody finds *the* solution and we all agree immediately,
> > but it rather looks like it will be a long and complex definition
> > with lots of if-then clauses...
>
> The other option, as I said, is defining run-in in terms of the box
> tree, not the element tree.  Then the definition will simply work
> however it should once we sort out what boxes, if any, :before and
> :after on replaced elements should generate.

I don't know if the boxes form a tree... CSS talks, informally, about 
a "formatting structure," which may resemble the document tree, but is 
not necessarily a tree itself. (Not "tree-shaped" is what CSS says, see 
chapter 2.) The exact structure is not important and that is why CSS 
doesn't define it further. The important thing is that each box belongs 
to exactly one element (is "generated by" an element as CSS says), 
because the element is where the properties are.

CSS talks about "parent box" (in four places, I think), but that is just 
for brevity. It should be read as "box generated by the parent of the 
element that generated this box" with the implicit assumption that 
there is only one such box or that it is clear which one is meant 
otherwise.
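
That reading of "parent box" can be sketched in a few lines of Python. 
The Element and Box classes and the parent_boxes function are 
hypothetical names used only for this illustration; CSS itself does not 
define the formatting structure at this level of detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Element:
    name: str
    parent: Optional["Element"] = None


@dataclass(eq=False)
class Box:
    generated_by: Element  # each box belongs to exactly one element


def parent_boxes(box: Box, all_boxes: List[Box]) -> List[Box]:
    """Boxes generated by the parent of the element that generated box."""
    parent = box.generated_by.parent
    if parent is None:
        return []
    return [b for b in all_boxes if b.generated_by is parent]


body = Element("body")
span = Element("span", parent=body)
em = Element("em", parent=span)

# Assume the SPAN is an inline that was split over two lines and
# therefore generated two boxes.
boxes = [Box(body), Box(span), Box(span), Box(em)]
candidates = parent_boxes(boxes[3], boxes)
```

For the EM there are two candidate boxes here, which is why "parent 
box" only works as shorthand when it is clear which one is meant.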

(If CSS 2.1 sometimes seems to have been written in a particularly 
sloppy and ambiguous way, remember that most of the text dates from 
1997-1998 and was re-written prior to publication by a technical 
writer. Moreover, it has undergone more than ten years of patches by 
many different people, which improved precision locally, but didn't do 
much for the overall consistency. In fact, many of these patches 
weren't meant to improve the spec, they were meant to permit common 
implementation bugs to persist, so that the spec would become a W3C 
Recommendation sooner...)



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos                               W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France

Received on Wednesday, 2 September 2009 10:36:34 UTC