Processing model for line formation and the CSS Line Module from Stephen Zilles on 2012-02-05 (www-style@w3.org from February 2012)

From: Stephen Zilles <szilles@adobe.com>
Date: Sun, 5 Feb 2012 01:45:30 -0800
To: "www-style@w3.org list (www-style@w3.org)" <www-style@w3.org>
Message-ID: <CE2F61DA5FA23945A4EA99A212B157954AE1CF099E@nambx03.corp.adobe.com>

All,
John Daggett and I are the (new) editors of the CSS Line Module which has been in suspension for some time. This module is concerned with the alignment of text (and replaced objects) on a line and, as of recently, with the Line Grid. To begin the discussion on this module, identifying the Processing Model is the first step. This message discusses that Model with the intent of soliciting comments and allowing a discussion at the upcoming CSS F2F in Paris.

The Text Module talks about line-breaking and justification so the main role of the Line Module is "alignment": alignment of the components of a line within the line and the alignment of lines to other lines or to the line grid. Here I am assuming that the line grid consists of equally spaced "line alignment tables" where a line alignment table is a scaled and positioned copy of a baseline table for the font, dominant baseline and size in effect when the line grid is established. (Since the font-family property takes a list of font names, which font's baseline table is used in creating the line alignment table needs to be specified. Likely, this is the baseline table of the first available font as used in CSS 2.1 to align elements with no content.)

My understanding of the processing model (for horizontal text) is the following:

1. Text processing (i.e., line breaking and justification) is done first per the Text Module. This determines the content of a line and its horizontal extent.

2. For each line constructed in step 1, the components of the line (be they individual glyphs, composite glyphs (e.g., ligatures, graphemes), replaced content objects, or spans) are aligned, vertically, with respect to their neighbors. In actuality, this is a binary process of aligning some alignment point in the current component to a baseline (often the dominant baseline) in its parent. (The (paragraph) block element is the ultimate parent for this alignment. That is, every line is within some (paragraph) block (said block may consist solely of the line itself).) (More details on vertical alignment below.)

3. Having aligned all the components, it is possible to compute the vertical extent of the line. This includes adding half-leading, where appropriate, to each component. This vertical extent determines the top and bottom of the line box (and Text Processing determined its horizontal extent).

4. If alignment to a line grid is not specified, then the top of the current line box is aligned to the bottom of the previous line box if there is one and, otherwise, at the top of the block box in which the line box occurs.

5. If alignment to a named line grid is specified, then a relevant baseline in the current line box is aligned with the relevant baseline in the next line alignment table in the line grid. (Which baseline is the "relevant baseline" needs to be determined and I am not sure it is always the dominant baseline of the (paragraph) block.) In the simple case when there is only one baseline used in the line, that baseline is aligned to the same baseline in the line alignment table in the line grid. What to do if this alignment would cause a collision with the previous line is discussed below.

Above is the basic outline of the steps in line building. For vertical text the roles of horizontal and vertical above need to be interchanged. And, each of these steps is more complicated than indicated by the basic outline due to edge cases and the historical development of CSS.
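
Steps 3 to 5 of the outline can be sketched in Python as follows. This is a minimal illustration for horizontal text, not a normative algorithm: the Component fields, the function names and the convention that y grows downward from the top of the block are all assumptions made for the example. For step 5 it simply takes the next grid baseline as given and reports whether the result collides with the previous line, leaving the resolution of the collision open.

```python
from dataclasses import dataclass

@dataclass
class Component:
    """One aligned piece of a line (hypothetical model, not from any spec)."""
    ascent: float               # extent above the component's alignment point
    descent: float              # extent below the component's alignment point
    shift: float = 0.0          # how far step 2 raised the alignment point above the line's baseline
    half_leading: float = 0.0   # half-leading added above and below the component

def line_extent(components):
    """Step 3: extent of the line box above and below the line's baseline."""
    above = max(c.shift + c.ascent + c.half_leading for c in components)
    below = max(c.descent + c.half_leading - c.shift for c in components)
    return above, below

def place_line(components, prev_bottom, next_grid_baseline=None):
    """Steps 4 and 5: return (top, baseline, bottom, collides), y growing downward."""
    above, below = line_extent(components)
    if next_grid_baseline is None:
        # Step 4: no line grid, so the top of this line box meets the previous bottom.
        baseline = prev_bottom + above
    else:
        # Step 5: the relevant baseline sits on the next grid baseline.
        baseline = next_grid_baseline
    top, bottom = baseline - above, baseline + below
    return top, baseline, bottom, top < prev_bottom
```

For the first line of a block, prev_bottom is the top of the block box (0 in block coordinates).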

More details on vertical alignment.
============================

Most of the complexity associated with vertical alignment is due (1) to the need to align objects (e.g. graphics or images) that do not have font metrics and (2) to the possibility of aligning elements to the "top" and/or "bottom" of the line box. (And, it is alignment that determines, provisionally, the "top" and "bottom" of the line box so how this works needs to be carefully specified.)

For glyphs, each glyph has an alignment point, typically the position of one of the baselines for the font from which the glyph comes. This alignment point is normally aligned to the same baseline in the baseline table active for the element holding the character (or characters) which generated the glyph. (The active baseline table is a version of the baseline table of the font active for that element scaled to the font-size of that element. Each font is presumed to have a baseline table which specifies the vertical coordinates for a dominant baseline and all the other baselines of which the font is aware.) This process is straightforward.
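
As an illustration of an active baseline table, the sketch below scales a font's baseline table to a font-size and positions it so that the dominant baseline is at zero. The baseline names and the em-unit values are made up for the example and do not come from any real font; the function name is likewise hypothetical.

```python
# Hypothetical baseline table in em units, measured upward from the
# alphabetic baseline. The numbers are invented for illustration.
FONT_BASELINES_EM = {
    "alphabetic": 0.0,
    "ideographic": -0.125,
    "mathematical": 0.5,
    "hanging": 0.75,
}

def active_baseline_table(baselines_em, font_size, dominant="alphabetic"):
    """Scale the font's table to font_size, with the dominant baseline at 0.

    Each result is the distance (positive = upward) from the dominant
    baseline to the named baseline.
    """
    origin = baselines_em[dominant]
    return {name: (pos - origin) * font_size for name, pos in baselines_em.items()}

table = active_baseline_table(FONT_BASELINES_EM, font_size=20)
# A glyph whose alignment point is its hanging baseline is placed so that
# this point lies table["hanging"] above the dominant baseline.
```

Choosing a different dominant baseline only moves the origin of the table; the distances between baselines stay the same.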

When non-glyph objects, such as images and graphics, appear in a line, they do not have an intrinsic alignment point. Their default alignment point is at the bottom margin edge of the object (per CSS 2.1 'vertical-align'), except for 'middle' alignment which uses the middle of the margin box (presumably). To allow non-glyph (and glyph) objects to be given another alignment point, the 'alignment-adjust' property is introduced. This property allows explicit specification of the alignment point. This removes the problem of positioning a non-glyph object.

The 'vertical-align' property does specify that percentage values shift the alignment-point. The main problem with the 'vertical-align' property is that it tries to do too many things with a single property. When you consider multiple baselines, in multiple sizes and with varying alignment points, you need more control than a single property can provide. That is why earlier drafts of the CSS Line Module split 'vertical-align' into 4 properties: 'dominant-baseline' which selects which baseline table to use (because there can be one for each baseline the font understands) and sizes that table using the current 'font-size'; 'alignment-baseline' which chooses which baseline in the current sized baseline table the alignment-point (of the object being aligned) is positioned at; 'alignment-adjust' which (like percentage values on 'vertical-align') allows the alignment-point to be adjusted; and, 'baseline-shift' which provides a temporary shift of the normal baseline-table to handle sub- and super-scripts. I had not wanted to go into this level of detail at this point, but it may be necessary to be sufficiently clear. The idea is that 'vertical-align' then becomes a shorthand for the above 4 properties.

The problem of "top" and "bottom" alignment is a bit more complex. These "baselines" do not exist in the font baseline tables because they are dependent on the content of the line and on the "leading" (more accurately the "half-leading") that is added to each glyph. The positions of "top" and "bottom" have to be computed. Loosely, this involves computing the height of the line ignoring the "top" and "bottom" aligned elements, then positioning the "top" aligned elements at the top of the line and computing a new "bottom" of the line box ignoring the "bottom" aligned elements. Finally, the "bottom" aligned elements are positioned either at the new line box bottom, if they will fit in the line box, or, if they will not, the bottom of the line box is extended until no bottom aligned element goes above the top of the line box. (This can be said more precisely with more text.) As noted in CSS 2.1, this calculation may leave the position of the dominant baseline in the line at an indeterminate position, making alignment to a line grid difficult.
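
The loose description of "top" and "bottom" resolution above can be written down as a short Python sketch. It follows the order given in the text (baseline-aligned content first, then "top" elements, then "bottom" elements) and is an illustration only; the function name, the argument shapes and the coordinate convention (the line's baseline at y = 0, y growing downward) are assumptions of the example.

```python
def resolve_top_bottom(baseline_extents, top_heights=(), bottom_heights=()):
    """Return (top, bottom) of the line box, with the baseline at y = 0.

    baseline_extents: (above, below) pairs, half-leading included, for the
        components that are aligned to a baseline.
    top_heights / bottom_heights: heights of the "top" and "bottom"
        aligned elements.
    """
    # 1. Height of the line ignoring "top" and "bottom" aligned elements.
    top = -max((above for above, _ in baseline_extents), default=0.0)
    bottom = max((below for _, below in baseline_extents), default=0.0)

    # 2. Hang the "top" aligned elements from the top of the line; they
    #    may push the provisional bottom down.
    for height in top_heights:
        bottom = max(bottom, top + height)

    # 3. Place the "bottom" aligned elements at the new bottom. One that
    #    does not fit extends the bottom until it no longer rises above
    #    the top of the line box.
    for height in bottom_heights:
        if height > bottom - top:
            bottom = top + height
    return top, bottom
```

Note that in step 3 the box grows only downward, so where the baseline ends up relative to the final box depends on the order of the steps. This is the indeterminacy the text mentions.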

There are additional details associated with changing font sizes within a line, changing baselines, and shifting for sub- or super-scripts, but these details do not affect the basic structure of the processing model, so are not discussed here.

More details on line grid alignment
============================
For the moment, assume that a line grid has only one baseline (the dominant baseline) per line (more on this below). It is necessary to specify the spacing between instances of this baseline in the line grid when it is defined. The first thought is to make the line grid lines be separated by 1EM in the font current at the line grid's definition. It seems likely, however, that the author may want to add some leading to the separation of the lines in the line grid to account for larger lines (say ones with ruby annotations or sub/superscripts) that he expects to occur. Is font size and leading (or some other surrogate for it) all that needs to be specified in defining the line grid?

As noted with vertical alignment, there is a baseline table, not just a single baseline for each "row" of the line grid. That is, each "line" in the "line grid" is really a line alignment table. The positions, in the line alignment table, of the font based baselines (those for which there is an entry in the baseline table of the font current at the definition of the line grid) are straightforward. They are based on the font size, dominant baseline and the baseline table. The full alignment table is needed, for example, when positioning text in different scripts, say English and Hindi, with different alignment points within the table. The positions of the line based baselines, "top" (of the line) and "bottom" (of the line), become problematic because the line grid does not yet have lines. One solution to this is to say that these correspond to "text top" and "text bottom" which are defined based on font information.

Now consider some edge cases with line grid alignment. If the alignment described in 5. above would cause the current line box to overlap the previous line box (or go outside the (paragraph) block box), then more than one line's worth of the line grid is needed for the current line box. One example where this occurs is in the handling of sub-headings in the text that are scaled to a larger font size, possibly with greater leading as well. This means that the heading cannot be aligned with the next line, but will need more than one line's space. There are several options when more than one line's worth of the grid is needed. For simplicity, these cases will be discussed assuming only one baseline, the dominant baseline, occurs in both the line grid and the line; other cases are similar. Solution possibilities are:

1. The oversize line has its baseline aligned with the next line grid position that allows the height of the line above that baseline to fit below the bottom of the previously aligned line. Since the height of the oversize line below the baseline may not allow the next (regular size) line to be positioned at the following line grid position, it may be necessary to leave extra space following the oversize line to position the next line on the line grid.

2. It must be possible to specify that an element, such as the oversize sub-heading, is not aligned to the grid at all, in which case it is only necessary to allow as many line grid positions as needed to hold the oversize line and still allow the prior and following (regular size) lines to be aligned to the line grid. But, this still leaves the question of where in the allowed space the non-aligned element is to be placed. At the Kyoto face-to-face, Nat McCully indicated that for Japanese "gyodori" meant to "center" the oversized line in the allowed space. For Latin text, it would seem more correct to bottom align the oversize line in the allowed space, but top alignment might also make sense in some instances. This suggests having a property (or values of the alignment property) which would control aligning an oversize line in the allowed space or (as in 1. above) aligning it to the line grid. For centering, there is also the question of specifying how the "center" of the line is determined; that is, is it baseline based or just halfway between the top and bottom edges of the line box?
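
The two possibilities can be compared with a small Python sketch. It assumes a single (dominant) baseline, a grid whose baselines lie at grid_first + n * pitch with y growing downward, and, for the second option, that the reserved space starts at the bottom of the previous line box and is a whole number of grid rows. The function names and the "top" / "center" / "bottom" mode keywords are hypothetical, not proposed property values, and "center" here means halfway between the edges of the line box.

```python
import math

def snap_oversize_baseline(prev_bottom, above, grid_first, pitch):
    """Option 1: first grid baseline whose line top stays at or below prev_bottom."""
    n = max(0, math.ceil((prev_bottom + above - grid_first) / pitch))
    return grid_first + n * pitch

def place_unaligned(prev_bottom, height, pitch, mode="center"):
    """Option 2: reserve whole grid rows and position the line box inside them.

    Returns (top_of_line_box, bottom_of_reserved_space).
    """
    rows = max(1, math.ceil(height / pitch))
    slack = rows * pitch - height          # free space inside the reserved rows
    offset = {"top": 0.0, "center": slack / 2, "bottom": slack}[mode]
    return prev_bottom + offset, prev_bottom + rows * pitch
```

With a pitch of 16 and a heading whose line box is 40 high, option 2 reserves three rows (48) and leaves 8 units of slack to distribute according to the chosen mode.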

Up to this point, the spacing between lines (the 'line-height' when the line grid is established) has been specified. What has not been specified is the offset of the first line of the line grid in the container (page, column or region) in which it is specified. This is specified by a 'first-line-offset' property - which specifies a length that determines where the first line in the grid occurs relative to the start of the element on which the grid is defined. (Note that the question of eliminating the top half-leading on lines that are first in a container comes into this discussion. It is desirable that that half-leading not force the first line down to the second line alignment table just because of half-leading. Perhaps, half-leading (empty space) can be ignored in computing the height of the first line of a container.)
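
A small Python sketch of how 'first-line-offset' and the grid 'line-height' would generate grid positions, and of the half-leading question raised in the note. It assumes, for illustration only, that 'first-line-offset' measures from the start of the container to the dominant baseline of the first line alignment table; the function names and parameters are hypothetical.

```python
import math

def grid_baselines(first_line_offset, line_height, container_height):
    """Baseline positions of the line grid, measured from the container start."""
    positions = []
    y = first_line_offset
    while y <= container_height:
        positions.append(y)
        y += line_height
    return positions

def first_line_row(content_above, half_leading, first_line_offset,
                   line_height, ignore_half_leading):
    """Index of the grid row on which the first line of the container lands.

    content_above: extent of the glyphs above the baseline, without leading.
    """
    needed = content_above + (0 if ignore_half_leading else half_leading)
    return max(0, math.ceil((needed - first_line_offset) / line_height))
```

With first_line_offset = 12, a first line whose glyphs rise 11 above the baseline and whose half-leading is 2 needs 13 if the half-leading is counted, which pushes it to the second row. Ignoring the half-leading keeps it on the first row.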

A second desirable feature is to be able to say a given block, typically a heading, is formatted without alignment of its lines, but the block as a whole is aligned, per 2. immediately above, to the line grid. This avoids extra space that might be required if each line in that block, say with a line-height of 1.3 times the line grid line-height, is aligned to the line grid.
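
The saving can be illustrated with a little arithmetic in Python. The sketch makes the simplifying assumption that a line or block occupies a whole number of grid rows and ignores where the baseline falls within a row; the function names are hypothetical.

```python
import math

def rows_if_each_line_aligned(n_lines, line_height, pitch):
    """Grid rows used when every line of the block is snapped to the grid."""
    return n_lines * math.ceil(line_height / pitch)

def rows_if_block_aligned(n_lines, line_height, pitch):
    """Grid rows used when only the block as a whole is snapped to the grid."""
    return math.ceil(n_lines * line_height / pitch)

# A three-line heading with a line-height of 26 on a grid with a pitch of 20
# (a ratio of 1.3) needs 6 rows if each line is aligned, but only 4 rows if
# the block is aligned as a whole.
```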

Comments are welcome.

Steve Zilles

Received on Sunday, 5 February 2012 09:46:04 UTC