Response to comment on whitespace handling


Thanks for your continued interest in XSL, and specifically in
getting to the bottom of whitespace handling issues.

Needless to say, this topic has taken a lot of resources and 
time to investigate and (hopefully) resolve.  Anyone who
has ever worked with SGML, XML, and/or stylesheets knows how
tricky "white space issues" can be.

Your comment at
prompted a discussion within the FO Subgroup, highlighting an 
imperfect overlap between the Line-building section and the 
white-space-treatment and suppress-at-line-break properties. 

We decided to move the processing of the white-space-treatment 
property into section 4.7.2 (Line-building) and to modify the 
properties slightly for greater flexibility going forward, and 
to rescind the erratum that created a new property relating to 
suppress-at-line-break, which is now unnecessary.

We did not find it necessary to move the processing of the 
white-space-collapse or linefeed-treatment properties.

The newly reworded 4.7.2 and the rewritten property definitions 7.15.8 
and 7.16.3 [below] now clarify the relations of these properties to 
Line-building. Some whitespace handling happens in refinement and some 
happens in area generation, and this change moves some processing that 
was happening in refinement (processing of the white-space-treatment and 
suppress-at-line-break properties) into area generation (4.7.2 line-building).

We believe this is the best way to address the complexities of
whitespace handling, and that it gives implementations and users
all the control they need.


4.7.2 Line-building

This section describes the ordering constraints that apply to formatting an 
fo:block or similar block-level object.

A block-level formatting object F which constructs lines does so by 
constructing block-areas which it returns to its parent formatting object, 
and placing normal areas and/or anchor areas returned to F by its child 
formatting objects as children of those block-areas or of line-areas which 
it constructs as children of those block-areas.

For each such formatting object F, it must be possible to form an ordered 
partition P consisting of ordered subsets S1, S2, ..., Sn of the normal 
areas and anchor areas returned by the child formatting objects, such that 
the following are all satisfied:

1. Each subset consists of a sequence of inline-areas, or of a single block-area.

2. The ordering of the partition follows the ordering of the formatting object 
tree. Specifically, if A is in Si and B is in Sj with i < j, or if A and B 
are both in the same subset Si with A before B in the subset order, then 
either A is returned by a preceding sibling formatting object of B, or A and 
B are returned by the same formatting object with A being returned before B.

3. The partitioning occurs at legal line-breaks. Specifically, if A is the last 
area of Si and B is the first area of Si+1, then the rules of the language 
and script in effect must permit a line-break between A and B, within the 
context of all areas in Si and Si+1.

4. Forced line-breaks are respected. Specifically, if C is a descendant of F, 
and C is a fo:character whose Unicode character is U+000A, and A is the area 
generated by C, then either C is a child of F and A is the last area in a 
subset Si, or C is a descendant of a child C' of F, and A ends (in the sense 
of 4.2.5) an area A' returned by C', such that A' is the last area in a
subset Si.

5. The partition follows the ordering of the area tree, except for certain 
glyph substitutions and deletions. Specifically, if B1, B2, ..., Bp are the 
normal child areas of the area or areas returned by F (ordered in the 
pre-order traversal order of the area tree), then there is a one-to-one 
correspondence between these child areas and the partition subsets (i.e., 
n = p), and for each i, 

  * Si consists of a single block-area and Bi is that block-area, or

  * Si consists of inline-areas and Bi is a line-area whose child areas are the 
    same as the inline-areas in Si, and in the same order, except that where the 
    rules of the language and script in effect call for glyph-areas to be 
    substituted, inserted, or deleted, then the substituted or inserted 
    glyph-areas appear in the area tree in the corresponding place, and the 
    deleted glyph-areas do not appear in the area tree. For example, insertions 
    and substitutions may occur because of addition of hyphens or spelling 
    changes due to hyphenation, or glyph image construction from 
    syllabification, or ligature formation. Deletions occur as specified in 
    (6.), below. 

6. white-space-treatment is enforced. In particular, deletions in (5.) occur 
when there is a glyph-area G such that 

   (a.) the white-space-treatment of G is "ignore" and the character of G is 
         classified as white space in XML; or

   (b.) the white-space-treatment of G is "ignore-if-before-linefeed" or 
        "ignore-if-surrounding-linefeed", the suppress-at-line-break of G is 
        "suppress", and G would end a line-area; or

   (c.) the white-space-treatment of G is "ignore-if-after-linefeed" or 
        "ignore-if-surrounding-linefeed", the suppress-at-line-break of G is 
        "suppress", and G would begin a line-area.

  In these cases the area G is deleted; this may cause the condition in 
  clauses (b.) or (c.) to become true and lead to further deletions.
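As an illustration only, some of the partition constraints above can be checked mechanically for the simple case where every area is returned directly by a child of F. The sketch below is a hypothetical Python model, not part of the specification: `Area`, `check_partition` and the kind strings "inline" and "block" are invented names. It checks constraints 1, 2 and 4; constraint 3 depends on the rules of the language and script in effect and is not modelled, and the nested case of constraint 4 (the C' case) is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Area:
    """Hypothetical, simplified stand-in for an area returned to F."""
    kind: str                   # "inline" or "block"
    char: Optional[str] = None  # Unicode character, for glyph-areas only


def check_partition(returned: List[Area], partition: List[List[Area]]) -> bool:
    """Check constraints 1, 2 and 4 for areas returned by F's children.

    Constraint 3 (legal line-breaks) needs language and script rules and is
    not modelled here.
    """
    # Constraint 2 (simplified): reading the subsets in order must give back
    # the areas in the order in which they were returned.
    if [a for subset in partition for a in subset] != returned:
        return False
    for subset in partition:
        kinds = {a.kind for a in subset}
        # Constraint 1: a single block-area, or a non-empty run of inline-areas.
        if kinds == {"block"}:
            if len(subset) != 1:
                return False
        elif kinds != {"inline"}:
            return False
        # Constraint 4: a forced line-break (U+000A) must end its subset.
        if any(a.char == "\n" for a in subset[:-1]):
            return False
    return True
```

For example, with inline glyph-areas for "a", U+000A and "b", the partition [["a", U+000A], ["b"]] passes, while keeping all three in one subset fails constraint 4.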

Substitutions that replace a sequence of glyph-areas with a single 
glyph-area should only occur when the margin, border, and padding in the 
inline-progression-direction (start- and end-), baseline-shift, and 
letter-spacing values are zero, treat-as-word-space is false, and the values 
of all other relevant traits match (i.e., alignment-adjust, 
alignment-baseline, color trait, background traits, 
dominant-baseline-identifier, font traits, text-depth, text-altitude, 
glyph-orientation-horizontal, glyph-orientation-vertical, line-height, 
line-height-shift-adjustment, text-decoration, text-shadow).
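A minimal sketch of this precondition, assuming a hypothetical flat record of traits per glyph-area (the `GlyphTraits` fields, the `other` dictionary and the `substitution_permitted` name are illustrative, not defined by XSL). The paragraph above states a necessary condition only: a False result means the substitution should not occur, while True means these traits do not rule it out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Traits that must all be zero for a many-to-one glyph substitution.
ZERO_TRAITS = (
    "margin_start", "margin_end", "border_start", "border_end",
    "padding_start", "padding_end", "baseline_shift", "letter_spacing",
)


@dataclass
class GlyphTraits:
    """Hypothetical flattened trait record for one glyph-area."""
    margin_start: float = 0.0
    margin_end: float = 0.0
    border_start: float = 0.0
    border_end: float = 0.0
    padding_start: float = 0.0
    padding_end: float = 0.0
    baseline_shift: float = 0.0
    letter_spacing: float = 0.0
    treat_as_word_space: bool = False
    # Every other relevant trait (font traits, color, line-height, ...) by name.
    other: Dict[str, object] = field(default_factory=dict)


def substitution_permitted(glyphs: List[GlyphTraits]) -> bool:
    """Return False if the traits rule out replacing `glyphs` with one glyph-area."""
    if len(glyphs) < 2:
        raise ValueError("expected a sequence of two or more glyph-areas")
    for g in glyphs:
        if any(getattr(g, name) != 0 for name in ZERO_TRAITS):
            return False
        if g.treat_as_word_space:
            return False
    # All remaining relevant traits must match across the whole sequence.
    return all(g.other == glyphs[0].other for g in glyphs[1:])
```

So an "f" and an "i" with identical fonts and colors may still form a ligature, but a non-zero letter-spacing or a color change on either glyph-area rules it out.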


Line-areas do not receive the background traits or text-decoration of their 
generating formatting object, or any other trait that requires generation of 
a mark during rendering.


7.15.8 white-space-treatment

The values have the following meanings:

ignore

   Any glyph-area whose Unicode character is classified as white space in
XML, except for U+000A, shall be deleted during line-building and
inline-building (see 4.1.6 and 4.2.6).

preserve

   Any glyph-area whose Unicode character is classified as white space in
XML shall not be deleted during line-building and inline-building.

ignore-if-before-linefeed

   Any glyph-area with a suppress-at-line-break value of 'suppress' shall
be deleted during line-building and inline-building if it would be the last
glyph-area descendant of a line-area.

ignore-if-after-linefeed

   Any glyph-area with a suppress-at-line-break value of 'suppress' shall
be deleted during line-building and inline-building if it would be the
first glyph-area descendant of a line-area.

ignore-if-surrounding-linefeed

   Any glyph-area with a suppress-at-line-break value of 'suppress' shall
be deleted during line-building and inline-building if it would be the
first or last glyph-area descendant of a line-area.
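To make the value definitions concrete, here is a hypothetical Python sketch that applies white-space-treatment to the glyph-areas of one candidate line-area. It assumes suppress-at-line-break has already been resolved to "suppress" or "retain" and that the line's contents are given as a flat list; `Glyph`, `TRIM_ENDS` and `apply_white_space_treatment` are invented names. Trimming repeats until no further deletion applies, because each deletion can expose a new first or last glyph-area, as noted at the end of rule 6 in 4.7.2.

```python
from dataclasses import dataclass
from typing import List

# XML white space other than U+000A: space, tab, carriage return.
IGNORABLE = {"\u0020", "\u0009", "\u000D"}

# For each value: may it delete at the (start, end) of a line-area?
TRIM_ENDS = {
    "ignore-if-before-linefeed": (False, True),
    "ignore-if-after-linefeed": (True, False),
    "ignore-if-surrounding-linefeed": (True, True),
}


@dataclass
class Glyph:
    char: str
    treatment: str  # white-space-treatment value
    suppress: str   # resolved suppress-at-line-break: "suppress" or "retain"


def _trims(g: Glyph, end: int) -> bool:
    # end is 0 for the start of the line-area, 1 for its end.
    return g.suppress == "suppress" and TRIM_ENDS.get(g.treatment, (False, False))[end]


def apply_white_space_treatment(line: List[Glyph]) -> List[Glyph]:
    """Return the glyph-areas of one line-area that survive deletion."""
    # "ignore": drop XML white space (except U+000A) wherever it occurs.
    kept = [g for g in line if not (g.treatment == "ignore" and g.char in IGNORABLE)]
    # Trim the ends until nothing changes; each deletion can expose a new
    # first or last glyph-area that is itself eligible for deletion.
    while kept:
        if _trims(kept[0], 0):
            kept.pop(0)
        elif _trims(kept[-1], 1):
            kept.pop()
        else:
            break
    return kept
```

With ignore-if-surrounding-linefeed, a line "  a b " (spaces suppressible, letters retained) keeps only "a b"; with preserve, nothing is deleted.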


7.16.3 suppress-at-line-break

The property has the following values:

auto

   The value is determined by the Unicode value of the object's character
property. The character at code point U+0020 is treated as if 'suppress'
had been specified. All other characters are treated as if 'retain' had
been specified.

suppress

   The glyph-area generated by the fo:character is eligible to be suppressed
at the start or end of a line-area, depending on the white-space-treatment
property (q.v.).

retain

   The glyph-area generated by the fo:character shall be placed in the area
tree whether or not it is first or last in a line-area.
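The 'auto' rule reduces to a single comparison against U+0020. A hypothetical helper (the function name is illustrative; it assumes 'inherit' has already been replaced by the inherited value):

```python
def resolve_suppress_at_line_break(specified: str, char: str) -> str:
    """Resolve suppress-at-line-break for an fo:character whose character is `char`.

    Assumes "inherit" has already been replaced by the inherited value.
    """
    if specified == "auto":
        # Only U+0020 SPACE behaves as 'suppress'; everything else as 'retain'.
        return "suppress" if char == "\u0020" else "retain"
    if specified in ("suppress", "retain"):
        return specified
    raise ValueError(f"unexpected suppress-at-line-break value: {specified!r}")
```

Under these definitions a tab (U+0009) resolves to 'retain' under 'auto' even though it is XML white space, so the ignore-if-* values of white-space-treatment never trim it; only 'ignore' removes it.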


Paul Grosso for the XSL FO Subgroup of the XSL WG

Received on Thursday, 7 August 2003 11:47:35 UTC