Need to clarify the effects of bidi paragraph breaks from Aharon (Vladimir) Lanin on 2010-12-15 (www-style@w3.org from December 2010)

From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Thu, 16 Dec 2010 00:11:36 +0200
To: W3C style mailing list <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>, "public-i18n-bidi@w3.org" <public-i18n-bidi@w3.org>
Message-ID: <AANLkTime9j6ihPJR_XLOjJ_muDxV5H25a3ZXzSMbiJPJ@mail.gmail.com>
Currently, the CSS Writing Modes Module Level 3 spec on text direction
<http://dev.w3.org/csswg/css3-writing-modes/#text-direction> states:

"User agents that support bidirectional text must apply the Unicode
bidirectional algorithm to every sequence of inline boxes uninterrupted by a
forced (bidi class B) line break or block boundary. This sequence forms the
"paragraph" unit in the bidirectional algorithm. The paragraph embedding
level is set according to the value of the ‘direction’ property of the
containing block rather than by the heuristic given in steps P2 and P3 of
the Unicode algorithm."

Further down in the same major section, the definition of
unicode-bidi:plaintext
<http://dev.w3.org/csswg/css3-writing-modes/#unicode-bidi> states:

"For the purposes of the Unicode bidirectional algorithm, the base
directionality of each "paragraph" for which the element is the containing
block element is determined not by the element's computed ‘direction’ as
usual, but by following rules P1, P2, and P3 of the Unicode bidirectional
algorithm."

I think that these parts of the spec need to be tweaked in several
respects:

1. There is no reason to mention rule P1 when describing how
unicode-bidi:plaintext affects the base directionality of each paragraph. P1
deals with how the text is split up into paragraphs, not with the direction
of each paragraph, and applies to all content, regardless
of unicode-bidi:plaintext.

2. I think it would improve clarity to mention the unicode-bidi:plaintext
exception when first describing how the paragraph embedding level is set
(first quote above). Thus, the last sentence of the first quote should read:

"The paragraph embedding level is set according to the value of the
‘direction’ property of the containing block, unless the containing block
element has unicode-bidi:plaintext, in which case it is set according to the
heuristic given in steps P2 and P3 of the Unicode algorithm."
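
To make the proposed wording concrete, here is a minimal Python sketch of
the selection it describes. The function names and the string values used
for the properties are illustrative assumptions, not anything from the spec,
and the first-strong scan is the simple form of P2/P3 (no special handling
of isolates or other formatting characters).

```python
import unicodedata


def first_strong_level(text):
    """Rules P2/P3 (simplified): 1 if the first strong character is R or AL, else 0."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    # No strong character found: P3 falls back to level 0 (ltr).
    return 0


def paragraph_level(text, direction, unicode_bidi="normal"):
    """Paragraph embedding level for one bidi paragraph.

    `direction` and `unicode_bidi` stand for the computed values on the
    containing block (hypothetical parameter names).
    """
    if unicode_bidi == "plaintext":
        return first_strong_level(text)
    return 1 if direction == "rtl" else 0
```

With real Hebrew text, paragraph_level("\u05d0\u05d1 abc", "ltr") follows
the ‘direction’ property, while the same call with
unicode_bidi="plaintext" follows the first strong character instead.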

3. We should probably explicitly define the effect of a paragraph break (i.e.
a block boundary or bidi class B line break, which in HTML5 includes <br>)
when the path from the containing block element to the paragraph break
includes elements with a unicode-bidi value other than "normal". For
example, what happens when we have (as usual, uppercase English is used
instead of RTL characters):

<div dir=ltr>
<span dir=rtl>
TO BE<br>
OR NOT TO BE?
</span>
-- hamlet, in rtl translation.
</div>

Should the "OR NOT TO BE?" be displayed in rtl ("?EB OT TON RO") or in ltr
("EB OT TON RO?")?

While it seems obvious that it should be displayed in RTL because it is part
of a <span dir=rtl>, that is not the result if we simply translate the above
into Unicode bidi formatting characters, i.e.

[RLE]TO BE
OR NOT TO BE?[PDF] -- hamlet, in rtl translation.

The overall direction of both paragraphs is ltr (P2 and P3 are overridden),
and since the paragraph break resets all embedding levels, the [PDF] is
orphaned, and the question mark winds up to the right of "EB OT TON RO".
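
The reset can be seen in a small Python sketch that tracks only the explicit
embedding levels. It is a deliberate simplification and rests on several
assumptions: it handles only LRE, RLE and PDF, treats "\n" as the only
paragraph separator, ignores the maximum embedding depth, and does not
resolve the implicit levels that finally place the question mark. The
function name is made up for illustration.

```python
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"


def explicit_levels(text, base_level=0):
    """Return, per paragraph, a list of (character, embedding level) pairs.

    The embedding stack starts empty in every paragraph, so an embedding
    opened before a paragraph break does not carry over.
    """
    paragraphs = []
    for para in text.split("\n"):
        stack = []          # levels to restore on PDF
        level = base_level
        out = []
        for ch in para:
            if ch == RLE:
                stack.append(level)
                level = (level + 1) | 1    # next greater odd level
            elif ch == LRE:
                stack.append(level)
                level = (level + 2) & ~1   # next greater even level
            elif ch == PDF:
                if stack:
                    level = stack.pop()
                # A PDF with nothing to close (orphaned) is ignored.
            else:
                out.append((ch, level))
        paragraphs.append(out)
    return paragraphs


hamlet = RLE + "TO BE\nOR NOT TO BE?" + PDF + " -- hamlet, in rtl translation."
first, second = explicit_levels(hamlet)
```

Here every character of `first` sits at level 1, while every character of
`second`, including the question mark, sits at the paragraph level 0.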

I believe that the correct approach to take is to treat the second bidi
paragraph (i.e. "OR NOT TO BE? ... translation.") the same as:

<div dir=ltr>
<span dir=rtl>
OR NOT TO BE?
</span>
-- hamlet, in rtl translation.
</div>

In other words, while the paragraph's overall level should be set according
to the value of the ‘direction’ property of the containing block (ltr), it
should be opened by repeating the embeddings or overrides introduced by the
elements between the paragraph break and the containing block - in our
example, the equivalent of an RLE (which is then matched by the </span>'s
PDF equivalent).
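
As a rough illustration of this proposal, the Python sketch below flattens a
stream of inline tokens into bidi paragraphs and re-opens, at the start of
each new paragraph, whatever embeddings were still open at the break. The
token format and the function name are assumptions made up for this sketch,
and only embeddings (not overrides) are modelled.

```python
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
OPENERS = {"ltr": LRE, "rtl": RLE}


def paragraphs_with_reopened_embeddings(tokens):
    """Split tokens at ("br",) and repeat the open embeddings after each break.

    Tokens: ("open", "ltr" | "rtl"), ("close",), ("text", str), ("br",).
    """
    open_dirs = []    # directions of inline elements currently open
    current = []
    paragraphs = []
    for token in tokens:
        kind = token[0]
        if kind == "open":
            open_dirs.append(token[1])
            current.append(OPENERS[token[1]])
        elif kind == "close":
            open_dirs.pop()
            current.append(PDF)
        elif kind == "text":
            current.append(token[1])
        elif kind == "br":
            # Close what is still open so that no PDF is left orphaned.
            current.append(PDF * len(open_dirs))
            paragraphs.append("".join(current))
            # Re-open the same embeddings, outermost first.
            current = [OPENERS[d] for d in open_dirs]
    paragraphs.append("".join(current))
    return paragraphs


hamlet_tokens = [
    ("open", "rtl"),
    ("text", "TO BE"),
    ("br",),
    ("text", "OR NOT TO BE?"),
    ("close",),
    ("text", " -- hamlet, in rtl translation."),
]
```

For `hamlet_tokens` the second paragraph comes out as
[RLE]OR NOT TO BE?[PDF] -- hamlet, in rtl translation., so the PDF from the
</span> has a matching RLE again.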

This is similar to the CSS specs for anonymous block boxes
<http://www.w3.org/TR/2009/CR-CSS2-20090908/visuren.html#anonymous-block-level>,
i.e.:

"When an inline box contains a block box, the inline box (and its inline
ancestors within the same line box) are broken around the block. The line
boxes before the break and after the break are enclosed in anonymous boxes,
and the block box becomes a sibling of those anonymous boxes. When such an
inline box is affected by relative positioning, the relative positioning
also affects the block box."

"The properties of anonymous boxes are inherited from the enclosing
non-anonymous box".

Does a line break result in anonymous boxes? If not, we certainly need
something in the Writing Modes spec. Actually, it would be good to have it
either way, just to clarify things.

4. When the path from the containing block element to the paragraph break
includes an element with unicode-bidi:isolate, there is no reason to go back
all the way to the containing block element to get the new paragraph's base
direction and the embeddings to be reconstituted at its start. Instead of
referring to the containing block element, the spec should be referring to
the closest unicode-bidi:isolate ancestor or containing block element,
whichever is closer.
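
A minimal Python sketch of that lookup, under assumed names: `Box` and
`paragraph_context` are hypothetical, and the ancestor chain is given
innermost first, starting at the element that contains the paragraph break
and ending at the containing block.

```python
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    direction: str = "ltr"
    unicode_bidi: str = "normal"
    is_block: bool = False


def paragraph_context(ancestors):
    """Return (root, reopen) for the paragraph that starts after a break.

    root   -- the closest isolate ancestor or the containing block,
              whichever comes first; its direction gives the base direction
    reopen -- (unicode_bidi, direction) pairs to re-open, outermost first
    """
    reopen = []
    for box in ancestors:
        if box.is_block or box.unicode_bidi == "isolate":
            reopen.reverse()
            return box, reopen
        if box.unicode_bidi in ("embed", "bidi-override"):
            reopen.append((box.unicode_bidi, box.direction))
    raise ValueError("ancestor chain has no containing block")


chain = [
    Box("span", direction="rtl", unicode_bidi="embed"),
    Box("b", direction="ltr", unicode_bidi="isolate"),
    Box("i", direction="rtl", unicode_bidi="embed"),
    Box("div", direction="ltr", is_block=True),
]
root, reopen = paragraph_context(chain)
```

In this chain the search stops at the isolate `b`, so only the embedding
from `span` is re-opened and the outer `i` and `div` are never consulted.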

Aharon
Received on Wednesday, 15 December 2010 22:12:27 UTC