Re: Need to clarify the effects of bidi paragraph breaks from Alan Gresley on 2010-12-16 (www-style@w3.org from December 2010)

From: Alan Gresley <alan@css-class.com>
Date: Thu, 16 Dec 2010 16:01:51 +1100
To: "Aharon (Vladimir) Lanin" <aharon@google.com>
CC: W3C style mailing list <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>, "public-i18n-bidi@w3.org" <public-i18n-bidi@w3.org>
Message-ID: <4D099D3F.3010201@css-class.com>
On 16/12/2010 9:11 AM, Aharon (Vladimir) Lanin wrote:

Adding my 2 cents' worth. I am slowly coming to understand the concept of 
bidirectionality. I have trouble since I can only read and write English.


> Currently, the CSS Writing Modes Module Level 3 spec on text
> direction<http://dev.w3.org/csswg/css3-writing-modes/#text-direction>
>   states:
>
> "User agents that support bidirectional text must apply the Unicode
> bidirectional algorithm to every sequence of inline boxes uninterrupted by a
> forced (bidi class B) line break or block boundary.


I think this is referring to a class B line break (whatever that is).

<br/> seems to fall under 3.4 (Reordering Resolved Levels) [1] and what 
are called paragraph separators.
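
For what "bidi class B" covers at the character level, here is a minimal 
sketch, assuming Python's standard unicodedata module reflects the Unicode 
character database closely enough for this purpose. The function name 
bidi_class_b_characters is made up for illustration.

```python
import sys
import unicodedata


def bidi_class_b_characters():
    """Collect every code point whose Bidi_Class is B (paragraph separator)."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.bidirectional(chr(cp)) == "B"
    ]


# Print each class B code point with its name; control characters have no
# name in the database, so fall back to a placeholder.
for cp in bidi_class_b_characters():
    print(f"U+{cp:04X}", unicodedata.name(chr(cp), "<control>"))
```

Note that <br> is markup, not a character, so it cannot appear in such a 
list. The quoted message further down says HTML5 counts it as a class B 
line break.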


>  This sequence forms the
> "paragraph" unit in the bidirectional algorithm. The paragraph embedding
> level is set according to the value of the ‘direction’ property of the
> containing block rather than by the heuristic given in steps P2 and P3 of
> the Unicode algorithm."
>
> Further down in the same major section, the definition of
> unicode-bidi:plaintext<http://dev.w3.org/csswg/css3-writing-modes/#unicode-bidi>
>   states:
>
> "For the purposes of the Unicode bidirectional algorithm, the base
> directionality of each "paragraph" for which the element is the containing
> block element is determined not by the element's computed ‘direction’ as
> usual, but by following rules P1, P2, and P3 of the Unicode bidirectional
> algorithm."


Above I see "which the element." I have no idea what element is being 
referred to here. This paragraph also seems to suggest an added meaning 
for "containing block". What is a containing block element?
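
The P2 and P3 heuristic mentioned in the quote can be sketched roughly as 
follows. This is a simplified illustration, not the full rule text: it 
assumes Python's unicodedata module for the bidi classes, gives no special 
treatment to isolate runs, and the name first_strong_level is hypothetical.

```python
import unicodedata


def first_strong_level(paragraph):
    """Return the paragraph embedding level: 0 for LTR, 1 for RTL.

    P2: find the first character of bidi class L, R or AL.
    P3: R or AL gives level 1; otherwise the level is 0.
    Simplification: no special handling of isolate runs.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    return 0  # no strong character found, so default to LTR


print(first_strong_level("to be"))                 # Latin letters
print(first_strong_level("\u05dc\u05d4\u05d9\u05d5\u05ea?"))  # Hebrew letters
print(first_strong_level("123 ?!"))                # digits and punctuation only
```

With unicode-bidi:plaintext this per-paragraph result would be used as the 
base direction; otherwise the 'direction' property takes its place.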


> I think that these parts of the spec need to be tweaked in several
> respects:
>
> 1. There is no reason to mention rule P1 when describing how
> unicode-bidi:plaintext affects the base directionality of each paragraph. P1
> deals with how the text is split up into paragraphs, not with the direction
> of each paragraph, and applies to all content, regardless
> of unicode-bidi:plaintext.
>
> 2. I think it would improve clarity to mention the unicode-bidi:plaintext
> exception when first describing how the paragraph embedding level is set
> (first quote above). Thus, the last sentence of the first quote should read:
>
> "The paragraph embedding level is set according to the value of the
> ‘direction’ property of the containing block, unless the containing block
> element has unicode-bidi:plaintext, in which case it is set according to the
> heuristic given in steps P2 and P3 of the Unicode algorithm."
>
> 3. We must probably explicitly define the effect of a paragraph break (i.e.
> a block boundary or bidi class B line break, which in HTML5 includes <br>)
> when the path from the containing block element to the paragraph break
> includes elements with a unicode-bidi value other than "normal". For
> example, what happens when we have (as usual, uppercase English is used
> instead of RTL characters) :
>
> <div dir=ltr>
> <span dir=rtl>
> TO BE<br>
> OR NOT TO BE?
> </span>
> -- hamlet, in rtl translation.
> </div>
>
> Should the "OR NOT TO BE?" be displayed in rtl ("?EB OT TON RO") or in ltr
> ("EB OT TON RO?")?


I believe this depends on the value of unicode-bidi. I am somewhat 
confused myself, since the default behavior in an offline test,

<!DOCTYPE html>
<div dir=ltr>
<span dir=rtl>
TO BE<br>
  OR NOT TO BE?
</span>
<div>-- hamlet, in rtl translation.</div>
</div>

  in FF 3.6.13 renders as if unicode-bidi were embed, even though the 
initial value for unicode-bidi is normal.


    unicode-bidi: embed, isolate and plaintext produce this.

   ?OR NOT TO BE


    unicode-bidi: normal produces this.

   OR NOT TO BE?


    unicode-bidi: bidi-override    produces this.

   ?EB OT TON RO


I have not tested in other browsers, since I do not know whether FF even 
does it correctly.


> While it seems obvious that it should be displayed in RTL because it is part
> of a <span dir=rtl>, that is not the result if we simply translate the above
> into Unicode bidi formatting characters, i.e.
>
> [RLE]TO BE
> OR NOT TO BE?[PDF] -- hamlet, in rtl translation.


The direction does not affect the embedding algorithm of a particular 
script. The direction changes where the start and end are for a sequence 
of inline boxes. The placement of punctuation marks (.,;?!`) and of list 
markers (with a value of outside) changes with the direction.
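
The translation into formatting characters quoted above can be checked with 
a short sketch. It assumes Python's unicodedata module, splits the text at 
class B characters, and counts unmatched RLE and PDF characters in each 
resulting paragraph. The helper names are hypothetical, and only RLE and PDF 
are tracked, not the other embedding or override characters.

```python
import unicodedata

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def split_bidi_paragraphs(text):
    """Split text at bidi class B characters (the separators are dropped)."""
    paragraphs, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    paragraphs.append("".join(current))
    return paragraphs


def embedding_report(paragraph):
    """Return (unclosed RLE count, orphaned PDF count) for one paragraph."""
    depth = 0
    orphaned = 0
    for ch in paragraph:
        if ch == RLE:
            depth += 1
        elif ch == PDF:
            if depth:
                depth -= 1
            else:
                orphaned += 1  # a PDF with no opener in this paragraph
    return depth, orphaned


text = RLE + "TO BE\nOR NOT TO BE?" + PDF + " -- hamlet, in rtl translation."
for paragraph in split_bidi_paragraphs(text):
    print(embedding_report(paragraph))
```

The first paragraph reports one unclosed RLE and the second reports one 
orphaned PDF, which matches the situation described in the quoted text that 
follows.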


> The overall direction of both paragraphs is ltr (P2 and P3 are overridden),
> and since the paragraph break resets all embedding levels, the [PDF] is
> orphaned, and the question mark winds up to the right of "EB OT TON RO".
>
> I believe that the correct approach to take is to treat the second bidi
> paragraph (i.e. "TO BE ... translation.") the same as:
>
> <div dir=ltr>
> <span dir=rtl>
> OR NOT TO BE?
> </span>
> -- hamlet, in rtl translation.
> </div>
>
> In other words, while the paragraph's overall level should be set according
> to the value of the ‘direction’ property of the containing block (ltr), it
> should be opened by repeating the embeddings or overrides introduced by the
> elements between the paragraph break and the containing block - in our
> example, the equivalent of an RLE (which is then matched by the </span>'s
> PDF equivalent).
>
> This is similar to the CSS specs for anonymous block
> boxes<http://www.w3.org/TR/2009/CR-CSS2-20090908/visuren.html#anonymous-block-level>,
> i.e:
>
> "When an inline box contains a block box, the inline box (and its inline
> ancestors within the same line box) are broken around the block. The line
> boxes before the break and after the break are enclosed in anonymous boxes,
> and the block box becomes a sibling of those anonymous boxes. When such an
> inline box is affected by relative positioning, the relative positioning
> also affects the block box."
>
> "The properties of anonymous boxes are inherited from the enclosing
> non-anonymous box".
>
> Does a line break result in anonymous boxes? If not, we certainly need
> something in the Writing Modes spec. Actually, it would be good to have it
> either way, just to clarify things.
>
> 4. When the path from the containing block element to the paragraph break
> includes an element with unicode-bidi:isolate, there is no reason to go back
> all the way to the containing block element to get the new paragraph's base
> direction and the embeddings to be reconstituted at its start. Instead of
> referring to the containing block element, the spec should be referring to
> the closest unicode-bidi:isolate ancestor or containing block element,
> whichever is closer.
>
> Aharon
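
Points 3 and 4 of the proposal can be sketched as a small function. This is 
only an illustration of the proposed behavior under stated assumptions, not 
anything the spec defines: the name reopen_after_break, the tuple 
representation of ancestors, and the mapping table are all hypothetical, and 
unicode-bidi:plaintext is not handled.

```python
LRE, RLE = "\u202a", "\u202b"  # embeddings
LRO, RLO = "\u202d", "\u202e"  # overrides

# Formatting character assumed equivalent to each (unicode-bidi, direction).
OPENERS = {
    ("embed", "ltr"): LRE,
    ("embed", "rtl"): RLE,
    ("bidi-override", "ltr"): LRO,
    ("bidi-override", "rtl"): RLO,
}


def reopen_after_break(containing_block_direction, ancestors):
    """Return (base direction, opener string) for the paragraph after a break.

    ancestors lists (unicode_bidi, direction) for each inline element between
    the containing block and the break, outermost first.
    """
    base = containing_block_direction
    prefix = []
    for unicode_bidi, direction in ancestors:
        if unicode_bidi == "isolate":
            # Point 4: restart from the closest isolate ancestor.
            base = direction
            prefix = []
        else:
            opener = OPENERS.get((unicode_bidi, direction))
            if opener:  # unicode-bidi: normal contributes nothing
                prefix.append(opener)
    return base, "".join(prefix)


# The Hamlet example: an ltr block containing <span dir=rtl>, treated as embed.
print(reopen_after_break("ltr", [("embed", "rtl")]) == ("ltr", RLE))
```

For the Hamlet example this keeps the paragraph's base direction ltr and 
reopens a single RLE at the start of the second paragraph.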


I believe the spec needs quite a few illustrations. If an author is 
given a job where there are runs of LTR and RTL text and they only 
understand one language, the spec as it stands is not going to help.

I also believe that the spec should give examples in foreign scripts 
using words that can easily be recognized. My use of  ᠨᠶᠪᠧᠺᠴᡗ  here 
[2] does not help me. Only with research did I figure out that it ran LTR.


1. <http://www.unicode.org/reports/tr9/#Reordering_Resolved_Levels>
2. <http://css-class.com/test/css/bidi/mongolian-test1-extra.htm>


-- 
Alan http://css-class.com/

Armies Cannot Stop An Idea Whose Time Has Come. - Victor Hugo
Received on Thursday, 16 December 2010 05:02:29 UTC