Re: Need to clarify the effects of bidi paragraph breaks from Mohamed Mohie on 2010-12-16 (www-style@w3.org from December 2010)

From: Mohamed Mohie <MOHIEM@eg.ibm.com>
Date: Thu, 16 Dec 2010 12:51:39 +0200
To: "Aharon (Vladimir) Lanin" <aharon@google.com>
Cc: fantasai <fantasai.lists@inkedblade.net>, "public-i18n-bidi@w3.org" <public-i18n-bidi@w3.org>, public-i18n-bidi-request@w3.org, W3C style mailing list <www-style@w3.org>
Message-ID: <OF1F09E4DB.DC17749F-ONC22577FB.003B192C-C22577FB.003BA91D@eg.ibm.com>
I support the proposal of not allowing <br> to break the inherited
directionality as Aharon highlighted in the example below:
"OR NOT TO BE?"
In my opinion the above should be displayed as RTL.


Thanks And Best regards,
Mohamed Mohie , PMP®
________________________________________________
GCoC BIDI ,
Advisory Software Engineer, Project Manager, M.Sc.
Cairo Technology Development Center (CTDC) - CMMI L5
IBM Egypt



From:       "Aharon (Vladimir) Lanin" <aharon@google.com>
To:         W3C style mailing list <www-style@w3.org>, fantasai
            <fantasai.lists@inkedblade.net>, "public-i18n-bidi@w3.org"
            <public-i18n-bidi@w3.org>
Date:       16/12/2010 12:13 AM
Subject:    Need to clarify the effects of bidi paragraph breaks
Sent by:    public-i18n-bidi-request@w3.org



Currently, the CSS Writing Modes Module Level 3 spec on text direction
states:

"User agents that support bidirectional text must apply the Unicode
bidirectional algorithm to every sequence of inline boxes uninterrupted by
a forced (bidi class B) line break or block boundary. This sequence forms
the "paragraph" unit in the bidirectional algorithm. The paragraph
embedding level is set according to the value of the ‘direction’ property
of the containing block rather than by the heuristic given in steps P2 and
P3 of the Unicode algorithm."

Further down in the same major section, the definition of
unicode-bidi:plaintext states:

"For the purposes of the Unicode bidirectional algorithm, the base
directionality of each "paragraph" for which the element is the containing
block element is determined not by the element's computed ‘direction’ as
usual, but by following rules P1, P2, and P3 of the Unicode bidirectional
algorithm."

I think that these parts of the spec need to be tweaked in several
respects:

1. There is no reason to mention rule P1 when describing how
unicode-bidi:plaintext affects the base directionality of each paragraph.
P1 deals with how the text is split up into paragraphs, not with the
direction of each paragraph, and applies to all content, regardless
of unicode-bidi:plaintext.

2. I think it would improve clarity to mention the unicode-bidi:plaintext
exception when first describing how the paragraph embedding level is set
(first quote above). Thus, the last sentence of the first quote should
read:

"The paragraph embedding level is set according to the value of the
‘direction’ property of the containing block, unless the containing block
element has unicode-bidi:plaintext, in which case it is set according to
the heuristic given in steps P2 and P3 of the Unicode algorithm."

3. We probably need to explicitly define the effect of a paragraph break (i.e.
a block boundary or bidi class B line break, which in HTML5 includes <br>)
when the path from the containing block element to the paragraph break
includes elements with a unicode-bidi value other than "normal". For
example, what happens when we have (as usual, uppercase English is used
instead of RTL characters) :

<div dir=ltr>
<span dir=rtl>
TO BE<br>
OR NOT TO BE?
</span>
-- hamlet, in rtl translation.
</div>

Should the "OR NOT TO BE?" be displayed in rtl ("?EB OT TON RO") or in ltr
("EB OT TON RO?")?

While it seems obvious that it should be displayed in RTL because it is
part of a <span dir=rtl>, that is not the result if we simply translate the
above into Unicode bidi formatting characters, i.e.

[RLE]TO BE
OR NOT TO BE?[PDF] -- hamlet, in rtl translation.

The overall direction of both paragraphs is ltr (P2 and P3 are overridden),
and since the paragraph break resets all embedding levels, the [PDF] is
orphaned, and the question mark winds up to the right of "EB OT TON RO".

I believe that the correct approach to take is to treat the second bidi
paragraph (i.e. "OR NOT TO BE? ... translation.") the same as:

<div dir=ltr>
<span dir=rtl>
OR NOT TO BE?
</span>
-- hamlet, in rtl translation.
</div>

In other words, while the paragraph's overall level should be set according
to the value of the ‘direction’ property of the containing block (ltr), it
should be opened by repeating the embeddings or overrides introduced by the
elements between the paragraph break and the containing block - in our
example, the equivalent of an RLE (which is then matched by the </span>'s
PDF equivalent).

This is similar to the CSS specs for anonymous block boxes, i.e.:

"When an inline box contains a block box, the inline box (and its inline
ancestors within the same line box) are broken around the block. The line
boxes before the break and after the break are enclosed in anonymous boxes,
and the block box becomes a sibling of those anonymous boxes. When such an
inline box is affected by relative positioning, the relative positioning
also affects the block box."

"The properties of anonymous boxes are inherited from the enclosing
non-anonymous box".

Does a line break result in anonymous boxes? If not, we certainly need
something in the Writing Modes spec. Actually, it would be good to have it
either way, just to clarify things.

4. When the path from the containing block element to the paragraph break
includes an element with unicode-bidi:isolate, there is no reason to go
back all the way to the containing block element to get the new paragraph's
base direction and the embeddings to be reconstituted at its start. Instead
of referring to the containing block element, the spec should be referring
to the closest unicode-bidi:isolate ancestor or containing block element,
whichever is closer.
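
The reconstruction rule in points 3 and 4 can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not spec text:
the Ancestor class, its fields, and the function name
reopen_for_new_paragraph are made up for the example, and it assumes that an
inline element with a dir attribute behaves as unicode-bidi:embed. The path is
listed from the element nearest the paragraph break outward, and the walk
stops at the first unicode-bidi:isolate ancestor or at the containing block,
whichever comes first.

```python
from dataclasses import dataclass

# Unicode bidi formatting characters (real code points).
LRE, RLE, PDF = "\u202A", "\u202B", "\u202C"
LRO, RLO = "\u202D", "\u202E"


@dataclass
class Ancestor:
    """Hypothetical model of one element above the paragraph break."""
    direction: str          # "ltr" or "rtl"
    unicode_bidi: str       # "normal", "embed", "bidi-override" or "isolate"
    is_block: bool = False  # True only for the containing block


def reopen_for_new_paragraph(path):
    """Return (base_direction, prefix) for the paragraph after a break.

    `path` runs from the innermost element enclosing the break outward and
    must end at the containing block. `prefix` is the string of formatting
    characters that would reopen the interrupted embeddings and overrides.
    """
    openers = []  # collected innermost first
    for el in path:
        if el.is_block or el.unicode_bidi == "isolate":
            # Nearest boundary: it supplies the base direction, and nothing
            # above it needs to be reconstituted (point 4).
            return el.direction, "".join(reversed(openers))
        if el.unicode_bidi == "embed":
            openers.append(RLE if el.direction == "rtl" else LRE)
        elif el.unicode_bidi == "bidi-override":
            openers.append(RLO if el.direction == "rtl" else LRO)
        # unicode-bidi: normal contributes nothing
    raise ValueError("path must end at a containing block or an isolate")
```

For the Hamlet example, the path is
[Ancestor("rtl", "embed"), Ancestor("ltr", "normal", is_block=True)], and the
function returns ("ltr", RLE): the second paragraph stays ltr overall but is
opened with an RLE, which the </span>'s PDF equivalent then closes.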

Aharon
Received on Thursday, 16 December 2010 10:52:53 UTC