
Re: Alignment of paragraphs with unicode-bidi: plaintext

From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Mon, 20 Feb 2012 14:58:29 +0200
Message-ID: <CA+FsOYb4rshgsvWsFfBZRJkiRNPWVctj15PoyU=fSxnXi8=giA@mail.gmail.com>
To: fantasai <fantasai.lists@inkedblade.net>, public-i18n-bidi@w3.org, Simon Montagu <smontagu@smontagu.org>, Levi Weintraub <leviw@google.com>

The following is too long, but please do read it. In brief, I think that both
the current unicode-bidi:plaintext and the proposed alignment specs have
serious problems. On the upside, I also suggest solutions :-). The
problems in unicode-bidi:plaintext are significant even when we ignore
alignment, and the modification I suggest seems to be closer to what is
implemented in both Firefox and WebKit.

In a nutshell, the current definition of unicode-bidi:plaintext in
http://dev.w3.org/csswg/css3-writing-modes/#unicode-bidi is unclear about
how unicode-bidi:plaintext is supposed to behave when it is in effect on an
inline element. Please note that originally, e.g. in
http://www.w3.org/TR/2011/WD-css3-writing-modes-20110428/, the definition
said that "this value has no effect on inline elements". And later, in
http://www.w3.org/TR/2011/WD-css3-writing-modes-20110531/, it said that
"for inline elements, this value is equivalent to ‘isolate’". This put them
in a separate bidi paragraph, but did *not* determine this paragraph's base
directionality. The current definition, however, says the following:

For the purposes of the Unicode bidirectional algorithm, the base
directionality of each bidi paragraph for which the element forms the
containing block is determined not by the element's computed ‘direction’ as
usual, but by following the heuristic in rules P2 and P3 of the Unicode
bidirectional algorithm. For inline elements, this value behaves as for
‘isolate’, except, as with block containers, the base directionality is
determined by following the Unicode heuristic instead of by using the
‘direction’ value.


This gives no clue to the following question: for which paragraphs,
exactly, does a unicode-bidi:plaintext element that is not a containing
block for anything determine the base directionality?

Given that the definition says that such elements are to behave as
for unicode-bidi:isolate, we do at least know that paragraphs never
straddle the boundary of a unicode-bidi:plaintext element. So, should
the unicode-bidi:plaintext on an element set the base directionality of
*all* paragraphs contained by the element? Should it apply to:

- Both paragraphs in <span style="unicode-bidi:plaintext">Line 1.<br/>שורה
2.</span>?
- All three paragraphs in <span style="unicode-bidi: plaintext">Line
1.<span style="display:block">שורה 2.</span>שורה 3.</span>?
- Both paragraphs in <span style="unicode-bidi: plaintext">He said '<span
style="unicode-bidi:isolate">שלום!</span>'.</span>?

If it applies to all these cases, we have a strange contradiction. For
elements that *are* a containing block, unicode-bidi:plaintext only sets
the base directionality of paragraphs for which it is the containing block
- not all paragraphs it contains. But elements that are not a containing
block are paradoxically supposed to be "stronger", determining the
directionality of all the paragraphs they contain. IMO, this makes no sense.

For example, consider:

<div style="unicode-bidi:plaintext">Line 1.<div>שורה 2!</div>Line 3.</div>

The inside div defines its own paragraph, "שורה 2!". Obviously, this
paragraph is contained by the outside div. However, the outside div is not
its containing block (that's the inside div, which does not have
unicode-bidi:plaintext.) So, should its base directionality be determined
by the Unicode heuristic, and thus be displayed RTL, as "!שורה 2"?
According to the current definition, it shouldn't be. Nor is it in the
current Firefox and WebKit implementations. And, IMO, the current
definition and implementations are good in this respect: if you want
plaintext auto-direction on the inside div, then give it
unicode-bidi:plaintext explicitly.

If so, however, the unicode-bidi:plaintext heuristic also should not apply
to the paragraph in the inside span in

<div dir="ltr"><span style="unicode-bidi:plaintext">Line 1.<span
style="display:block">שורה 2!</span>שורה 3.</span></div>.

That is, "שורה 2!" should be displayed LTR by inheritance from the div, not
RTL by the unicode-bidi:plaintext on the outside span, even though the
outside span contains it.

To reflect this, I propose that the definition of unicode-bidi:plaintext be
modified as follows:

plaintext
This value behaves as for ‘isolate’, except that for the purposes of the
Unicode bidirectional algorithm, the base directionality of each bidi
paragraph *immediately contained* by the element is determined not by the
element's computed ‘direction’ as usual, but by following the heuristic in
rules P2 and P3 of the Unicode bidirectional algorithm. *A paragraph is
immediately contained by an element if it is contained by it, but is not
contained by a descendant element that puts its content into a separate
bidi paragraph (or paragraphs)*, e.g. an element with display:block,
position:absolute, unicode-bidi:isolate, unicode-bidi:plaintext, and so on.


Please note that this gets rid of the reliance on containing blocks,
replacing it instead with the notion of the "immediately containing"
element for a given bidi paragraph. It then makes no difference whether the
element is inline or a block.

Thus, for both

<div style="unicode-bidi:plaintext">Line 1.<div>שורה 2!</div>Line 3.</div>

and

<span style="unicode-bidi:plaintext">Line 1.<span
style="display:block">שורה 2!</span>שורה 3.</span>

the directionality of the "שורה 2!" paragraph is not determined by the
outside element's unicode-bidi:plaintext. Nor is the outside
element's unicode-bidi:plaintext applied to the isolated element in either

<span style="unicode-bidi: plaintext">He said '<span
style="unicode-bidi:isolate">שלום!</span>'.</span>

or

<div dir="rtl" style="unicode-bidi: plaintext">He said '<span
class="isolate">hello!</span>'.</div>

Please note that, in fact, both Firefox and WebKit display the last example
with the "hello!" paragraph in RTL, as inherited from the div's direction,
not in LTR as demanded by the div's unicode-bidi:plaintext. This is
especially significant given that, strictly speaking, according to the
current definition of unicode-bidi:plaintext, the
div's unicode-bidi:plaintext *should* apply to the "hello!" paragraph,
since the div is the containing block for it. This gives me an additional
indication that the modified definition is, in fact, better than the
current one.
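
As a rough illustration of the "immediately contained" notion proposed above, here is a minimal Python sketch. It is not taken from any spec or browser implementation. The Element class, its plaintext and own_paragraphs flags, and the immediate_owners function are all hypothetical names, and the model simplifies by pairing each text run (rather than each true bidi paragraph) with the nearest element that starts its own paragraphs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Element:
    name: str
    plaintext: bool = False       # hypothetical flag: unicode-bidi:plaintext
    own_paragraphs: bool = False  # hypothetical flag: display:block, unicode-bidi:isolate, ...
    children: List[Union[str, "Element"]] = field(default_factory=list)


def immediate_owners(root: Element) -> List[Tuple[str, Element]]:
    """Pair each text run with the element that immediately contains it."""
    pairs: List[Tuple[str, Element]] = []

    def walk(node: Element, owner: Element) -> None:
        for child in node.children:
            if isinstance(child, str):
                pairs.append((child, owner))
            elif child.own_paragraphs or child.plaintext:
                walk(child, child)   # descendant puts its content in separate paragraph(s)
            else:
                walk(child, owner)   # ordinary inline: stays in the owner's paragraph

    walk(root, root)
    return pairs


# Models: <div style="unicode-bidi:plaintext">Line 1.<div>שורה 2!</div>Line 3.</div>
outer = Element("outer", plaintext=True, children=[
    "Line 1.",
    Element("inner", own_paragraphs=True, children=["שורה 2!"]),
    "Line 3.",
])
governed = [text for text, owner in immediate_owners(outer) if owner is outer]
print(governed)  # ['Line 1.', 'Line 3.']
```

Under this model the outer element's plaintext governs only "Line 1." and "Line 3."; the inner paragraph is owned by the inner element, whether the outer element is a div or a span.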

Now, let's go on to alignment. Here is the current definition:

The start and end edges of line boxes are determined by the inline base
direction per "paragraph", where in this case the "paragraph" is all
consecutive line boxes not separated by a forced line break or block
boundary. In most cases, this means referring to the ‘direction’ property
of the containing block. In the case of ‘unicode-bidi: plaintext’, however,
this uses the implied inline base direction of the "paragraph" (i.e. the
based direction that is used for bidi reordering).


There are a couple of issues here.

- What exactly does "In the case of ‘unicode-bidi: plaintext’" mean? I
think it means the case where the element whose text-align (or
text-align-last) we are trying to apply also has unicode-bidi:plaintext.

- The definition assumes that a "paragraph" can contain several line boxes,
but not the other way around. This is not true. Take, for example, the
following:

<div dir=ltr>I said '<span style="unicode-bidi:plaintext">שלום!</span>' and
he said '<span style="unicode-bidi:plaintext">hello!</span>'.</div>

This contains three paragraphs:

- "I said '*' and he said '*'."
This is LTR because of the dir=ltr on the div.

- "שלום!"
This is RTL because of the unicode-bidi:plaintext on the first span.

- "hello!"
This is LTR because of the unicode-bidi:plaintext on the second span.

All three paragraphs, however, are in a single line box. So, how should
alignment work when a line box contains several paragraphs?

I propose that just as an element's unicode-bidi:plaintext only affects the
directionality of paragraphs that the element immediately contains, so too
it is *only these* paragraphs' directionality that should affect the
alignment of the line boxes in the element when the element is a containing
block.

That is, let's note that if two distinct paragraphs are immediately
contained by the same element, they cannot share a line box. (If they did,
they would have to be a single paragraph.) Thus, each line box in a
containing block belongs to exactly one of the block's immediately
contained paragraphs.

Now, the amended definition for alignment:

The start and end edges of a line box are relative to a direction
determined as follows. If a line box's containing block has
unicode-bidi:plaintext, use the base directionality of the containing
block's immediately contained paragraph to which the line box belongs.
Otherwise, use the containing block's computed direction.


Please note that this means that unicode-bidi:plaintext does not affect
alignment except when applied to an element that is a containing block.
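
The amended alignment rule can be sketched as a small decision function. This is only an illustration under stated assumptions: the ContainingBlock class and the function names are hypothetical, the caller is assumed to already know the base directionality of the immediately contained paragraph to which the line box belongs, and a horizontal writing mode is assumed so that "start" maps to left or right.

```python
from dataclasses import dataclass


@dataclass
class ContainingBlock:
    computed_direction: str   # "ltr" or "rtl", the block's computed 'direction'
    plaintext: bool = False   # True if the block has unicode-bidi:plaintext


def line_box_direction(block: ContainingBlock, owning_paragraph_direction: str) -> str:
    """Direction that the start and end edges of a line box are relative to."""
    if block.plaintext:
        # Use the base directionality of the immediately contained
        # paragraph that the line box belongs to.
        return owning_paragraph_direction
    # Otherwise fall back to the containing block's computed direction.
    return block.computed_direction


def start_edge(direction: str) -> str:
    """Physical edge for text-align:start (horizontal writing mode assumed)."""
    return "left" if direction == "ltr" else "right"


# A plaintext block with inherited direction rtl whose paragraph resolves to LTR:
print(start_edge(line_box_direction(ContainingBlock("rtl", plaintext=True), "ltr")))   # left
# The same block without plaintext ignores the paragraph's own direction:
print(start_edge(line_box_direction(ContainingBlock("rtl", plaintext=False), "ltr")))  # right
```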

Let's take an example:

<div dir=rtl><div style="unicode-bidi:plaintext; text-align:start">He said
'<span style="unicode-bidi:isolate">שלום!</span>'.</div></div>

We have here a line box containing two paragraphs:

- "שלום!"
This is RTL by inheritance from the (outer) div. It is immediately
contained by the span. Thus, given that the line box's containing block is
the (inner) div, not the span, the line box does not belong to this
paragraph.

- "He said '*'."
This is the paragraph immediately contained by the div
with unicode-bidi:plaintext. The line box belongs to it. This paragraph is
LTR by the Unicode heuristic. Note that it would have been LTR even if the
span were moved to its very beginning, e.g. <div
style="unicode-bidi:plaintext; text-align:start">'<span
style="unicode-bidi:isolate">שלום!</span>', he said.</div>. Since the line
box belongs to it, and it is LTR, the line box is aligned left.

Now, another example:

<div dir=rtl><div style="unicode-bidi:plaintext; text-align:start"><span
style="unicode-bidi:isolate">שלום!</span></div></div>

We still have here a line box containing two paragraphs:

- "שלום!"
As before, this is RTL by inheritance from the (outer) div. It is
immediately contained by the span, and thus the line box does not belong to
it.

- "*"
This is the paragraph immediately contained by the div
with unicode-bidi:plaintext. The line box belongs to it. This paragraph is
all-neutral, and thus LTR by the Unicode heuristic. Since the line box
belongs to it, and it is LTR, the line box is aligned left.
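
The heuristic results claimed in the last two examples can be checked with a simplified first-strong sketch in the spirit of rules P2 and P3. This is an approximation, not a full UBA implementation: the base_direction function is a hypothetical name, and each isolated span (the "*" in the paragraphs above) is assumed to be represented by a single neutral placeholder, U+FFFC, rather than by its actual content.

```python
import unicodedata

# Placeholder standing in for an isolated inline element (an assumption of
# this sketch); U+FFFC OBJECT REPLACEMENT CHARACTER is a bidi-neutral character.
ISOLATE = "\ufffc"


def base_direction(paragraph: str) -> str:
    """Return "ltr" or "rtl" from the first strong character, else "ltr"."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return "ltr"  # no strong character at all: default to LTR, as in P3


print(base_direction("He said '" + ISOLATE + "'."))    # ltr
print(base_direction("'" + ISOLATE + "', he said."))   # ltr
print(base_direction(ISOLATE))                         # ltr (all-neutral paragraph)
print(base_direction("שלום!"))                         # rtl
```

Because the isolated content never takes part in the scan, moving the span to the very beginning of the paragraph does not change the result.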

And finally, let's take an example like the one in the current spec. As you
will see, my expectations differ from those currently stated in the spec:

<div dir=ltr style="white-space:
pre; text-align:start; unicode-bidi:plaintext">
He said: <span style="unicode-bidi:plaintext">שלום!
How are you?
להתראות.</span>
</div>

There are four paragraphs:

- "He said: *"
This is LTR due to the div's direction. It is immediately contained by the
div.

- "שלום!"
This is RTL due to the Unicode heuristic on the span, which immediately
contains it, so it will be displayed "!שלום"

- "How are you?"
This is LTR due to the Unicode heuristic on the span, which immediately
contains it.

- "להתראות."
This is RTL due to the Unicode heuristic on the span, which immediately
contains it, so it will be displayed ".להתראות"

There are three line boxes. *All three belong to the "He said: *"
paragraph.* That's because it is the only paragraph immediately contained
by the div, the line boxes' containing block. Thus, all three are aligned
left.

If the example had omitted the span, the third line box would have been
aligned right.

Aharon

On Fri, Feb 17, 2012 at 7:42 PM, fantasai <fantasai.lists@inkedblade.net> wrote:

> On 10/30/2011 09:28 PM, Simon Montagu wrote:
>
>> As far as I can see, there is no explicit specification in CSS Writing
>> Modes Module Level 3 of what effect "unicode-bidi:
>> plaintext" should have on the default alignment of paragraphs.
>>
>> When implementing "unicode-bidi: plaintext" for Gecko, I took it for
>> granted that each paragraph in the element would
>> determine its directionality by the heuristic in the UBA, and then
>> determine the start of the line box depending on the
>> directionality of the paragraph.
>>
>> I just noticed that recent versions of Chrome behave differently:
>> directionality is determined for each paragraph separately,
>> but alignment is determined by the first paragraph in the element, and
>> all subsequent paragraphs get the same alignment.
>>
>> As I said, there doesn't seem to be anything in the spec to say which
>> approach is correct. I think the behaviour in Gecko is
>> more intuitive and useful, but then I would, wouldn't I? Either way, it
>> is probably worth adding something to the spec to make
>> it explicit.
>>
>
> Fixed in the spec, per Aharon's recommendation:
>  http://dev.w3.org/csswg/css3-text/#text-align
>
>  # The start and end edges of line boxes are determined by the inline
>  # base direction per "paragraph", where in this case the "paragraph"
>  # is all consecutive line boxes not separated by a forced line break
>  # or block boundary. In most cases, this means referring to the
>  # ‘direction’ property of the containing block. In the case of
>  # ‘unicode-bidi: plaintext’, however, this uses the implied inline
>  # base direction of the "paragraph" (i.e. the based direction that
>  # is used for bidi reordering).
>
> And there's an example afterward.
>
> Simon, can you look this over and let me know if it matches your
> implementation?
>
> ~fantasai
>
>
Received on Monday, 20 February 2012 12:59:21 GMT
