- From: Aharon (Vladimir) Lanin <aharon@google.com>
- Date: Mon, 20 Feb 2012 14:58:29 +0200
- To: fantasai <fantasai.lists@inkedblade.net>, public-i18n-bidi@w3.org, Simon Montagu <smontagu@smontagu.org>, Levi Weintraub <leviw@google.com>
- Message-ID: <CA+FsOYb4rshgsvWsFfBZRJkiRNPWVctj15PoyU=fSxnXi8=giA@mail.gmail.com>
The following is too long, but please do read. In brief, I think that both the current unicode-bidi:plaintext and the proposed alignment specs have serious problems. On the upside, I also suggest solutions :-). The problems in unicode-bidi:plaintext are significant even when we ignore alignment, and the modification I suggest seems to be closer to what is implemented in both Firefox and WebKit.

In a nutshell, the current definition of unicode-bidi:plaintext in http://dev.w3.org/csswg/css3-writing-modes/#unicode-bidi is unclear about how unicode-bidi:plaintext is supposed to behave when it is in effect on an inline element.

Please note that originally, e.g. in http://www.w3.org/TR/2011/WD-css3-writing-modes-20110428/, the definition said that "this value has no effect on inline elements". And later, in http://www.w3.org/TR/2011/WD-css3-writing-modes-20110531/, it said that "for inline elements, this value is equivalent to ‘isolate’". This put them in a separate bidi paragraph, but did *not* determine this paragraph's base directionality.

The current definition, however, says the following:

  For the purposes of the Unicode bidirectional algorithm, the base directionality of each bidi paragraph for which the element forms the containing block is determined not by the element's computed ‘direction’ as usual, but by following the heuristic in rules P2 and P3 of the Unicode bidirectional algorithm. For inline elements, this value behaves as for ‘isolate’, except, as with block containers, the base directionality is determined by following the Unicode heuristic instead of by using the ‘direction’ value.

This gives no clue to the following question: for which paragraphs, exactly, does a unicode-bidi:plaintext element that is not a containing block for anything determine the base directionality?
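For reference, the heuristic in rules P2 and P3 amounts to a first-strong scan of the paragraph text. The following is a minimal Python sketch under a simplified reading that ignores explicit embedding and isolate control characters; the function name `base_direction` is hypothetical and is not taken from the spec or from any implementation:

```python
import unicodedata


def base_direction(paragraph: str) -> str:
    """Simplified first-strong heuristic (UBA rules P2 and P3).

    Assumption: explicit embedding/isolate controls are not given any
    special treatment here, unlike in a full implementation.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"          # first strong character is left-to-right
        if bidi_class in ("R", "AL"):
            return "rtl"          # first strong character is right-to-left
    return "ltr"                  # P3: no strong character, so default to LTR


print(base_direction("Line 1."))   # ltr
print(base_direction("שורה 2."))   # rtl
```

Note that digits and punctuation are not strong characters, so an all-neutral paragraph falls through to the LTR default.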
Given that the definition says that such elements are to behave as for unicode-bidi:isolate, we do at least know that paragraphs never straddle the boundary of a unicode-bidi:plaintext element. So, should the unicode-bidi:plaintext on an element set the base directionality of *all* paragraphs contained by the element? Should it apply to:

- Both paragraphs in <span style="unicode-bidi:plaintext">Line 1.<br/>שורה 2.</span>?
- All three paragraphs in <span style="unicode-bidi: plaintext">Line 1.<span style="display:block">שורה 2.</span>שורה 3.</span>?
- Both paragraphs in <span style="unicode-bidi: plaintext">He said '<span style="unicode-bidi:isolate">שלום!</span>'.</span>?

If it applies to all these cases, we have a strange contradiction. For elements that *are* a containing block, unicode-bidi:plaintext only sets the base directionality of paragraphs for which it is the containing block - not all paragraphs it contains. But elements that are not a containing block are paradoxically supposed to be "stronger", determining the directionality of all the paragraphs they contain. IMO, this makes no sense.

For example, consider:

<div style="unicode-bidi:plaintext">Line 1.<div>שורה 2!</div>Line 3.</div>

The inside div defines its own paragraph, "שורה 2!". Obviously, this paragraph is contained by the outside div. However, the outside div is not its containing block (that's the inside div, which does not have unicode-bidi:plaintext). So, should its base directionality be determined by the Unicode heuristic, and thus be displayed RTL, as "!שורה 2"? According to the current definition, it shouldn't be. Nor is it in the current Firefox and WebKit implementations. And, IMO, the current definition and implementations are good in this respect: if you want plaintext auto-direction on the inside div, then give it unicode-bidi:plaintext explicitly.
If so, however, the unicode-bidi:plaintext heuristic also should not apply to the paragraph in the inside span in

<div dir="ltr"><span style="unicode-bidi:plaintext">Line 1.<span style="display:block">שורה 2!</span>שורה 3.</span></div>

That is, "שורה 2!" should be displayed LTR by inheritance from the div, not RTL by the unicode-bidi:plaintext on the outside span, even though the outside span contains it.

To reflect this, I propose that the definition of unicode-bidi:plaintext be modified as follows:

  plaintext
    This value behaves as for ‘isolate’, except that for the purposes of the Unicode bidirectional algorithm, the base directionality of each bidi paragraph *immediately contained* by the element is determined not by the element's computed ‘direction’ as usual, but by following the heuristic in rules P2 and P3 of the Unicode bidirectional algorithm. *A paragraph is immediately contained by an element if it is contained by it, but is not contained by a descendant element that puts its content into a separate bidi paragraph (or paragraphs)*, e.g. an element with display:block, position:absolute, unicode-bidi:isolate, unicode-bidi:plaintext, and so on.

Please note that this gets rid of the reliance on containing blocks, replacing it instead with the notion of the "immediately containing" element for a given bidi paragraph. It then makes no difference whether the element is inline or a block. Thus, for both

<div style="unicode-bidi:plaintext">Line 1.<div>שורה 2!</div>Line 3.</div>

and

<span style="unicode-bidi:plaintext">Line 1.<span style="display:block">שורה 2!</span>שורה 3.</span>

the directionality of the "שורה 2!" paragraph is not determined by the outside element's unicode-bidi:plaintext.
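To make the "immediately contained" notion concrete, here is a rough Python sketch of how the proposed definition could be evaluated over a toy element tree. Everything in it is an assumption made for illustration: the `Element` class, its flags, and the `paragraphs` function are hypothetical, the first-strong scan is a simplified stand-in for rules P2 and P3, and cases such as <br/>, position:absolute, or a dir attribute on an ordinary inline element are left out.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Element:
    # Hypothetical stand-in for a styled element; not a real DOM API.
    children: List[Union[str, "Element"]] = field(default_factory=list)
    block: bool = False              # e.g. display:block
    isolate: bool = False            # unicode-bidi:isolate
    plaintext: bool = False          # unicode-bidi:plaintext
    direction: Optional[str] = None  # explicit direction; None means inherited


def first_strong(text: str) -> str:
    """Simplified P2/P3: direction of the first strong character, else LTR."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return "ltr"


def paragraphs(el: Element, inherited: str = "ltr") -> List[Tuple[str, str]]:
    """Return (text, base direction) for every bidi paragraph inside el.

    Only paragraphs that el immediately contains consult el.plaintext;
    descendants that form their own paragraphs are resolved recursively.
    The order of the returned list is not meaningful.
    """
    direction = el.direction or inherited
    found: List[Tuple[str, str]] = []
    pieces: List[str] = []

    def flush() -> None:
        text = "".join(pieces)
        pieces.clear()
        if text:
            found.append((text, first_strong(text) if el.plaintext else direction))

    def walk(node: Element) -> None:
        for child in node.children:
            if isinstance(child, str):
                pieces.append(child)
            elif child.block:
                flush()  # a block boundary ends the current paragraph
                found.extend(paragraphs(child, direction))
            elif child.isolate or child.plaintext:
                pieces.append("\uFFFC")  # neutral placeholder for the isolate
                found.extend(paragraphs(child, direction))
            else:
                walk(child)  # ordinary inline: stays in the same paragraph

    walk(el)
    flush()
    return found


# The nested-div case: the inner div's paragraph is not immediately
# contained by the outer div, so it keeps the inherited LTR direction.
outer = Element(block=True, plaintext=True, children=[
    "Line 1.", Element(block=True, children=["שורה 2!"]), "Line 3."])
print(dict(paragraphs(outer)))
```

In this sketch it makes no difference whether the element with `plaintext` set is a block or an inline: in both cases only the paragraphs flushed by that element itself go through the heuristic.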
Nor is the outside element's unicode-bidi:plaintext applied to the isolated element in either

<span style="unicode-bidi: plaintext">He said '<span style="unicode-bidi:isolate">שלום!</span>'.</span>

or

<div dir="rtl" style="unicode-bidi: plaintext">He said '<span class="isolate">hello!</span>'.</div>

Please note that, in fact, both Firefox and WebKit display the last example with the "hello!" paragraph in RTL, as inherited from the div's direction, not in LTR as demanded by the div's unicode-bidi:plaintext. This is especially significant given that, strictly speaking, according to the current definition of unicode-bidi:plaintext, the div's unicode-bidi:plaintext *should* apply to the "hello!" paragraph, since the div is the containing block for it. This gives me an additional indication that the modified definition is, in fact, better than the current one.

Now, let's go on to alignment. Here is the current definition:

  The start and end edges of line boxes are determined by the inline base direction per "paragraph", where in this case the "paragraph" is all consecutive line boxes not separated by a forced line break or block boundary. In most cases, this means referring to the ‘direction’ property of the containing block. In the case of ‘unicode-bidi: plaintext’, however, this uses the implied inline base direction of the "paragraph" (i.e. the based direction that is used for bidi reordering).

There are a couple of issues here.

- What exactly does "In the case of ‘unicode-bidi: plaintext’" mean? I think it means the case where the element whose text-align (or text-align-last) we are trying to apply also has unicode-bidi:plaintext.
- The definition assumes that a "paragraph" can contain several line boxes, but not the other way around. This is not true.
Take, for example, the following:

<div dir=ltr>I said '<span style="unicode-bidi:plaintext">שלום!</span>' and he said '<span style="unicode-bidi:plaintext">hello!</span>'.</div>

This contains three paragraphs:

- "I said '*' and he said '*'." This is LTR because of the dir=ltr on the div.
- "שלום!" This is RTL because of the unicode-bidi:plaintext on the first span.
- "hello!" This is LTR because of the unicode-bidi:plaintext on the second span.

All three paragraphs, however, are in a single line box.

So, how should alignment work when a line box contains several paragraphs? I propose that just as an element's unicode-bidi:plaintext only affects the directionality of paragraphs that the element immediately contains, so too it is *only these* paragraphs' directionality that should affect the alignment of the line boxes in the element when the element is a containing block.

That is, let's note that if two distinct paragraphs are immediately contained by the same element, they cannot share a line box. (If they did, they would have to be a single paragraph.) Thus, each line box in a containing block belongs to exactly one of the block's immediately contained paragraphs.

Now, the amended definition for alignment:

  The start and end edges of a line box are relative to a direction determined as follows. If a line box's containing block has unicode-bidi:plaintext, use the base directionality of the containing block's immediately contained paragraph to which the line box belongs. Otherwise, use the containing block's computed direction.

Please note that this means that unicode-bidi:plaintext does not affect alignment except when applied to an element that is a containing block.

Let's take an example:

<div dir=rtl><div style="unicode-bidi:plaintext; text-align:start">He said '<span style="unicode-bidi:isolate">שלום!</span>'.</div></div>

We have here a line box containing two paragraphs:

- "שלום!" This is RTL by inheritance from the (outer) div.
It is immediately contained by the span. Thus, given that the line box's containing block is the (inner) div, not the span, the line box does not belong to this paragraph.
- "He said '*'." This is the paragraph immediately contained by the div with unicode-bidi:plaintext. The line box belongs to it. This paragraph is LTR by the Unicode heuristic. Note that it would have been LTR even if the span were moved to its very beginning, e.g. <div style="unicode-bidi:plaintext; text-align:start">'<span style="unicode-bidi:isolate">שלום!</span>', he said.</div>.

Since the line box belongs to it, and it is LTR, the line box is aligned left.

Now, another example:

<div dir=rtl><div style="unicode-bidi:plaintext; text-align:start"><span style="unicode-bidi:isolate">שלום!</span></div></div>

We still have here a line box containing two paragraphs:

- "שלום!" As before, this is RTL by inheritance from the (outer) div. It is immediately contained by the span, and thus the line box does not belong to it.
- "*" This is the paragraph immediately contained by the div with unicode-bidi:plaintext. The line box belongs to it. This paragraph is all-neutral, and thus LTR by the Unicode heuristic.

Since the line box belongs to it, and it is LTR, the line box is aligned left.

And finally, let's take an example like the one in the current spec. As you will see, my expectations differ from those currently stated in the spec:

<div dir=ltr style="white-space: pre; text-align:start; unicode-bidi:plaintext">
He said: <span style="unicode-bidi:plaintext">שלום!
How are you?
להתראות.</span>
</div>

There are four paragraphs:

- "He said: *" This is LTR due to the div's direction. It is immediately contained by the div.
- "שלום!" This is RTL due to the Unicode heuristic on the span, which immediately contains it, so it will be displayed "!שלום"
- "How are you?" This is LTR due to the Unicode heuristic on the span, which immediately contains it.
- "להתראות."
This is RTL due to the Unicode heuristic on the span, which immediately contains it, so it will be displayed ".להתראות"

There are three line boxes. *All three belong to the "He said: *" paragraph.* That's because it is the only paragraph immediately contained by the div, the line boxes' containing block. Thus, all three are aligned left.

If the example had omitted the span, the third line box would have been aligned right.

Aharon

On Fri, Feb 17, 2012 at 7:42 PM, fantasai <fantasai.lists@inkedblade.net> wrote:

> On 10/30/2011 09:28 PM, Simon Montagu wrote:
>
>> As far as I can see, there is no explicit specification in CSS Writing
>> Modes Module Level 3 of what effect "unicode-bidi: plaintext" should
>> have on the default alignment of paragraphs.
>>
>> When implementing "unicode-bidi: plaintext" for Gecko, I took it for
>> granted that each paragraph in the element would determine its
>> directionality by the heuristic in the UBA, and then determine the
>> start of the line box depending on the directionality of the paragraph.
>>
>> I just noticed that recent versions of Chrome behave differently:
>> directionality is determined for each paragraph separately, but
>> alignment is determined by the first paragraph in the element, and
>> all subsequent paragraphs get the same alignment.
>>
>> As I said, there doesn't seem to be anything in the spec to say which
>> approach is correct. I think the behaviour in Gecko is more intuitive
>> and useful, but then I would, wouldn't I? Either way, it is probably
>> worth adding something to the spec to make it explicit.
>>
>
> Fixed in the spec, per Aharon's recommendation:
> http://dev.w3.org/csswg/css3-text/#text-align
>
> # The start and end edges of line boxes are determined by the inline
> # base direction per "paragraph", where in this case the "paragraph"
> # is all consecutive line boxes not separated by a forced line break
> # or block boundary.
> # In most cases, this means referring to the
> # ‘direction’ property of the containing block. In the case of
> # ‘unicode-bidi: plaintext’, however, this uses the implied inline
> # base direction of the "paragraph" (i.e. the based direction that
> # is used for bidi reordering).
>
> And there's an example afterward.
>
> Simon, can you look this over and let me know if it matches your
> implementation?
>
> ~fantasai
Received on Monday, 20 February 2012 12:59:21 UTC