- From: Aharon (Vladimir) Lanin <aharon@google.com>
- Date: Sun, 6 Jun 2010 18:50:01 +0300
- To: public-i18n-bidi@w3.org, Tex Texin <textexin@xencraft.com>, Craig Cummings <crc@yahoo-inc.com>, Norbert Lindenberg <norbert.lindenberg@yahoo-inc.com>, Roozbeh Pournader <roozbeh@gmail.com>, Xiaomei Ji <xji@google.com>, Matitiahu Allouche <matial@il.ibm.com>, fantasai <fantasai@inkedblade.net>, Tab Atkins <tabatkins@google.com>, "Tab Atkins Jr." <jackalmage@gmail.com>, Adil Allawi <adil@diwan.com>, Najib Tounsi <ntounsi@gmail.com>, Ehsan Akhgari <ehsan@mozilla.com>, Ehsan Akhgari <ehsan.akhgari@gmail.com>, Mark Davis <mark@macchiato.com>, Bob Jung <bjung@google.com>
- Message-ID: <AANLkTimvDwb5cJuht3gTCJjY2sYiRHXTjMxsOMn-vKEI@mail.gmail.com>
This is a somewhat long (but I hope also somewhat amusing) attempt to deal with a number of interrelated issues that have arisen in connection with the bdi (BiDirectional Isolate) attribute proposed by section 2.1 of "Additional Requirements for Bidi in HTML" (<http://www.w3.org/TR/html-bidi/#bidi-isolation>). (The name bdi is likely to be replaced with something more meaningful. This is a separate issue that I am ignoring here.)

-- Recap

The bdi attribute is currently defined as making an inline element directionally isolated from its surroundings by making it behave as if it were surrounded with strong-directional characters of the last explicit embedding level within which it appears. For example:

  <div dir=ltr> <span dir="rtl" bdi="yes">PURPLE PIZZA</span> - 3 reviews </div>

would be displayed the same as

  <div dir=ltr> ‎<span dir="rtl">PURPLE PIZZA</span>‎ - 3 reviews </div>

i.e. as

  AZZIP ELPRUP - 3 reviews

and not as "3 - AZZIP ELPRUP reviews", which is what happens today without bdi. The proposed definition also includes balancing any missing and extra PDF characters in the content.

Now, the core issue:

--- Separation and isolation

The current definition directionally separates the bdi element not only from the text around it, but also the text before it from the text after it. The latter makes a difference when both the text before and the text after the bdi element have the direction (implicit or explicit) opposite to that of the last explicit embedding level (e.g. the direction of the parent element they all share), and neither has bdi. The bdi element prevents the text before and the text after it from combining into a single implicit directional phrase.

In the past, Mati and I discussed, and fantasai later independently suggested, an alternate definition that avoids directionally isolating the text before the bdi from the text after it.
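The separation definition in the recap can be expressed entirely with Unicode formatting characters. Below is a minimal Python sketch of that plain-text equivalent. The function names are hypothetical; it assumes the usual mapping of an inline dir attribute to LRE/RLE ... PDF, and it reads "balancing" as dropping PDFs that have no matching initiator in the content and closing any embedding the content leaves open.

```python
# Unicode bidi formatting characters.
LRM, RLM = "\u200e", "\u200f"
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
INITIATORS = {LRE, RLE, LRO, RLO}


def balance_pdfs(text: str) -> str:
    """Drop PDFs with no matching initiator inside `text`, and append a PDF
    for every initiator still open at the end (one reading of "balancing")."""
    depth = 0
    out = []
    for ch in text:
        if ch in INITIATORS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                continue  # extra PDF: it would otherwise close an embedding opened outside
            depth -= 1
        out.append(ch)
    return "".join(out) + PDF * depth


def separate(content: str, content_dir: str, context_dir: str) -> str:
    """Hypothetical helper: an embedding for the element's own dir, wrapped in
    invisible strong marks (LRM/RLM) of the surrounding direction."""
    mark = LRM if context_dir == "ltr" else RLM
    embed = LRE if content_dir == "ltr" else RLE
    return mark + embed + balance_pdfs(content) + PDF + mark


print(repr(separate("PURPLE PIZZA", "rtl", "ltr")))
```

For the PURPLE PIZZA example this yields LRM, RLE, the text, PDF, LRM, i.e. the ‎-wrapped markup shown in the recap. Note that nothing comparable can be written for isolation with these characters.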
Instead of surrounding the bdi element with imaginary strong-directional characters, the alternate definition puts the text inside the element in a separate bidi paragraph, thus isolating it from its surroundings, and then treats the whole bdi element as a neutral character from the point of view of the surrounding text.

This (or something very close to it) is what already happens today in all major browsers with an <input> element: its value is displayed unaffected by the text around it, and the <input> is treated by its surrounding text as if it were neutral. Thus,

  MAN <input value='bites'> CAT

is displayed as

  TAC [bites ] NAM

whether it is in a <div dir=ltr> or a <div dir=rtl>, i.e. the "MAN" is allowed to stick to the "CAT" right across the <input>. Under the new definition of bdi, "MAN <span dir=ltr bdi>bites</span> CAT" would also be displayed as "TAC bites NAM" whether in a <div dir=ltr> or in a <div dir=rtl>. Under the original definition, in a <div dir=ltr> it would come out as "NAM bites TAC".

I would like to label the new definition as *isolation* and the original definition as *separation*.

--- Current browser usage

It turns out that <input> and <textarea> elements are not the only place where the latest versions of all major browsers do isolation, or something very much like it. (I have tested Firefox 3.6.3, IE 8, Safari 4.0.5, Chrome 5.0, and Opera 10.53.) There are also the following:

* style that takes the element out of the flow, e.g. float:left|right and position:absolute.
* display:inline-block

Firefox and Opera also treat block elements generally (display:block elements, to be precise) with isolation. However, Mozilla engineers have already agreed in this forum that this is a bug. A full UBA paragraph break, as in the other browsers, is the correct way to go.

Separation, on the other hand, has almost no precedent in today's browsers.
The only exception is an embedded block element with display:inline, which until recently all browsers treated as a normal inline element, with no separation or isolation. Firefox recently discovered that this is not according to spec, and changed it to use separation. I am not sure whether using separation rather than isolation was a conscious choice, or what considerations went into it. *** Mozilla engineers: could you shed some light? ***

It is quite clear that isolation is a very natural choice for <input>, <textarea>, and floating and absolutely-positioned elements. Nothing else, including separation, makes much sense for them. But then again, they are a different phenomenon from the bdi attribute: *we would not want to allow the user to turn isolation off for them with bdi=no.* With display:inline-block, whose text appears in a separate block box, it probably does not make much sense to allow the user to turn isolation off for it either. And the theoretical question of whether separation might work somewhat better than isolation for it is moot in the absence of a clear need for change: we would not want to disturb backward compatibility (or the current browser interoperability).

--- Should bdi do isolation?

Thus, the question arises: given the browsers' current preference for isolation, perhaps we should use isolation for bdi too? Some reasons for and against that are pretty obvious:

* Pro: isolation is a simpler, more intuitive, and more easily stated definition than separation.
* Pro: isolation for bdi would make its behavior consistent with the cases where the browser already uses isolation, making browser behavior that much easier to understand and predict.
* Pro: isolation avoids the possibly difficult-to-implement case of a bdi element coming between an LRE/RLE/LRO/RLO and its matching PDF.
* Con: if bdi does not do separation, section 3.1 of the proposal (the <br> conundrum) no longer works.
* Con: isolation does not have a plain-text equivalent in the Unicode Bidi Algorithm. You can't "isolate" a string using Unicode formatting characters.
* Con: as a result of the preceding item, isolation is probably significantly harder to implement, and may carry significantly more processing overhead when dealing with ordinary inline text.

***Reality check***: can anyone who has actually implemented bidi in browser text processing weigh in on the extent to which the last item is true?

Of course, we could even allow the author to choose between separation and isolation, by changing bdi's value repertoire to something like none|bdi|isolate|separate, where bdi would be a synonym for "separate". We could even throw in something like "paragraph" or "break", to indicate a full UBA paragraph break. The differences between these behaviors are so fine, however, that this does not seem very desirable. Let us examine the effect of those fine differences on the way bdi would work in text, trying to find arguments for one or the other.

--- Isolation vs separation in text

Let's start with

  <div dir=ltr> read "DEAR <span dir=ltr bdi>john</span> AND SUSAN" today! </div>

Under separation it would be displayed as:

  read "RAED john NASUS DNA" today!

This is not very good. Under isolation, however, we would get:

  read "NASUS DNA john RAED" today!

This certainly makes more sense. And the effect does not depend on the bdi attribute being combined with dir. Under separation,

  <div dir=ltr> read "DEAR <span bdi>JOHN</span> AND SUSAN" today! </div>

would be displayed as

  read "RAED NHOJ NASUS DNA" today!

This is even worse than before, but isolation would still fix it.

So, do we have an argument for isolation? Not necessarily. Opposite-direction phrases, such as the whole "DEAR ... SUSAN" quote in our examples above, should really be surrounded by an element that declares their direction. If they are not, as our quote isn't, they are often displayed garbled.
The garbling is most severe when such phrases contain opposite-direction inserts like our "john" above. If the whole "DEAR ... SUSAN" quote were surrounded by a <span dir=rtl>, as it should be, it would of course come out as intended, with or without the bdi on "john" or "JOHN". Also, when an opposite-direction phrase contains arbitrary-direction bdi inserts, we are talking about more than one level of logical embedding. This is not a very common occurrence, as the rather forced character of our example above testifies.

And it is certainly possible to make up other examples where one *does* want to separate the text before the bdi element from the text following it, e.g.:

  <div dir=ltr> i spoke to JOHN. <span dir=rtl bdi>SUSAN</span>, MIKE and ollie spoke to him too. </div>

Under separation, this comes out as:

  i spoke to NHOJ. NASUS, EKIM and ollie spoke to him too.

This seems as good as it's going to get. Under isolation, on the other hand, we get the very misleading

  i spoke to EKIM ,NASUS .NHOJ and ollie spoke to him too.

It would be even more misleading if instead of "<span dir=rtl bdi>SUSAN</span>", we had "<span dir=ltr bdi>susan</span>". Under isolation, it would come out as:

  i spoke to EKIM ,susan .NHOJ and ollie spoke to him too.

So, do we have a strong argument for separation? No, it is also flimsy. "JOHN" and "MIKE" in our example need bdi no less than "SUSAN". Without it, we can expect them to garble their surroundings. With it, the example comes out as intended, whether we use separation or isolation. Why would the author use bdi on one insert, but not the other two? Also, having separate opposite-direction phrases coincidentally on both sides of the bdi element does not seem like a common occurrence either.

Nevertheless, this argument is perhaps less flimsy than the one for isolation above. In web apps, different parts of the document are produced by different layers of code. One layer may be using bdi; another might not.
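To make the difference between the two definitions concrete, here is a rough Python sketch of a toy, word-level bidi model. It is not the UBA and not any browser's implementation; it only approximates the neutral-resolution rules (N1/N2) and the reordering rule (L2). Assumptions: letter runs are words, uppercase words stand for RTL text and lowercase for LTR (as in the examples), every other character is a neutral, and numbers, mirroring, and the explicit embedding levels created by dir are ignored. The names Isolate, Separate, tokenize, and render are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Isolate:
    """Content laid out as its own paragraph; a single neutral to its context."""
    text: str
    direction: str  # "ltr" or "rtl"


@dataclass
class Separate:
    """Content wrapped in invisible strong marks of the surrounding direction."""
    text: str


def tokenize(text):
    # Letter runs are words (uppercase = RTL, lowercase = LTR); other chars are neutral.
    return [("R" if t.isupper() else "L", t) if t.isalpha() else ("N", t)
            for t in re.findall(r"[A-Za-z]+|[^A-Za-z]", text)]


def render(segments, base="ltr"):
    e = "L" if base == "ltr" else "R"  # embedding direction, also used at the edges
    toks = []
    for seg in segments:
        if isinstance(seg, Isolate):
            toks.append(("N", render([seg.text], seg.direction)))
        elif isinstance(seg, Separate):
            toks.append((e, ""))  # invisible mark, like LRM/RLM
            toks.extend(tokenize(seg.text))
            toks.append((e, ""))
        else:
            toks.extend(tokenize(seg))

    # A run of neutrals takes the direction of the strong text on both sides
    # when it agrees, otherwise the embedding direction.
    resolved = [c for c, _ in toks]
    i = 0
    while i < len(resolved):
        if resolved[i] != "N":
            i += 1
            continue
        j = i
        while j < len(resolved) and resolved[j] == "N":
            j += 1
        before = resolved[i - 1] if i > 0 else e
        after = resolved[j] if j < len(resolved) else e
        for k in range(i, j):
            resolved[k] = before if before == after else e
        i = j

    # Implicit levels: R is odd, L is even.
    if base == "ltr":
        levels = [1 if c == "R" else 0 for c in resolved]
    else:
        levels = [2 if c == "L" else 1 for c in resolved]

    # From the highest level down, reverse every run at or above that level.
    order = list(range(len(toks)))
    for lvl in range(max(levels, default=0), 0, -1):
        k = 0
        while k < len(order):
            if levels[order[k]] < lvl:
                k += 1
                continue
            m = k
            while m < len(order) and levels[order[m]] >= lvl:
                m += 1
            order[k:m] = order[k:m][::-1]
            k = m

    # RTL words are shown with their letters reversed, as in the examples.
    return "".join(text[::-1] if cls == "R" else text
                   for cls, text in (toks[n] for n in order))


print(render(['read "DEAR ', Separate("john"), ' AND SUSAN" today!']))
print(render(['read "DEAR ', Isolate("john", "ltr"), ' AND SUSAN" today!']))
```

Under these simplifications, the model reproduces the renderings given above for the MAN/CAT, DEAR/john, and JOHN/SUSAN/MIKE examples: the only difference between the two definitions is whether the bdi element is seen from outside as a pair of strong marks or as one neutral.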
In fact, the layer that does not use bdi may not easily be able to, perhaps because it is limited to plain text.

It is worth pointing out that there is no third kind of case. For isolation and separation to differ, one needs opposite-direction text on both sides of the bdi element. It either makes up one logical phrase, or it doesn't; there is no third choice. If anything, we seem to have a weak argument for separation over isolation.

--- bdi by default

A number of cases have been proposed where an element should have bdi *by default*, usually because it makes no sense to let it affect, and be affected by, what surrounds it. Here is a list:

* dir=auto elements (section 2.2)
* <a> (Najib)
* <br> (section 3.1)
* block elements with inline display (fantasai and others)
* display:inline-block elements (section 3.3)

Would these work better with "separate" or "isolate"?

* dir=auto: weak preference for separation

The dir=auto case is no different from the explicit <span dir=ltr|rtl bdi> cases considered above, where we have identified a weak preference for separation.

* <a>: do not change current behavior

As with the explicit <span dir=ltr|rtl bdi> cases considered above, implicit bdi on <a> would work better under isolation when the <a> is in the middle of a coherent, undeclared opposite-direction phrase, but better under separation when the <a> is both preceded and followed by opposite-direction text that does not form a single logical phrase and does not use bdi on either side.

In my opinion, however, such considerations are moot, since <a> should *not* become bdi by default. There is simply too much danger that it will break existing documents.
This will happen whenever the link is part of an undeclared opposite-direction phrase that either begins or ends with it, e.g.:

  <div dir=rtl>"click <a>here</a>" IS NOT THE WAY TO DO LINKS.</div>

Currently, it is displayed as intended:

  SKNIL OD OT YAW EHT TON SI "click *here*"

With bdi turned on for <a> by default, however, it would come out garbled regardless of whether we use isolation or separation:

  SKNIL OD OT YAW EHT TON SI "*here* click"

One could try to argue that turning bdi on by default for <a> could also fix some current documents, but it does not seem likely: if the document is currently displayed garbled, the author would probably fix it. (This does not apply to another case where we did suggest changing current behavior, i.e. applying the direction of the parent of the <title> element to it, because few authors have any idea how to fix its display and the current behavior is unreliable anyway.)

* <br>: separation

As stated in 3.1, we want <br> to offer directional separation by default, while allowing for an option to disable it. The proposed solution is to make it bdi by default, but this only works if bdi uses separation, not isolation.

However, in the absence of separation, can we deal with <br> by defining it as a full UBA paragraph break? After all, in practice, there is very little difference between separation and a full UBA paragraph break. (One difference is that separation does not terminate the effects of LRE, RLE, LRO, and RLO. However, the use of these characters is discouraged wherever mark-up can be used. Another difference is that for an inline element like <br> that can appear nested in any number of other inline elements, each with its own dir, separation is much easier to define and implement than a full UBA paragraph break.
However, this does not seem to be a big consideration either, since a reasonable implementation of a UBA paragraph break inside an inline element has been described in 3.1; such a break seems to be the right thing to do for <br> in <pre> anyway, and there is no avoiding it for inline elements with display:block, as mentioned in 3.3.)

Unfortunately, defining <br> as a UBA paragraph break goes against past W3C and Unicode Consortium decisions. Worse, it would not allow a way to get a non-separating <br> when necessary. Another possibility is to add another element like <br>, with one including a UBA paragraph break, and the other behaving according to the current spec. Clearly, this is not a very attractive solution either... Thus, for <br>, we really do want bdi to do separation.

* block elements with inline display: separation

First a bit of history. The HTML 4 spec says (http://www.w3.org/TR/REC-html40/struct/dirlang.html#style-bidi):

  "When a block element that does not have a dir attribute is transformed to the style of an inline element by a style sheet, the resulting presentation should be equivalent, in terms of bidirectional formatting, to the formatting obtained by explicitly adding a dir attribute (assigned the inherited value) to the transformed element."

Currently, the only browser that implements this is Firefox. The others treat it as any other display:inline element, with no separation or isolation. Clearly, then, it would be beneficial to make its behavior subject to bdi, since both separation and no separation are viable behaviors.

As pointed out by Martin Dürst, the reason the spec was formulated that way is that there are block elements that provide handy formatting that is not available in any inline element. Sometimes, however, the author might want that effect on a single line. An example would be to use <ol style="display:inline"> to get an inline numbered list: "1) apple 2) orange 3) pear".
To get that effect in the presence of some opposite-direction text, one needs the same bidi behavior as one had for the block without display:inline, or as close to it as possible. The current definition attempts to do that, but clearly does not go far enough. The original behavior for a block element is full UBA paragraph breaks. Separation is clearly much closer to that than isolation. And Firefox does use it, even though that is not currently in accordance with the spec. Backward compatibility is not an issue because no current browser behavior matches the spec anyway. We therefore conclude that defining bdi to use separation is preferable for display:inline block elements, for which bdi would be on by default.

* display:inline-block elements: do not change current behavior

Unaware that display:inline-block already uses isolation in all the browsers, section 3.3 of the proposal suggested making it bdi by default. As noted above, changing it to use separation would be problematic because it would break backward compatibility (and, at least for the short term, browser interoperability). However, since display:inline-block text appears in a separate block box, it probably does not make sense to allow the user to turn isolation completely off for it anyway, so we do not really want to make it subject to bdi. Thus, bdi can be explicitly defined to apply exclusively to display:inline elements (and perhaps to display:run-in elements when they behave like display:inline ones).

Thus, with <br> and display:inline on block elements strongly arguing for separation, and with no strong usage argument for isolation, it is my opinion that bdi should use separation.

--- The fallout

In light of the above, I would like to suggest the following open issue resolutions:

2.1.c: No change. The definition of bdi=yes|bdi will stay as specified in the proposal.

2.1.d: No change.
Neither <a> nor any other elements in addition to the <br> specified in the proposal will have bdi=yes by default.

3.3.a:
1. Inline elements with display:block style will be treated like ordinary block elements, i.e. serve as UBA paragraph breaks between the text preceding and following them.
2. Block elements with display:inline style will have bdi=yes by default.

3.3.b: The bdi attribute will have no effect on elements whose display is neither inline nor run-in acting as inline.

3.3.c: The bdi attribute will have no effect on elements with float:left|right. They will continue to be treated as separate UBA paragraphs removed from their context, as they are today.

3.3.d: The bdi attribute will have no effect on elements with position:absolute. They will continue to be treated as separate UBA paragraphs removed from their context, as they are today.

Aharon
Received on Sunday, 6 June 2010 15:51:26 UTC