per-paragraph auto-direction, a.k.a. dir=uba (was: meeting results - f2f on additional requirements for bidi in HTML)

I am having some very strong doubts about the per-paragraph auto-direction
(i.e. dir=uba) feature proposed by the f2f meeting.

One of the reasons for my doubts is that it makes direction an extremely
complicated concept. For example, as you probably know, properties like
margin-start and margin-end are in the process of being added to CSS3. Their
effect depends on the current CSS direction. What will happen when you
combine them with dir=uba? Take the following example:

<pre dir=uba>
an ltr line (and thus uba paragraph).<input type=button value="b1"
style="margin-start:20px" /><input type=button value="b2"
style="margin-start:20px" />
AN RTL LINE (AND THUS UBA PARAGRAPH).<input type=button value="B3"
style="margin-start:20px" /><input type=button value="B4"
style="margin-start:20px" />
</pre>

The point is that according to the dir=uba design, B3 and B4 inherit the
resolved direction of their parent, the <pre dir=uba> element. Since the
latter's direction is determined by its first UBA paragraph's content, i.e.
its first line, it has direction:ltr, and thus B3 and B4 also have
direction:ltr. Will their margins, however, be on their left or their right?
If they are on the left, as per their CSS direction, they will *not* do what the
author intends, which is to put a bit of space between each button and what
precedes it. And if they are on the right, that makes a laughing stock of these
elements' CSS direction.
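
To make the mismatch concrete, here is a minimal Python sketch of the
first-strong rule applied to the two lines of the example. It is only an
illustration under stated assumptions: the function and variable names are
mine, Hebrew letters stand in for the upper-case "RTL" text, and the rule is
simplified (it ignores explicit directional formatting characters).

```python
import unicodedata

def first_strong_direction(text, default=None):
    """Return 'ltr' or 'rtl' based on the first character of bidi class
    L, R, or AL; return `default` if the text has no strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default

# Two UBA paragraphs, as in the <pre dir=uba> example above.
# The Hebrew letters are hypothetical sample text, not from the proposal.
paragraphs = [
    "an ltr line.",
    "\u05e9\u05d5\u05e8\u05d4 rtl.",
]

# Under the dir=uba design, the element's resolved direction comes from
# its first UBA paragraph, and the buttons inherit that direction.
element_direction = first_strong_direction(paragraphs[0])

# Each paragraph's base direction, however, is resolved independently.
paragraph_directions = [
    first_strong_direction(p, element_direction) for p in paragraphs
]
```

With these inputs the element (and thus every button) resolves to ltr, while
the second paragraph's base direction is rtl, which is exactly the situation in
which margin-start on B3 and B4 lands on the wrong side.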

This is only one example of just how complicated this feature is. I
think it will be quite difficult to grasp - for the HTML & CSS WGs, for
browser implementors, and for HTML authors.

And it is not clear to me that the use cases are strong enough to support
such a complicated feature:

- Is there a large pool of pre-existing multi-paragraph plain data that
needs dir=uba to be displayed correctly? To exist, it must have been created
in an editor that displays each paragraph in the direction of its first
strong character. However, no Microsoft product of which I am aware supports
this feature. The only mass-market editors I know that do support it are
gedit and some Mac editors. These products are not all that well-known among
RTL users. Thus, I tend to doubt that such a large pool of data already
exists.

- Where such a pool of data does exist, it is easy enough to preprocess it
for display in <pre> by dividing into paragraphs at each B-class character
and wrapping each paragraph in a <div dir=auto
autodirmethod="first-strong">.

- While bidi users indeed often do want to enter a multi-paragraph piece of
text where some paragraphs are LTR and some RTL, sites can already address
this by using a rich text editor widget for such data instead of a
<textarea>. For example, TinyMCE (http://tinymce.moxiecode.com/) is an
excellent rich-text editor control available for free that allows the
user to explicitly control the direction of each paragraph. <textarea> is
not exactly a super-cool feature.
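
The preprocessing mentioned in the second point above could look roughly like
the following Python sketch. It is an assumption-laden illustration, not a
reference implementation: the function names are mine, the autodirmethod
attribute is the one from the proposal (not an existing HTML attribute), and
CRLF is treated as a single break, which is my own choice.

```python
import html
import unicodedata

def split_uba_paragraphs(text):
    """Split text at every character of bidi class B (paragraph separator)."""
    text = text.replace("\r\n", "\n")  # assumption: CRLF counts as one break
    paragraphs, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    paragraphs.append("".join(current))
    return paragraphs

def wrap_paragraphs(text):
    """Wrap each paragraph in the <div> suggested above, escaping its text."""
    opening = '<div dir=auto autodirmethod="first-strong">'
    # Joined without newlines so that no extra line breaks appear in <pre>.
    return "".join(
        opening + html.escape(p) + "</div>" for p in split_uba_paragraphs(text)
    )
```

Note that an empty paragraph (a blank line in the source) becomes an empty
<div>, so a real preprocessor would probably need to put something in it to
keep the blank line visible.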

When I mentioned dir=uba as part of a report on our f2f at a meeting of the
Standards Institute of Israel committee that deals with bidi, I immediately
got some of the feedback above from two of the people present. And these are
people who very much want bidi improvements and are quite enthusiastic about
the proposal. (I was already having doubts by that time, but I was very
careful not to give any hint of their existence at that meeting.)

To summarize, I think that the feature is too complicated to justify in the
absence of overwhelming need. I think that we simply got carried away at the
f2f, and proposing the feature to broader circles will harm the overall
proposal's chances for acceptance and implementation.

Aharon

On Wed, Aug 18, 2010 at 7:25 PM, Aharon (Vladimir) Lanin
<aharon@google.com> wrote:

> The following are the resolutions reached by the face-to-face meeting on
> Additional Requirements for Bidi in HTML (http://www.w3.org/TR/html-bidi/),
> which took place on June 7-9 in Mountain View, California (and by
> teleconference).
>
> The meeting’s discussions covered most sections of the proposal, and the
> items below are conclusions that reached consensus during the meeting.
>
> All new names introduced below are tentative and subject to review by the
> relevant W3C working groups. However, where highly abbreviated names are
> suggested, their conciseness should be preserved.
>
> The meeting was attended by: Adil Allawi, Aharon Lanin, Behdad Esfahbod,
> Bob Jung, Craig Cummings, Ehsan Akhgari, Fantasai, Mark Davis, Matitiahu
> Allouche, Najib Tounsi, Norbert Lindenberg, Roozbeh Pournader, Tab Atkins,
> and Xiaomei Ji.
>
>
> --- bidi isolation ---
> (Section 2.1, except as indicated otherwise below)
>
> 1. Rename the bdi attribute to ubi (Unicode Bidi Isolate)
>
> 2. ubi syntax is ubi=“ubi”|“”|“off”. The “ubi” and empty string values are
> equivalent, and mean that bidi isolation is on for the element.
>
> 3. (Sections 2.1, 3.3) ubi has an effect on all and only elements that are
> rendered as CSS non-replaced inline boxes. Thus, for example:
> a. ubi will be ignored by any elements that are not display:inline (or
> display:run-in when it behaves as display:inline). This includes
> display:inline-block elements (which should continue to use bidi isolation,
> as already stated in the spec, regardless of their ubi attribute value) and
> normally inline elements whose display has been set to something other than
> inline.
> b. block elements whose display has been set to inline will be subject to
> ubi.
> c. ubi will be ignored by floating and position:absolute (and fixed)
> elements, even though they may have display:inline.
>
> 4. Change the definition of ubi to use “isolation”, as opposed to
> “separation”, i.e.: The content of an element with ubi on will appear in the
> same location and have the same effect on the bidi ordering around it as a
> single neutral character (bidi class ON). The bidi ordering within the
> element is determined by treating its contents as an independent UBA
> paragraph or sequence of paragraphs, with the element’s computed direction
> as their base direction.
>
> 5. (Sections 2.1, 2.2, 3.1) The default value for ubi is:
> a. “ubi” (i.e. on) for elements where dir=auto
> b. “ubi” (i.e. on) for block elements with display:inline
> c. “off” in all other cases.
> Earlier suggestions to turn ubi on by default for <br>, <a>, and
> display:inline-block have been rejected.
>
> 6. The CSS equivalent of ubi is unicode-bidi:isolate. Thus, it does not
> inherit (either in CSS or in HTML). Please note that the “isolate” value
> can be combined with “bidi-override”, which is what would have to happen for
> <bdo dir=ltr|rtl ubi>. [Editor’s note: we should say something about
> “isolate” taking precedence over other unicode-bidi values. e.g. “embed” and
> “normal”.]
>
> 7. Once any browser implements ubi, add a W3C best practice for authors to
> use ubi on <a>.
>
> 8. We have discussed but not reached a conclusion for the following
> suggestion: When translating HTML to plain text, e.g. for copy/paste, the
> result should contain the appropriate existing Unicode directional
> formatting codes so that the text is displayed in the same visual order (by
> UBA-compliant software) as the HTML, while retaining the text’s logical
> order. This should be taken up in an e-mail thread.
>
>
> --- line breaks as UBA paragraph breaks ---
> (Sections 3.1, 3.2, and 3.3, as indicated below)
>
> 9. (Section 3.1) Add a new HTML attribute that affects the behavior of all
> and only descendant <br> elements:
> a. Tentative syntax for the attribute: bidibreak=“soft”|“hard”. The “soft”
> value means to treat the <br> as the UBA bidi class WS (as explicitly
> required in HTML 4). The “hard” value means to treat it as B.
> b. The default value is “hard”.
> c. Thus, to get behavior in mark-up like that of U+2028 in plain text, use
> <br bidibreak=soft>. Since the attribute inherits, it could also be
> specified on an ancestor element, e.g. for poetry, or on the root element
> for documents that rely on the bidi behavior specified for <br> by HTML 4.
> d. bidibreak does not have a CSS equivalent.
>
> 10. (Section 3.2) All non-collapsed newlines, e.g. in <pre> and <textarea>,
> are to be treated as UBA paragraph breaks, regardless of the value of
> bidibreak.
>
> 11. (New section) HTML5 and CSS2.1 should clarify that U+2028 and U+2029 in
> <pre> and <textarea> should behave as they do in plain text.
>
> 12. (Section 3.3) Out-of-flow elements, e.g. floating or position:absolute
> ones, do not have any effect on surrounding content, e.g. they do not
> introduce a UBA paragraph break even if they do have display:block.
>
>
> --- auto-direction ---
> (Section 2.2)
>
> 13. dir=“auto” sets the CSS direction property to either “ltr” or “rtl”.
> There will be no such thing as “direction:auto” in CSS.
>
>
> --- “formatting” auto-direction ---
> (Section 2.2)
>
> 14. We will not consider at this time adding a dir value that (assuming
> standard existing UBA treatment of the text) can only be implemented by
> inserting directional formatting codes into the text.
>
>
> --- word-count auto-direction ---
> (Section 2.2)
>
> 15. It seems unlikely that a language-unaware direction estimation
> algorithm based on counting LTR and RTL words can be uniformly successful
> across different languages, because:
> a. Different languages are likely to use different numbers of words to
> express the same concept. German, for example, is well-known to often use a
> long compound word where English would use two or three separate words.
> b. The proposal’s suggestion to use line-break opportunities as word
> boundaries in order to deal with languages such as Chinese, Japanese, and
> Korean, which do not use spaces between words, does not seem likely to work
> well for this purpose. In most cases, what would be considered a word in
> Chinese consists of two or three characters, but line breaks are allowed
> between them. Thus, word counts are likely to be highly inflated for CJK
> text if based on line break opportunities. True word counts for such
> languages may require dictionary look-up, which is prohibitively expensive
> for the purpose of direction estimation.
>
> 16. A character-count-based direction estimation algorithm, with different
> coefficients for characters from different scripts, seems likely to give
> results as good as or better than the word-count-based algorithm, while being
> significantly easier to implement.
>
> 17. Efficiency is likely to become problematic for count-based direction
> estimation unless a limit is placed on the length of text examined.
>
> 18. Progress on relative-count-based direction estimation will require
> research that compares the results of various algorithms (and coefficients
> used by the algorithms) on actual text samples of known author-assigned
> overall direction.
>
>
> --- per-paragraph auto-direction ---
> (Section 2.2)
>
> 19. In plain text, the UBA supports per-paragraph auto-direction: unless a
> base direction is specified externally, the base direction of each UBA
> paragraph is assigned based on that paragraph’s content (namely its first
> character with strong direction) independently of the others. There exist
> text editors that support this feature (e.g. gedit). It would be desirable
> to add such support to HTML as well. For example, there should be an easy
> way to enter text in a <textarea> and then display it in a <pre> using UBA’s
> per-paragraph auto-direction in both cases. The following is an attempt to
> design such a dir=uba feature, in addition to the dir=auto already proposed.
>
> 20. The values for dir will also include “normal”, “auto”, and “uba”, and
> the values for unicode-bidi will also include “uba”. [Editor’s note:
> subsequent to the meeting, several of the attendees expressed serious
> reservations about the complexity of the design below.]
> a. The default dir for all elements is “normal”, with the exception of
> block elements whose parent’s dir is “uba”. These inherit “uba”.
> b. Elements with dir=normal have the same resolved direction (both the
> internal HTML “property” used for CSS purposes and the actual CSS property)
> as the parent element. dir=normal also sets the unicode-bidi CSS property to normal
> (unless ubi is explicitly on for that element). The primary purpose for
> explicitly stating dir=“normal” is to break dir=“uba” inheritance from the
> parent.
> c. dir=“uba” sets the resolved direction (as defined above) of the element
> according to the UBA applied to its textual content. The textual content is
> the in-order traversal of all text nodes (even if they have an explicit
> dir).
> d. In the application of the UBA to textual content, if the text contains
> no characters of the bidi classes L, AL, or R, the resolved direction of the
> text is inherited.
> e. dir=“uba” sets the unicode-bidi CSS property to “uba”.
> f. The base directionality of a UBA paragraph (which is distinct from CSS
> direction, which it does not have) whose containing block element has
> unicode-bidi:uba is set according to the paragraph’s content using the UBA.
> A UBA paragraph’s lines’ alignment is determined by the paragraph’s base
> directionality when the text-align of the containing block element is start
> or end.
> g. To clarify, when an inline element has dir=“uba”, its children do not
> inherit dir=“uba”, but do inherit the resolved direction of the inline
> element.
> h. dir=“uba” implies ubi by default. If ubi is explicitly off on this
> element, the unicode-bidi value is “uba embed”. Otherwise, unicode-bidi is
> “uba isolate”.
> i. TBD: what happens in <textarea> when the user sets an explicit direction
> via the browser UI, for all dir values.
>
>
> --- directional images ---
> (Section 2.4)
>
> 21. The proposed feature of horizontal flipping of images based on
> direction may not be quite as useful as envisioned because some and perhaps
> even the majority of images that need modifications for the
> opposite-direction UI require modifications more complicated than a simple
> horizontal flip. (For example, just part of the image may need flipping.) If
> one needs two different image versions for a significant fraction of the
> images anyway, one comes up with machinery to deal with that, and there is
> little additional cost to have that machinery also deal with the icons that
> are amenable to simple flipping. Nevertheless, we estimate that there still
> will be cases where such a feature will be genuinely helpful.
>
> 22. The proposed feature of horizontal flipping of images based on
> direction can also be achieved on the element level by the directional
> selection (:rtl) and graphic transformation features (transform:scaleX(-1))
> already proposed for CSS3. There does not appear to be a sufficient need for
> it on the HTML level. On the CSS level, however, where an image such as a
> background may be specified and may need to be flipped without flipping the
> whole element, such a need does exist.
>
> 23. The other proposed feature of direction-based choice between two images
> specified by two separate urls does not seem very appropriate for HTML,
> since the two images are likely to have almost the same URL, differing only
> in one of the folder names or a part of the file name. Repeating the longer,
> consistent parts of the two URLs would be poor coding practice for HTML,
> considering that the alternative of replacing just the variable part of the
> URL is easily achieved in the code generating the HTML. The same does not
> apply to CSS, which should preferably be static.
>
> 24. Thus, instead of the proposed HTML changes, we should consider adding
> an rtlflip option to the image notation in CSS3 Images.
>
>
> --- base direction of dialog text ---
> (Section 3.4, except as indicated otherwise below)
>
> 25. Approach ECMAScript people, recommending optional explicit direction
> parameters for alert(), confirm(), and prompt().
>
> 26. In the absence of direction passed in via an explicit parameter, dialog
> text (e.g. text displayed using the ECMAScript functions above) should be
> broken up into paragraphs, and the direction of each paragraph be
> automatically estimated and applied in the paragraph’s display. The text is
> broken into paragraphs at characters of bidi class B, e.g. newline.
> [Editor’s note: what is the estimation algorithm to be used?]
>
> 27. (New section) User agents must implement the Unicode spec re Default
> Ignorable Code Points (Unicode Standard version 5.2, Chapter 5, section
> 5.21), including never displaying the LRM, RLM, LRE, RLE, LRO, RLO, and PDF
> characters inappropriately (e.g. as empty boxes or advance widths) even if
> the underlying platform does not handle them properly. In particular, this
> must be the case for script dialog text, page titles, and tooltips.
>
>
> --- events on user setting text direction ---
> (Section 3.8)
>
> 28. There is no need to trigger the oninput event when the user explicitly
> sets the direction of an <input> or <textarea> element since the dir
> attribute change that this causes should generate the DOM2 DOMAttrModified
> event (a MutationEvent).
>
>
> --- list marker direction ---
> (Section 3.10)
>
> 29. Currently, all browsers render a list item’s marker on the start side
> of the list item, even when the list item’s direction differs from the
> list’s direction. Since the list item markers appear in the margin or
> padding, the list element automatically sets up a margin on its start side
> so that the markers have somewhere to appear. However, the list does not set
> up a margin on its end side, and so the opposite-direction markers get cut
> off by default. It would be a bad idea to fix this by having the list
> automatically leave a margin on the end side because this would waste screen
> real estate in the usual case where there are no opposite-direction list
> items.
>
> 30. Since there does not seem to be a way to fix the default display of
> opposite-direction list item markers on the end side of the list, and since
> in many or most cases the preferred display of opposite-direction list items
> is with the marker on the start side of the list, not of the list item, it
> seems advisable to make opposite-direction list items’ markers occur on the
> start side of the list by default. Nevertheless, since in some cases the
> preferred display may be on the start side of the item, this should be made
> configurable. The right place for such a configuration is CSS.
>
> 31. CSS3 will include a new property, list-style-direction, with the values
> “left”, “right”, “start”, and “match-me”. (The last is a placeholder name
> until we find something better.)
> a. The “start” value means according to the list item’s direction.
> b. The match-me value is like start, but is inherited as a computed value
> of either left or right.
> c. The CSS initial value will be “start”. However, to get markers to appear
> all on one side in most cases, the default style sheet will specify
> ":not(li) > ol, :not(li) > ul { list-style-direction:match-me;}". (The
> reason we can't change the CSS initial value is that list-style-direction
> is effectively 'start' according to CSS2.1, and this default behavior cannot
> be changed later. CSS2.1 will not change because there are use cases for the
> current behavior and we already have interop on it.)
> [Editor’s note: “left”, “right”, and “start” seem to be alignment values,
> not direction values. We are trying to deal with marker direction, which
> affects not only where the marker is going to be displayed, but the way the
> marker’s text will be displayed (e.g. where the period of an ordered marker
> goes). It therefore seems that this section needs to be redesigned. Perhaps
> the values should be simply “like-list” and “like-item”, with inheritance.]
>
> 32. When one does want the opposite-direction list item markers to appear
> on the list items’ start sides, one will need to set up margins or padding
> appropriately in addition to setting list-style-direction.
>
> 33. None of this has any effect on the default alignment of list items,
> which will remain at the start side of their own direction. The user will
> have to explicitly use li {text-align:match-parent} to change that. We
> cannot make this the default without breaking the inheritance of text-align.
>
>

Received on Monday, 23 August 2010 15:10:00 UTC