
meeting results - f2f on additional requirements for bidi in HTML

From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Wed, 18 Aug 2010 19:25:53 +0300
Message-ID: <AANLkTik=56LxN3S0qenYcxfUKEd38ki+xNQa-J0CY=8S@mail.gmail.com>
To: public-i18n-bidi@w3.org
The following are the resolutions reached by the face-to-face meeting on
Additional Requirements for Bidi in HTML (http://www.w3.org/TR/html-bidi/),
which took place on June 7-9 in Mountain View, California (and by
teleconference).

The meeting’s discussions covered most sections of the proposal, and the
items below are conclusions that reached consensus during the meeting.

All new names introduced below are tentative and subject to review by the
relevant W3C working groups. However, where highly abbreviated names are
suggested, their conciseness should be preserved.

The meeting was attended by: Adil Allawi, Aharon Lanin, Behdad Esfahbod, Bob
Jung, Craig Cummings, Ehsan Akhgari, Fantasai, Mark Davis, Matitiahu
Allouche, Najib Tounsi, Norbert Lindenberg, Roozbeh Pournader, Tab Atkins,
and Xiaomei Ji.


--- bidi isolation ---
(Section 2.1, except as indicated otherwise below)

1. Rename the bdi attribute to ubi (Unicode Bidi Isolate)

2. ubi syntax is ubi="ubi"|""|"off". The “ubi” and empty string values are
equivalent, and mean that bidi isolation is on for the element.

3. (Sections 2.1, 3.3) ubi has an effect on all and only elements that are
rendered as CSS non-replaced inline boxes. Thus, for example:
a. ubi will be ignored by any elements that are not display:inline (or
display:run-in when it behaves as display:inline). This includes
display:inline-block elements (which should continue to use bidi isolation,
as already stated in the spec, regardless of their ubi attribute value) and
normally inline elements whose display has been set to something other than
inline.
b. block elements whose display has been set to inline will be subject to
ubi.
c. ubi will be ignored by floating and position:absolute (and fixed)
elements, even though they may have display:inline.
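
The rule in item 3 can be sketched as a small predicate. This is only an
illustration: the function name and parameters are hypothetical, the
arguments stand for computed CSS values, and display:run-in is not modeled.

```python
def ubi_applies(display: str, float_: str = "none",
                position: str = "static", replaced: bool = False) -> bool:
    """Return True if ubi would have an effect on an element (item 3).

    Hypothetical helper: arguments are the element's computed CSS values.
    """
    if replaced:
        # Only non-replaced inline boxes are affected.
        return False
    if float_ != "none" or position in ("absolute", "fixed"):
        # 3c: floats and absolutely/fixed positioned elements ignore ubi.
        return False
    # 3a/3b: what matters is the computed display, not the element type.
    return display == "inline"
```

For example, ubi_applies("inline-block") is False, while a <div> whose
display has been set to inline gives ubi_applies("inline"), which is True.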

4. Change the definition of ubi to use “isolation”, as opposed to
“separation”, i.e.: The content of an element with ubi on will appear in the
same location and have the same effect on the bidi ordering around it as a
single neutral character (bidi class ON). The bidi ordering within the
element is determined by treating its contents as an independent UBA
paragraph or sequence of paragraphs, with the element’s computed direction
as their base direction.
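
As a rough sketch of the definition in item 4, the text seen by the
surrounding bidi ordering can be modeled by replacing each isolated element
with one neutral placeholder, while the element's own content is set aside
to be ordered independently. The Isolate class, the function name, and the
choice of U+FFFC (which has bidi class ON) as the placeholder are
assumptions made for illustration only.

```python
import unicodedata
from dataclasses import dataclass
from typing import List, Tuple, Union

# U+FFFC OBJECT REPLACEMENT CHARACTER has bidi class ON (other neutral).
PLACEHOLDER = "\ufffc"


@dataclass
class Isolate:
    """Hypothetical stand-in for an inline element with ubi on."""
    text: str
    direction: str  # the element's computed direction: "ltr" or "rtl"


def split_for_bidi(segments: List[Union[str, Isolate]]) -> Tuple[str, List[Isolate]]:
    """Return (outer_text, isolates).

    outer_text is what the surrounding paragraph "sees": each isolate is
    reduced to a single neutral character. Each returned isolate is meant
    to be ordered on its own, using its direction as the base direction.
    """
    outer: List[str] = []
    isolates: List[Isolate] = []
    for seg in segments:
        if isinstance(seg, Isolate):
            outer.append(PLACEHOLDER)
            isolates.append(seg)
        else:
            outer.append(seg)
    return "".join(outer), isolates
```

The sketch does not run the UBA itself. It only shows which text each
application of the algorithm would be given.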

5. (Sections 2.1, 2.2, 3.1) The default value for ubi is:
a. “ubi” (i.e. on) for elements where dir=auto
b. “ubi” (i.e. on) for block elements with display:inline
c. “off” in all other cases.
Earlier suggestions to turn ubi on by default for <br>, <a>, and
display:inline-block have been rejected.
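
The defaults in item 5 amount to a short decision function. The sketch
below uses hypothetical names; is_block_element stands for "the element is
a block element in HTML terms" and display for its computed CSS display.

```python
from typing import Optional


def default_ubi(dir_attr: Optional[str], is_block_element: bool,
                display: str) -> str:
    """Default ubi value when the attribute is not specified (item 5)."""
    if dir_attr == "auto":
        return "ubi"   # 5a: on for elements where dir=auto
    if is_block_element and display == "inline":
        return "ubi"   # 5b: on for block elements with display:inline
    return "off"       # 5c: off in all other cases
```

Note that an <a> or a display:inline-block element falls through to "off",
in line with the rejected suggestions mentioned above.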

6. The CSS equivalent of ubi is unicode-bidi:isolate. Thus, it does not
inherit (neither in CSS nor in HTML). Please note that the “isolate” value
can be combined with “bidi-override”, which is what would have to happen for
<bdo dir=ltr|rtl ubi>. [Editor’s note: we should say something about
“isolate” taking precedence over other unicode-bidi values. e.g. “embed” and
“normal”.]

7. Once any browser implements ubi, add a W3C best practice for authors to
use ubi on <a>.

8. We have discussed but not reached a conclusion for the following
suggestion: When translating HTML to plain text, e.g. for copy/paste, the
result should contain the appropriate existing Unicode directional
formatting codes so that the text is displayed in the same visual order (by
UBA-compliant software) as the HTML, while retaining the text’s logical
order. This should be taken up in an e-mail thread.
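
Since item 8 was left open, the following is only one conceivable building
block, not an agreed design: wrapping the text of an element that has an
explicit direction in an embedding (LRE or RLE ... PDF) when serializing to
plain text. The function name is hypothetical, and which formatting codes
are "appropriate" in each case is exactly what remains to be discussed.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def embed(text: str, direction: str) -> str:
    """Wrap text in a directional embedding; logical order is unchanged."""
    opener = LRE if direction == "ltr" else RLE
    return opener + text + PDF
```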


--- line breaks as UBA paragraph breaks ---
(Sections 3.1, 3.2, and 3.3, as indicated below)

9. (Section 3.1) Add a new HTML attribute that affects the behavior of all
and only descendant <br> elements:
a. Tentative syntax for the attribute: bidibreak="soft"|"hard". The “soft”
value means to treat the <br> as the UBA bidi class WS (as explicitly
required in HTML 4). The “hard” value means to treat it as B.
b. The default value is “hard”.
c. Thus, to get behavior in mark-up like that of U+2028 in plain text, use
<br bidibreak=soft>. Since the attribute inherits, it could also be
specified on an ancestor element, e.g. for poetry, or on the root element
for documents that rely on the bidi behavior specified for <br> by HTML 4.
d. bidibreak does not have a CSS equivalent.
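
A minimal sketch of how item 9 could resolve the bidi class of a <br>,
assuming the attribute is looked up on the <br> itself and then on each
ancestor in turn. The function name and the list-based representation of
the ancestor chain are hypothetical.

```python
from typing import List, Optional


def br_bidi_class(chain: List[Optional[str]]) -> str:
    """Return the UBA bidi class to use for a <br>.

    chain holds the bidibreak attribute values from the <br> itself
    outward to the root element; None means the attribute is not set.
    """
    for value in chain:
        if value == "soft":
            return "WS"  # treat the <br> as whitespace (HTML 4 behavior)
        if value == "hard":
            return "B"   # treat the <br> as a UBA paragraph separator
    return "B"           # 9b: the default value is "hard"
```

So br_bidi_class([None, "soft"]) models a plain <br> inside an element with
bidibreak=soft, e.g. a block of poetry, and yields "WS".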

10. (Section 3.2) All non-collapsed newlines, e.g. in <pre> and <textarea>,
are to be treated as UBA paragraph breaks, regardless of the value of
bidibreak.

11. (New section) HTML5 and CSS2.1 should clarify that U+2028 and U+2029 in
<pre> and <textarea> should behave as they do in plain text.

12. (Section 3.3) Out-of-flow elements, e.g. floating or position:absolute
ones, do not have any effect on surrounding content, e.g. they do not
introduce a UBA paragraph break even if they do have display:block.


--- auto-direction ---
(Section 2.2)

13. dir=“auto” sets the CSS direction property to either “ltr” or “rtl”.
There will be no such thing as “direction:auto” in CSS.


--- “formatting” auto-direction ---
(Section 2.2)

14. We will not consider at this time adding a dir value that (assuming
standard existing UBA treatment of the text) can only be implemented by
inserting directional formatting codes into the text.


--- word-count auto-direction ---
(Section 2.2)

15. It seems unlikely that a language-unaware direction estimation algorithm
based on counting LTR and RTL words can be uniformly successful across
different languages, because:
a. Different languages are likely to use different numbers of words to
express the same concept. German, for example, is well-known to often use a
long compound word where English would use two or three separate words.
b. The proposal’s suggestion to use line-break opportunities as word
boundaries in order to deal with languages such as Chinese, Japanese, and
Korean, which do not use spaces between words, does not seem likely to work
well for this purpose. In most cases, what would be considered a word in
Chinese consists of two or three characters, but line breaks are allowed
between them. Thus, word counts are likely to be highly inflated for CJK
text if based on line break opportunities. True word counts for such
languages may require dictionary look-up, which is prohibitively expensive
for the purpose of direction estimation.

16. A character-count-based direction estimation algorithm, with different
coefficients for characters from different scripts, seems likely to give
results as good as or better than those of the word-count-based algorithm,
while being significantly easier to implement.

17. Efficiency is likely to become problematic for count-based direction
estimation unless a limit is placed on the length of text examined.

18. Progress on relative-count-based direction estimation will require
research that compares the results of various algorithms (and coefficients
used by the algorithms) on actual text samples of known author-assigned
overall direction.
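
To make items 16 and 17 concrete, here is a minimal sketch of a
character-count-based estimator with a cap on the amount of text examined.
It is not a proposed algorithm: the function name, the weights, and the
1000-character limit are placeholders, and the weights are applied per bidi
class rather than per script because Python's standard library does not
expose the Unicode script property. Suitable coefficients would have to come
from the research described in item 18.

```python
import unicodedata
from typing import Optional


def estimate_direction(text: str, ltr_weight: float = 1.0,
                       rtl_weight: float = 1.0,
                       max_chars: int = 1000) -> Optional[str]:
    """Estimate overall direction by weighted counts of strong characters.

    Returns "ltr", "rtl", or None if no strong character was examined.
    """
    ltr = rtl = 0.0
    for ch in text[:max_chars]:          # item 17: bound the work done
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            ltr += ltr_weight
        elif cls in ("R", "AL"):
            rtl += rtl_weight
    if ltr == 0 and rtl == 0:
        return None
    # Ties go to "ltr" here; that is an arbitrary choice of this sketch.
    return "rtl" if rtl > ltr else "ltr"
```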


--- per-paragraph auto-direction ---
(Section 2.2)

19. In plain text, the UBA supports per-paragraph auto-direction: unless a
base direction is specified externally, the base direction of each UBA
paragraph is assigned based on that paragraph’s content (namely its first
character with strong direction) independently of the others. There exist
text editors that support this feature (e.g. gedit). It would be desirable
to add such support to HTML as well. For example, there should be an easy
way to enter text in a <textarea> and then display it in a <pre> using the
UBA’s per-paragraph auto-direction in both cases. The following is an attempt to
design such a dir=uba feature, in addition to the dir=auto already proposed.
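
The plain-text behavior described in item 19 can be sketched as follows:
split the text at characters of bidi class B, then take each paragraph's
base direction from its first strong character. The function names are
hypothetical. Returning None for a paragraph with no strong character
leaves the fallback to the caller.

```python
import unicodedata
from typing import List, Optional


def split_paragraphs(text: str) -> List[str]:
    """Split text at characters of bidi class B (e.g. newline, U+2029).

    Simplification: a CR LF pair is not merged into one separator here,
    so it yields an extra empty paragraph.
    """
    paragraphs: List[str] = []
    current: List[str] = []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    paragraphs.append("".join(current))
    return paragraphs


def first_strong_direction(paragraph: str) -> Optional[str]:
    """Base direction from the first character of class L, R, or AL."""
    for ch in paragraph:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return None  # no strong character: the caller picks the fallback
```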

20. The values for dir will also include “normal”, “auto”, and “uba”, and
the values for unicode-bidi will also include “uba”. [Editor’s note:
subsequent to the meeting, several of the attendees expressed serious
reservations about the complexity of the design below.]
a. The default dir for all elements is “normal”, with the exception of block
elements whose parent’s dir is “uba”. These inherit “uba”.
b. Elements with dir=normal have the same resolved direction (both the
internal HTML “property” used for CSS purposes and the actual CSS property)
as the parent element. dir=normal also sets the unicode-bidi CSS property to
normal (unless ubi is explicitly on for that element). The primary purpose for
explicitly stating dir=“normal” is to break dir=“uba” inheritance from the
parent.
c. dir=“uba” sets the resolved direction (as defined above) of the element
according to the UBA applied to its textual content. The textual content is
the in-order traversal of all text nodes (even if they have an explicit
dir).
d. In the application of the UBA to textual content, if the text contains no
characters of the bidi classes L, AL, or R, the resolved direction of the
text is inherited.
e. dir=“uba” sets the unicode-bidi CSS property to “uba”.
f. The base directionality of a UBA paragraph (which is distinct from CSS
direction, which it does not have) whose containing block element has
unicode-bidi:uba is set according to the paragraph’s content using the UBA.
The alignment of a UBA paragraph’s lines is determined by the paragraph’s
base directionality when the text-align of the containing block element is
start or end.
g. To clarify, when an inline element has dir=“uba”, its children do not
inherit dir=“uba”, but do inherit the resolved direction of the inline
element.
h. dir=“uba” implies ubi by default. If ubi is explicitly off on this
element, the unicode-bidi value is “uba embed”. Otherwise, unicode-bidi is
“uba isolate”.
i. TBD: what happens in <textarea> when the user sets an explicit direction
via the browser UI, for all dir values.


--- directional images ---
(Section 2.4)

21. The proposed feature of horizontal flipping of images based on direction
may not be quite as useful as envisioned because some and perhaps even the
majority of images that need modifications for the opposite-direction UI
require modifications more complicated than a simple horizontal flip. (For
example, just part of the image may need flipping.) If one needs two
different image versions for a significant fraction of the images anyway,
one comes up with machinery to deal with that, and there is little
additional cost to have that machinery also deal with the icons that are
amenable to simple flipping. Nevertheless, we estimate that there still will
be cases where such a feature will be genuinely helpful.

22. The proposed feature of horizontal flipping of images based on direction
can also be achieved on the element level by the directional selection
(:rtl) and graphic transformation features (transform:scaleX(-1)) already
proposed for CSS3. There does not appear to be a sufficient need for it on
the HTML level. On the CSS level, however, where an image such as a
background may be specified and may need to be flipped without flipping the
whole element, such a need does exist.

23. The other proposed feature of direction-based choice between two images
specified by two separate urls does not seem very appropriate for HTML,
since the two images are likely to have almost the same URL, differing only
in one of the folder names or a part of the file name. Repeating the longer,
consistent parts of the two URLs would be poor coding practice for HTML,
considering that the alternative of replacing just the variable part of the
URL is easily achieved in the code generating the HTML. The same does not
apply to CSS, which should preferably be static.

24. Thus, instead of the proposed HTML changes, we should consider adding an
rtlflip option to the image notation in CSS3 Images.


--- base direction of dialog text ---
(Section 3.4, except as indicated otherwise below)

25. Approach ECMAScript people, recommending optional explicit direction
parameters for alert(), confirm(), and prompt().

26. In the absence of direction passed in via an explicit parameter, dialog
text (e.g. text displayed using the ECMAScript functions above) should be
broken up into paragraphs, and the direction of each paragraph be
automatically estimated and applied in the paragraph’s display. The text is
broken into paragraphs at characters of bidi class B, e.g. newline.
[Editor’s note: what is the estimation algorithm to be used?]

27. (New section) User agents must implement the Unicode Standard’s rules
for Default Ignorable Code Points (Unicode Standard version 5.2, Chapter 5, section
5.21), including never displaying the LRM, RLM, LRE, RLE, LRO, RLO, and PDF
characters inappropriately (e.g. as empty boxes or advance widths) even if
the underlying platform does not handle them properly. In particular, this
must be the case for script dialog text, page titles, and tooltips.


--- events on user setting text direction ---
(Section 3.8)

28. There is no need to trigger the oninput event when the user explicitly
sets the direction of an <input> or <textarea> element since the dir
attribute change that this causes should generate the DOM2 DOMAttrModified
event (a MutationEvent).


--- list marker direction ---
(Section 3.10)

29. Currently, all browsers render a list item’s marker on the start side of
the list item, even when the list item’s direction differs from the list’s
direction. Since the list item markers appear in the margin or padding, the
list element automatically sets up a margin on its start side so that the
markers have somewhere to appear. However, the list does not set up a margin
on its end side, and so the opposite-direction markers get cut off by
default. It would be a bad idea to fix this by having the list automatically
leave a margin on the end side because this would waste screen real estate
in the usual case where there are no opposite-direction list items.

30. Since there does not seem to be a way to fix the default display of
opposite-direction list item markers on the end side of the list, and since
in many or most cases the preferred display of opposite-direction list items
is with the marker on the start side for the list, not the list item, it
seems advisable to make opposite-direction list items’ markers occur on the
start side of the list by default. Nevertheless, since in some cases the
preferred display may be on the start side of the item, this should be made
configurable. The right place for such a configuration is CSS.

31. CSS3 will include a new property, list-style-direction, with the values
“left”, “right”, “start”, and “match-me”. (The last is a placeholder name
until we find something better.)
a. The “start” value means according to the list item’s direction.
b. The “match-me” value is like “start”, but is inherited as a computed value of
either left or right.
c. The CSS initial value will be “start”. However, to get markers to appear
all on one side in most cases, the default style sheet will specify
":not(li) > ol, :not(li) > ul { list-style-direction:match-me;}". (The
reason we can't change the CSS initial value is that list-style-direction
is effectively 'start' according to CSS2.1, and this default behavior cannot
be changed later. CSS2.1 will not change because there are use cases for the
current behavior and we already have interop on it.)
[Editor’s note: “left”, “right”, and “start” seem to be alignment values,
not direction values. We are trying to deal with marker direction, which
affects not only where the marker is going to be displayed, but the way the
marker’s text will be displayed (e.g. where the period of an ordered marker
goes). It therefore seems that this section needs to be redesigned. Perhaps
the values should be simply “like-list” and “like-item”, with inheritance.]
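
The difference between “start” and “match-me” in item 31 can be sketched as
below, taking the values as currently written (and leaving aside the
editor's note that the section may be redesigned). Both function names are
hypothetical, and the sketch assumes that list items inherit the list's
computed value.

```python
def computed_list_style_direction(specified: str, element_dir: str) -> str:
    """Computed value on the element where the property is specified.

    "match-me" resolves against that element's own direction, so
    descendants inherit a fixed "left" or "right" (item 31b).
    """
    if specified == "match-me":
        return "left" if element_dir == "ltr" else "right"
    return specified  # "left", "right", and "start" are kept as-is


def marker_side(computed: str, item_dir: str) -> str:
    """Side on which a list item's marker is placed."""
    if computed == "start":
        # 31a: according to the list item's own direction
        return "left" if item_dir == "ltr" else "right"
    return computed
```

With the default style sheet rule, an LTR <ol> computes match-me to “left”,
so an RTL <li> inside it gets marker_side("left", "rtl"), i.e. “left”, the
start side of the list. With “start”, the same item's marker would be on
the right.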

32. When one does want the opposite-direction list item markers to appear on
the list items’ start sides, one will need to set up margins or padding
appropriately in addition to setting list-style-direction.

33. None of this has any effect on the default alignment of list items,
which will remain at the start side of their own direction. The user will
have to explicitly use li {text-align:match-parent} to change that. We cannot
make this the default without breaking the inheritance of text-align.
Received on Wednesday, 18 August 2010 16:26:43 GMT
