- From: Aharon (Vladimir) Lanin <aharon@google.com>
- Date: Wed, 18 Aug 2010 19:25:53 +0300
- To: public-i18n-bidi@w3.org
- Message-ID: <AANLkTik=56LxN3S0qenYcxfUKEd38ki+xNQa-J0CY=8S@mail.gmail.com>
The following are the resolutions reached by the face-to-face meeting on Additional Requirements for Bidi in HTML (http://www.w3.org/TR/html-bidi/), which took place on June 7-9 in Mountain View, California (and by teleconference). The meeting’s discussions covered most sections of the proposal, and the items below are the conclusions that reached consensus during the meeting. All new names introduced below are tentative and subject to review by the relevant W3C working groups. However, where highly abbreviated names are suggested, their conciseness should be preserved.

The meeting was attended by: Adil Allawi, Aharon Lanin, Behdad Esfahbod, Bob Jung, Craig Cummings, Ehsan Akhgari, Fantasai, Mark Davis, Matitiahu Allouche, Najib Tounsi, Norbert Lindenberg, Roozbeh Pournader, Tab Atkins, and Xiaomei Ji.

--- bidi isolation ---
(Section 2.1, except as indicated otherwise below)

1. Rename the bdi attribute to ubi (Unicode Bidi Isolate).

2. ubi syntax is ubi="ubi"|""|"off". The “ubi” and empty string values are equivalent, and mean that bidi isolation is on for the element.

3. (Sections 2.1, 3.3) ubi has an effect on all and only elements that are rendered as CSS non-replaced inline boxes. Thus, for example:
   a. ubi will be ignored by any elements that are not display:inline (or display:run-in when it behaves as display:inline). This includes display:inline-block elements (which should continue to use bidi isolation, as already stated in the spec, regardless of their ubi attribute value) and normally inline elements whose display has been set to something other than inline.
   b. Block elements whose display has been set to inline will be subject to ubi.
   c. ubi will be ignored by floating and position:absolute (and fixed) elements, even though they may have display:inline.

4. Change the definition of ubi to use “isolation”, as opposed to “separation”, i.e.: The content of an element with ubi on will appear in the same location and have the same effect on the bidi ordering around it as a single neutral character (bidi class ON). The bidi ordering within the element is determined by treating its contents as an independent UBA paragraph or sequence of paragraphs, with the element’s computed direction as their base direction.

5. (Sections 2.1, 2.2, 3.1) The default value for ubi is:
   a. “ubi” (i.e. on) for elements where dir=auto
   b. “ubi” (i.e. on) for block elements with display:inline
   c. “off” in all other cases.
   Earlier suggestions to turn ubi on by default for <br>, <a>, and display:inline-block have been rejected.

6. The CSS equivalent of ubi is unicode-bidi:isolate. Thus, it does not inherit (neither in CSS nor in HTML). Please note that the “isolate” value can be combined with “bidi-override”, which is what would have to happen for <bdo dir=ltr|rtl ubi>. [Editor’s note: we should say something about “isolate” taking precedence over other unicode-bidi values, e.g. “embed” and “normal”.]

7. Once any browser implements ubi, add a W3C best practice for authors to use ubi on <a>.

8. We have discussed but not reached a conclusion on the following suggestion: When translating HTML to plain text, e.g. for copy/paste, the result should contain the appropriate existing Unicode directional formatting codes so that the text is displayed in the same visual order (by UBA-compliant software) as the HTML, while retaining the text’s logical order. This should be taken up in an e-mail thread.

--- line breaks as UBA paragraph breaks ---
(Sections 3.1, 3.2, and 3.3, as indicated below)

9. (Section 3.1) Add a new HTML attribute that affects the behavior of all and only descendant <br> elements:
   a. Tentative syntax for the attribute: bidibreak="soft"|"hard". The “soft” value means to treat the <br> as the UBA bidi class WS (as explicitly required in HTML 4). The “hard” value means to treat it as B.
   b. The default value is “hard”.
   c. Thus, to get behavior in mark-up like that of U+2028 in plain text, use <br bidibreak=soft>. Since the attribute inherits, it could also be specified on an ancestor element, e.g. for poetry, or on the root element for documents that rely on the bidi behavior specified for <br> by HTML 4.
   d. bidibreak does not have a CSS equivalent.

10. (Section 3.2) All non-collapsed newlines, e.g. in <pre> and <textarea>, are to be treated as UBA paragraph breaks, regardless of the value of bidibreak.

11. (New section) HTML5 and CSS2.1 should clarify that U+2028 and U+2029 in <pre> and <textarea> should behave as they do in plain text.

12. (Section 3.3) Out-of-flow elements, e.g. floating or position:absolute ones, do not have any effect on surrounding content, e.g. they do not introduce a UBA paragraph break even if they do have display:block.

--- auto-direction ---
(Section 2.2)

13. dir="auto" sets the CSS direction property to either “ltr” or “rtl”. There will be no such thing as “direction:auto” in CSS.

--- “formatting” auto-direction ---
(Section 2.2)

14. We will not consider at this time adding a dir value that (assuming standard existing UBA treatment of the text) can only be implemented by inserting directional formatting codes into the text.

--- word-count auto-direction ---
(Section 2.2)

15. It seems unlikely that a language-unaware direction estimation algorithm based on counting LTR and RTL words can be uniformly successful across different languages, because:
   a. Different languages are likely to use different numbers of words to express the same concept. German, for example, is well known to often use a long compound word where English would use two or three separate words.
   b. The proposal’s suggestion to use line-break opportunities as word boundaries in order to deal with languages such as Chinese, Japanese, and Korean, which do not use spaces between words, does not seem likely to work well for this purpose. In most cases, what would be considered a word in Chinese consists of two or three characters, but line breaks are allowed between them. Thus, word counts are likely to be highly inflated for CJK text if based on line-break opportunities. True word counts for such languages may require dictionary look-up, which is prohibitively expensive for the purpose of direction estimation.

16. A character-count-based direction estimation algorithm, with different coefficients for characters from different scripts, seems likely to give results as good as or better than those of the word-count-based algorithm, while being significantly easier to implement.

17. Efficiency is likely to become problematic for count-based direction estimation unless a limit is placed on the length of text examined.

18. Progress on relative-count-based direction estimation will require research that compares the results of various algorithms (and the coefficients used by the algorithms) on actual text samples of known author-assigned overall direction.

--- per-paragraph auto-direction ---
(Section 2.2)

19. In plain text, the UBA supports per-paragraph auto-direction: unless a base direction is specified externally, the base direction of each UBA paragraph is assigned based on that paragraph’s content (namely its first character with strong direction) independently of the others. There exist text editors that support this feature (e.g. gedit). It would be desirable to add such support to HTML as well. For example, there should be an easy way to enter text in a <textarea> and then display it in a <pre> using the UBA’s per-paragraph auto-direction in both cases. The following is an attempt to design such a dir=uba feature, in addition to the dir=auto already proposed.

20. The values for dir will also include “normal”, “auto”, and “uba”, and the values for unicode-bidi will also include “uba”. [Editor’s note: subsequent to the meeting, several of the attendees expressed serious reservations about the complexity of the design below.]
   a. The default dir for all elements is “normal”, with the exception of block elements whose parent’s dir is “uba”. These inherit “uba”.
   b. Elements with dir=normal have the same resolved direction (both the internal HTML “property” used for CSS purposes and the actual CSS property) as the parent element. dir=normal also sets the unicode-bidi CSS property to normal (unless ubi is explicitly on for that element). The primary purpose of explicitly stating dir="normal" is to break dir="uba" inheritance from the parent.
   c. dir="uba" sets the resolved direction (as defined above) of the element according to the UBA applied to its textual content. The textual content is the in-order traversal of all text nodes (even if they have an explicit dir).
   d. In the application of the UBA to textual content, if the text contains no characters of the bidi classes L, AL, or R, the resolved direction of the text is inherited.
   e. dir="uba" sets the unicode-bidi CSS property to “uba”.
   f. The base directionality of a UBA paragraph (which is distinct from CSS direction, which it does not have) whose containing block element has unicode-bidi:uba is set according to the paragraph’s content using the UBA. The alignment of a UBA paragraph’s lines is determined by the paragraph’s base directionality when the text-align of the containing block element is start or end.
   g. To clarify, when an inline element has dir="uba", its children do not inherit dir="uba", but do inherit the resolved direction of the inline element.
   h. dir="uba" implies ubi by default. If ubi is explicitly off on this element, the unicode-bidi value is “uba embed”. Otherwise, unicode-bidi is “uba isolate”.
   i. TBD: what happens in <textarea> when the user sets an explicit direction via the browser UI, for all dir values.

--- directional images ---
(Section 2.4)

21. The proposed feature of horizontal flipping of images based on direction may not be quite as useful as envisioned, because some and perhaps even the majority of images that need modifications for the opposite-direction UI require modifications more complicated than a simple horizontal flip. (For example, just part of the image may need flipping.) If one needs two different image versions for a significant fraction of the images anyway, one comes up with machinery to deal with that, and there is little additional cost to have that machinery also deal with the icons that are amenable to simple flipping. Nevertheless, we estimate that there still will be cases where such a feature will be genuinely helpful.

22. The proposed feature of horizontal flipping of images based on direction can also be achieved on the element level by the directional selection (:rtl) and graphic transformation (transform:scaleX(-1)) features already proposed for CSS3. There does not appear to be a sufficient need for it on the HTML level. On the CSS level, however, where an image such as a background may be specified and may need to be flipped without flipping the whole element, such a need does exist.

23. The other proposed feature of direction-based choice between two images specified by two separate URLs does not seem very appropriate for HTML, since the two images are likely to have almost the same URL, differing only in one of the folder names or a part of the file name. Repeating the longer, consistent parts of the two URLs would be poor coding practice for HTML, considering that the alternative of replacing just the variable part of the URL is easily achieved in the code generating the HTML. The same does not apply to CSS, which should preferably be static.

24. Thus, instead of the proposed HTML changes, we should consider adding an rtlflip option to the image notation in CSS3 Images.

--- base direction of dialog text ---
(Section 3.4, except as indicated otherwise below)

25. Approach ECMAScript people, recommending optional explicit direction parameters for alert(), confirm(), and prompt().

26. In the absence of direction passed in via an explicit parameter, dialog text (e.g. text displayed using the ECMAScript functions above) should be broken up into paragraphs, and the direction of each paragraph should be automatically estimated and applied in the paragraph’s display. The text is broken into paragraphs at characters of bidi class B, e.g. newline. [Editor’s note: what is the estimation algorithm to be used?]

27. (New section) User agents must implement the Unicode spec regarding Default Ignorable Code Points (Unicode Standard version 5.2, Chapter 5, section 5.21), including never displaying the LRM, RLM, LRE, RLE, LRO, RLO, and PDF characters inappropriately (e.g. as empty boxes or advance widths) even if the underlying platform does not handle them properly. In particular, this must be the case for script dialog text, page titles, and tooltips.

--- events on user setting text direction ---
(Section 3.8)

28. There is no need to trigger the oninput event when the user explicitly sets the direction of an <input> or <textarea> element, since the dir attribute change that this causes should generate the DOM2 DOMAttrModified event (a MutationEvent).

--- list marker direction ---
(Section 3.10)

29. Currently, all browsers render a list item’s marker on the start side of the list item, even when the list item’s direction differs from the list’s direction. Since the list item markers appear in the margin or padding, the list element automatically sets up a margin on its start side so that the markers have somewhere to appear. However, the list does not set up a margin on its end side, and so the opposite-direction markers get cut off by default. It would be a bad idea to fix this by having the list automatically leave a margin on the end side, because this would waste screen real estate in the usual case where there are no opposite-direction list items.

30. Since there does not seem to be a way to fix the default display of opposite-direction list item markers on the end side of the list, and since in many or most cases the preferred display of opposite-direction list items is with the marker on the start side of the list, not the list item, it seems advisable to make opposite-direction list items’ markers occur on the start side of the list by default. Nevertheless, since in some cases the preferred display may be on the start side of the item, this should be made configurable. The right place for such a configuration is CSS.

31. CSS3 will include a new property, list-style-direction, with the values “left”, “right”, “start”, and “match-me”. (The last is a placeholder name until we find something better.)
   a. The “start” value means according to the list item’s direction.
   b. The “match-me” value is like “start”, but is inherited as a computed value of either left or right.
   c. The CSS initial value will be “start”. However, to get markers to appear all on one side in most cases, the default style sheet will specify ":not(li) > ol, :not(li) > ul { list-style-direction:match-me;}". (The reason we can't change the CSS initial value is that list-style-direction is effectively 'start' according to CSS2.1, and this default behavior cannot be changed later. CSS2.1 will not change because there are use cases for the current behavior and we already have interop on it.)
   [Editor’s note: “left”, “right”, and “start” seem to be alignment values, not direction values. We are trying to deal with marker direction, which affects not only where the marker is going to be displayed, but also the way the marker’s text will be displayed (e.g. where the period of an ordered marker goes). It therefore seems that this section needs to be redesigned. Perhaps the values should be simply “like-list” and “like-item”, with inheritance.]

32. When one does want the opposite-direction list item markers to appear on the list items’ start sides, one will need to set up margins or padding appropriately in addition to setting list-style-direction.

33. None of this has any effect on the default alignment of list items, which will remain at the start side of their own direction. The user will have to explicitly use li {text-align:match-parent} to change that. We cannot make this the default without breaking the inheritance of text-align.
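
As a minimal illustration of the per-paragraph auto-direction described in items 19 and 20d, the following Python sketch splits text at characters of bidi class B and gives each paragraph the direction of its first strong character (L, R, or AL), falling back to an inherited direction when there is none. The function names and the inherited parameter are hypothetical, chosen for this example only. The sketch relies on the bidi classes exposed by Python's unicodedata module and leaves out the rest of the UBA (for instance, explicit embeddings and overrides are not processed), so it should be read as an approximation rather than as the algorithm the meeting settled on.

```python
import unicodedata

# Strong bidi classes, as named in item 20d.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def first_strong_direction(paragraph, inherited="ltr"):
    """Return "ltr" or "rtl" from the first strong character.

    If the paragraph has no character of class L, R, or AL, the
    hypothetical `inherited` direction is returned instead.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    return inherited


def split_paragraphs(text):
    """Split text at every character of bidi class B (e.g. newline, U+2029)."""
    paragraphs, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    paragraphs.append("".join(current))
    return paragraphs


def per_paragraph_directions(text, inherited="ltr"):
    """Pair each paragraph with its independently estimated direction."""
    return [
        (p, first_strong_direction(p, inherited))
        for p in split_paragraphs(text)
    ]
```

For example, per_paragraph_directions("abc\n\u05e9\u05dc\u05d5\u05dd") pairs the first paragraph with "ltr" and the second (Hebrew) paragraph with "rtl". U+2028 has bidi class WS, so it does not start a new paragraph here, whereas a CR LF pair produces an empty paragraph between its two class B characters.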
Received on Wednesday, 18 August 2010 16:26:43 UTC