- From: Jens Meiert <jens@meiert.com>
- Date: Tue, 27 Oct 2009 20:29:46 +0100
- To: public-html@w3.org
- Cc: "Aharon (Vladimir) Lanin" <aharon@google.com>
- Message-ID: <9fbcac550910271229m614af0c7g40d85f79a029ceff@mail.gmail.com>
Seems to be useful input for the HTML WG too (forwarding the “rich” version to avoid loss of some links):

---------- Forwarded message ----------
From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Mon, Oct 26, 2009 at 10:50 PM
Subject: A Proposal for HTML Improvements for Bidi, Part 1: Bidi Aspects of Existing HTML Features
To: www-international@w3.org

The following is the first part of a proposal for small improvements in HTML handling that should help make it easier to author quality bidi HTML pages and web applications. It is based on the issues that have repeatedly come up during efforts to add bidi support to various Google products that my team has been charged with aiding over the past two years.

The current version of this proposal is also available at http://docs.google.com/Doc?id=dd6f586t_19dg4pkqqc

Aharon Lanin
Google Israel

*A Proposal for HTML Improvements for Bidi*

*Part 1: Standardizing Bidi Aspects of Existing HTML Features*

Preliminaries:

- UBA: the Unicode Bidi Algorithm <http://unicode.org/reports/tr9/>.
- LTR: left-to-right
- RTL: right-to-left
- Text displayed in the wrong directionality is often garbled. For example, the LTR value "10 Main St." is displayed in RTL as ".Main St 10".

*1.1. <br>, <hr>, and embedded block elements should "reset" bidi state*

*Background*

The UBA's sections 3.3.1 and 3.3.2 require that the bidi state be completely reset at a "paragraph break". This means that strongly-directional text (e.g. Latin or Hebrew letters) and explicit bidi formatting characters (e.g. LRE and RLE) in one paragraph have no effect on the formatting of the text in the next paragraph and vice-versa. However, this requirement leaves the definition of a "paragraph" up to the implementation. Most plain-text environments implementing the UBA (e.g. Microsoft Word, Windows Notepad, GNOME gedit, OSX textedit etc.) treat newline and other line-breaking characters as a paragraph break for UBA purposes.
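To make the "reset" concrete, here is a minimal sketch of how explicit formatting characters stop at a paragraph break. The function name and the resetAtBreaks switch are hypothetical, introduced only for illustration. The switch models an environment that does not treat a line break as a UBA paragraph break. The break pattern assumes the UBA's paragraph-separator (class B) characters, with CR LF counted as one break.

```javascript
// Paragraph separators (UBA class B), with CR LF treated as a single break.
const PARAGRAPH_BREAK = /\r\n|[\n\r\u001C-\u001E\u0085\u2029]/;
const OPENERS = new Set(["\u202A", "\u202B", "\u202D", "\u202E"]); // LRE, RLE, LRO, RLO
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING

// For each line, report how many explicit embeddings/overrides are still
// open when that line starts. resetAtBreaks is a hypothetical switch:
// true  = every line break is a UBA paragraph break (state is reset),
// false = line breaks are ignored for bidi purposes (state leaks through).
function openEmbeddingsAtLineStart(text, resetAtBreaks = true) {
  const result = [];
  let depth = 0;
  for (const line of text.split(PARAGRAPH_BREAK)) {
    if (resetAtBreaks) depth = 0;
    result.push(depth);
    for (const ch of line) {
      if (OPENERS.has(ch)) depth += 1;
      else if (ch === PDF && depth > 0) depth -= 1;
    }
  }
  return result;
}

const sample = "Use RLO to go \u202ERIGHT-TO-LEFT.\nBut you don't have to.";
console.log(openEmbeddingsAtLineStart(sample));        // [0, 0]
console.log(openEmbeddingsAtLineStart(sample, false)); // [0, 1]
```

This only tracks explicit formatting characters. It does not attempt the rest of the UBA (implicit levels, reordering).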
*The Problem*

In HTML, it is well accepted that a block element constitutes a UBA paragraph. However, there is no uniformity in the treatment of <br>, <hr>, and embedded block elements (e.g. <div></div>) in this respect.

Firefox and Opera completely ignore them. As a result, when rendering "1. His Hebrew name is אאא.<br>2. בבב is a friend of his." (in an LTR context), they treat the "אאא. 2. בבב" as a single RTL run, and thus put the "2" on the *right* of the "בבב". The result is unreadable. This behavior is even stranger when instead of a <br>, you have a tall <div>, so the effect of the "אאא." is felt again somewhere far down the page. Failing to treat <br>, etc. as a UBA paragraph break goes against the spirit of the UBA - but not its letter.

Similarly, in "You can use RLO to make English text go &#x202E;RIGHT-TO-LEFT.<br>But you don't have to.", the unterminated RLO is allowed to exert its influence beyond the <br>, reversing the characters in the next line too.

IE and WebKit, on the other hand, treat <br>, <hr>, and embedded block elements as UBA paragraph breaks, making it easier to author bidi HTML documents. However, WebKit currently goes too far, with a <br> terminating the effects of all directionality levels, including that specified using HTML or CSS on the ancestor inline elements, e.g. <span dir=...>. As a result, it displays "<div dir=rtl><span dir=ltr>1. Hello!<br>2. Goodbye!</span></div>" with the second line in RTL. While this does conform to the literal definition of what a paragraph break is supposed to do according to the UBA, it goes against the spirit of HTML. Attempts to fix WebKit in this regard are stymied by the lack of a mandated specification and are reduced to guessing what exactly it should do.
IE, on the other hand, seems to terminate the effect of any LRE, RLE, LRO, and RLO formatting codes, but not the effect of the dir attributes of ancestor elements, which seems like a reasonable approach. What exactly it does, however, is undocumented.

*The Proposed Solution*

The HTML specification should state that any sort of line break - e.g. <br>, <hr>, and embedded block elements - should be treated as a UBA paragraph break. However, the directionality embedding levels stemming from the direction specified on ancestor elements via mark-up (dir attribute, <bdo> element) or CSS up to the closest ancestor block element should then be re-opened in the same order at the start of the new paragraph. This re-opening of embedding levels is allowed by the UBA's section 4.3, HL3.

*1.2. newline and other line-breaking characters should "reset" bidi state in <textarea>, <pre> and script dialog text.*

*Background*

As in 1.1 above.

*The Problem*

IE and WebKit treat newlines as a UBA paragraph break in all these contexts. Firefox, however, does not treat them as such in any of these contexts, while Opera treats them as such in <textarea> and dialog text, but not in <pre>. As a result, Firefox and Opera display "<pre>1. His Hebrew name is אאא.
2. בבב is a friend of his.</pre>" garbled in the same way as the <br> example in 1.1.

*The Proposed Solution*

The HTML specification should state that any sort of line-breaking character - e.g.
, 
 - in <textarea>, <pre>, and script dialog text should be treated as a UBA paragraph break. However, in <pre>, the directionality embedding levels stemming from the direction specified on ancestor elements via mark-up (dir attribute, <bdo> element) or CSS up to the closest ancestor block element should then be re-opened in the same order at the start of the new paragraph. This re-opening of embedding levels is allowed by the UBA's section 4.3, HL3.

*1.3. <title> and script dialogs should use the page's directionality*

*Background*

The W3C recommends <http://www.w3.org/TR/i18n-html-tech-bidi/#ri20030112.214820604> that in HTML, the directionality of text be declared using the dir attribute, avoiding the use of Unicode formatting characters LRE, RLE, and PDF except where the dir attribute is inapplicable.

*The Problem*

One would expect that the page's directionality set using <html dir=...> would apply to the page's <title>, as well as to the text of the page's script dialogs (alert(), confirm(), etc.). Unfortunately, however, this is not the case in any major browser - not Firefox, Chrome, Safari, or Opera. IE6 and IE7 used to apply <html dir=...> to dialog script text, but this is no longer the case in IE8.

The directionality context all these browsers use for <title> and dialog text is either the OS or the browser chrome's default directionality, which neither the server nor page scripts can even determine, let alone control. Since a value displayed in the wrong directionality can come out garbled, RTL pages wind up having to wrap their RTL <title> and dialog text in RLE + PDF characters. On the other hand, LTR pages dare not wrap their LTR <title> and dialog text in LRE + PDF characters for correct display on RTL systems, since most computers in the world are running an LTR OS without RTL script support turned on, and thus display LRE and PDF as rectangles.
Furthermore, these formatting characters are little-known, lack named entities, and are generally undesirable in HTML documents.

*The Proposed Solution*

The HTML specification should state that dialog text will be displayed in the <html> element's directionality, and the <title> in its directionality, whether set directly on the <title> element itself or inherited from an ancestor. It is desirable to allow the dir attribute on <title> itself for cases where the title happens to be in a different language than the overall page and thus may not match the page's overall directionality, but it is not nearly as important as at least applying the <html>'s directionality.

It is easy enough for a browser to implement this, since it knows the default directionality context in which the text will be displayed. If and only if this differs from the desired directionality, the browser needs to wrap (each paragraph of) the text in question in RLE + PDF when RTL is desired and LRE + PDF when LTR is desired.

*1.4. title and alt attributes should use the element's directionality*

*Background*

As in 1.3 above.

*The Problem*

Currently all major browsers (IE, FF, Chrome, Safari, Opera) display tooltips stemming from the title and alt attributes in the directionality of the element where they appear, but this does not appear to be formally specified anywhere. Furthermore, this consensus seems fragile because in principle, the directionality of an element and the text of its tooltip do not have to coincide. Here is a reasonable counterexample: a Hebrew web page displaying an English address with a Hebrew tooltip meaning "address" would use "<span id=address title=כתובת dir=ltr>10 Main St.</span>".

Until recently, Chrome displayed tooltips in the OS / browser's default directionality. When fixing this bug, the initial inclination was to apply only the page's directionality, not the element's, due to the "in principle" consideration above.
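The "wrap if and only if the directions differ" rule from 1.3 is small enough to sketch. The function name and its parameters are hypothetical, and splitting paragraphs on "\n" alone is a simplifying assumption. A real implementation would use whatever paragraph breaks the display surface honors.

```javascript
const LRE_CHAR = "\u202A"; // LEFT-TO-RIGHT EMBEDDING
const RLE_CHAR = "\u202B"; // RIGHT-TO-LEFT EMBEDDING
const PDF_CHAR = "\u202C"; // POP DIRECTIONAL FORMATTING

// desiredDir: the direction the text should be displayed in ("ltr" or "rtl").
// contextDir: the default direction of the surface that will display it
//             (e.g. the title bar or dialog), which the browser knows.
function wrapForContext(text, desiredDir, contextDir) {
  // No formatting characters are added when the directions already agree.
  if (desiredDir === contextDir) return text;
  const opener = desiredDir === "rtl" ? RLE_CHAR : LRE_CHAR;
  // Wrap each paragraph separately, since embeddings end at a paragraph break.
  return text
    .split("\n")
    .map((paragraph) => opener + paragraph + PDF_CHAR)
    .join("\n");
}
```

Because the text is returned unchanged when the directions agree, an LTR title on an LTR system never gets formatting characters that could show up as rectangles.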
Apparently not trusting browser behavior, the W3C suggests <http://www.w3.org/TR/i18n-html-tech-bidi/#tech-tooltips-etc> that tooltip directionality may have to be set using LRE | RLE + PDF. This is actually quite difficult to do properly, since wrapping an LTR tooltip in LRE + PDF just in case the browser winds up displaying it in an RTL context will result in the LRE and PDF displaying as rectangles on LTR OS's without RTL support enabled, i.e. the vast majority of computers.

*The Proposed Solution*

The HTML specification should state that title and alt attribute text will be displayed in the element's directionality. Although counterexamples as given above can be found, tooltip text most usually does have the same directionality as the element's text even where the element does have text, which is not very often. For such counterexamples, there is a simple workaround in the form of putting the tooltip on an extra element wrapping the original one, e.g. "<span title=כתובת><span id=address dir=ltr>10 Main St.</span></span>".

The alternatives are even less desirable. Having the tooltip use only the page's directionality increases the need to use LRE | RLE + PDF. And defining new alt_dir and title_dir attributes seems wasteful.

*1.5. <option> should support the dir attribute, and be displayed that way in both the dropdown and after being chosen*

*Background*

As in 1.3 above.

*The Problem*

In a single <select>, the values of different options may have different directionalities. Currently, however, out of all major browsers, only FF supports the dir attribute on <option>, and does so poorly: once the value is chosen, it is displayed in the <select>'s directionality. IE and Opera display all options in the <select>'s directionality.
Safari automatically estimates the directionality of each option and displays it as such both in the dropdown and after it has been chosen regardless of the <select>'s directionality (which is only used to place the down-arrow button and to align the values). This is all very nice, but directionality estimation algorithms do make mistakes, so it would be good to be able to specify the actual dir value for a given <option> - and Safari does not support that. Chrome does not support the dir attribute on <option> and is on its way to doing what Safari does. As a result, the only practical way to specify <option> value directionality is using LRE | RLE + PDF, which is cumbersome.

*The Proposed Solution*

The HTML specification should state that setting an explicit directionality on <option> should determine the way it is displayed in both the dropdown and after being chosen. Using auto-estimated directionality is allowed when the <option> element does not have an explicitly specified directionality.

*1.6. <input type="text"> and <textarea> should support compatible "set direction" functionality*

*Background*

Garbling by incorrect directionality applies to text being entered by the user in an input control, too. In fact, entering text of directionality opposite to the input is an unpleasant experience even if the full text does not wind up being garbled, due to the cursor jumping around during data entry and difficulty in selecting text. Some means for the user to set the directionality of the input, and for page scripts to be informed of this choice so the text's intended directionality can be stored, is thus highly desirable.

*The Problem*

All major browsers provide some way for the user to set the directionality of each <input type="text"> and <textarea> element, e.g. via "hot keys". However, the way this functionality interacts with page scripts varies drastically between browsers.

IE: The "hot keys" are CTRL + LEFT SHIFT for LTR and CTRL + RIGHT SHIFT for RTL.
(These key combinations are also adopted for this purpose by most Microsoft products, e.g. Windows dialogs, Notepad and Word.) They set the value of the element's dir attribute, which is then available to scripts. They trigger the onpropertychange event, at which time the dir value is already changed. They trigger onkeyup, but *before* the dir value has been changed, so setTimeout(0) has to be used to get the updated dir value. They do not trigger onkeypress.

FF: The "hot key" is CTRL + SHIFT + X, which cycles through LTR and RTL. It does *not* set the value of the element's dir attribute, and is thus invisible to scripts.

Opera: The "hot keys" are CTRL + LEFT SHIFT for LTR and CTRL + RIGHT SHIFT for RTL, as in IE. They do *not* set the value of the element's dir attribute, and are thus invisible to scripts.

Chrome: The "hot keys" are CTRL + LEFT SHIFT for LTR and CTRL + RIGHT SHIFT for RTL. They set the value of the element's dir attribute, which is then available to scripts. They trigger the onkeyup event, at which time the dir value is already changed. They do not trigger onkeypress or oninput. They do not trigger onpropertychange, since this event exists only in IE.

Safari: Right-click on the <input> or <textarea> provides a "Set paragraph direction" submenu. It is unclear whether "hot keys" can be configured. Using "Set paragraph direction" sets the value of the element's dir attribute, which is then available to scripts. However, it does not trigger onkeyup, onkeypress, or oninput. It also doesn't trigger onpropertychange, since this event exists only in IE.
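For the browsers that do update dir, the deferred read mentioned for IE can be sketched as below. This is an illustration under assumptions, not a cross-browser recipe. watchDirection and its defer parameter are hypothetical names, and the sketch uses addEventListener where older IE would need its own event API. It cannot help where dir is never set (FF, Opera) or where no key event fires (Safari).

```javascript
// el: an element (or element-like object) with a `dir` property and addEventListener.
// onChange: called with the new dir value whenever a change is noticed.
// defer: how to postpone the read; defaults to setTimeout(fn, 0).
function watchDirection(el, onChange, defer = (fn) => setTimeout(fn, 0)) {
  let last = el.dir;
  const check = () => {
    if (el.dir !== last) {
      last = el.dir;
      onChange(last);
    }
  };
  // The read is deferred because, as described above, IE fires keyup
  // before the dir value has been updated.
  el.addEventListener("keyup", () => defer(check));
  // Returned so a page can also call it from other hooks (e.g. on blur).
  return check;
}

// Possible use in a page (not executed here):
//   watchDirection(document.getElementById("comment"), (dir) => saveDir(dir));
// where "comment" and saveDir are placeholders for the page's own names.
```

Deferring the read is harmless in browsers where dir is already updated at keyup time, so one code path can serve both orderings.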
*The Proposed Solution*

The HTML specification should state that some way to set the direction of <input type=text> and <textarea> elements should be exposed to the user, and using it will:

- set the element's dir attribute accordingly
- trigger onkeyup *after* the dir attribute has been set
- trigger oninput; even though no actual input took place, the user did change the recommended interpretation of the input already collected
- trigger onkeypress? I am not sure. But one way or the other, it should be specified.

Furthermore, it should be stated that on an OS that has a widespread convention for setting direction (such as CTRL + LEFT SHIFT for LTR and CTRL + RIGHT SHIFT for RTL on Windows), the user agent will support that convention (although it may provide other methods too).

*1.7. auto-completion should remember and use the directionality of each value*

*Background*

Some browsers implement auto-completion, a feature whereby values previously entered into an element like <input type=text> are remembered and under certain conditions presented to the user in a dropdown. When the user selects one of the items in the dropdown, this value is assigned to the element.

At different times, the user may enter values of different directionality for the same input. The directionality of a value is set either directly by the user through a "set direction" command exposed by the browser (e.g. "hot keys", see 1.6 above) or by letting page scripts automatically set the input's dir attribute after estimating the directionality of the value on the fly.

*The Problem*

Browsers do not remember the directionality of previously-entered values. Some display them in the dropdown in the OS or browser default directionality. Some display them in the input's current directionality. Finally, some display each value in its own estimated directionality.
Each of these will result in some values being displayed incorrectly; even the last approach will sometimes fail because estimation algorithms do make mistakes, and this may not have been the directionality originally set by the user or page scripts. After the user chooses a value from the dropdown, the value is usually displayed in the input's current directionality, which may or may not be correct for it.

*The Proposed Solution*

The HTML specification should state that if a user agent implements auto-completion, it should store the last-used directionality for each value. This may be the original directionality of the element, or may have been set by the user for that value via directionality "hot keys", or may have been set for that value by page scripts. When a value is displayed in an auto-completion dropdown, it should be displayed in the directionality stored for it. When a value is chosen by the user, the element's dir value should be set to the directionality stored for it.

-- 
Jens Meiert
http://meiert.com/en/
Received on Tuesday, 27 October 2009 20:21:30 UTC