
[inline bidi update] - Some comments

From: Lina Kemmel <LKEMMEL@il.ibm.com>
Date: Tue, 4 Feb 2014 15:09:15 +0200
To: www-international@w3.org
Message-ID: <OF4D378E2E.5DE71F50-ONC2257C75.0042DBDA-C2257C75.00484831@il.ibm.com>
Hello Richard, Aharon et al,

Please find below some comments on the article.

1. "If the text to be marked up is tightly wrapped by a non-inline element 
you would usually add the dir attribute to that element. In some cases 
this can lead to the block of text being aligned on the page in a way that 
is not desirable. To avoid this, you can add an inline element immediately 
inside the tags of the existing markup ..."

Comment: Alternatively, a block-level element can be given a base 
direction that differs from its alignment by specifying both the dir and 
align attributes, for example: <p dir=rtl align=left>ABC...</p>.
This gives the paragraph content a right-to-left base direction while 
aligning it to the left.

2. "if the tightly-wrapped phrase in the previous step is followed inline 
(possibly after some intervening neutral characters) by a number or a 
logically separate opposite-direction phrase, then add a directional mark 
(RLM or LRM) immediately after the markup of that phrase. " [referring to 

Comment: It can be necessary to add a directional mark also before the 
markup of the phrase to be isolated.
For example, in an LTR paragraph, an RTL phrase to be isolated is inserted 
from a database with directional markup added, but the preceding RTL 
phrase doesn't contain such markup. If the relative order of the 2 
successive RTL phrases should be preserved in display (to follow the LTR 
base text direction), an LRM character should be inserted before the 
injected phrase.

<p>the concatenation of tokens is: RTL-TEXT * <span dir=rtl>RTL-INJECTED-TEXT</span></p>

What you'd see without LRM is:
the concatenation of tokens is: TXET-DETCEJNI-LTR * TXET-LTR

What you'd expect to see:
the concatenation of tokens is: TXET-LTR * TXET-DETCEJNI-LTR

This is accomplished by adding an LRM before the injected phrase:
<p>the concatenation of tokens is: RTL-TEXT * &lrm;<span dir=rtl>RTL-INJECTED-TEXT</span></p>
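
The same idea can be sketched for code that assembles such markup from 
stored fragments. This is only an illustration in Python: the helper name 
inject_phrase and its parameters are hypothetical, and it adds the mark on 
both sides of the span unconditionally, whereas real code might add a mark 
only where the neighbouring text requires it.

```python
from html import escape

# Character references for the directional marks, keyed by base direction.
MARK_ENTITY = {"ltr": "&lrm;", "rtl": "&rlm;"}


def inject_phrase(preceding_html: str, phrase: str, phrase_dir: str,
                  base_dir: str = "ltr") -> str:
    """Append `phrase` to `preceding_html`, wrapped in a dir-marked span.

    Hypothetical helper: a mark matching the base direction is placed
    before the span (so it does not merge with preceding same-direction
    text) and after it (as the article recommends).
    """
    if phrase_dir not in MARK_ENTITY or base_dir not in MARK_ENTITY:
        raise ValueError("directions must be 'ltr' or 'rtl'")
    mark = MARK_ENTITY[base_dir]
    span = f'<span dir="{phrase_dir}">{escape(phrase)}</span>'
    return f"{preceding_html}{mark}{span}{mark}"


html_fragment = inject_phrase("RTL-TEXT * ", "RTL-INJECTED-TEXT", "rtl")
```

Here html_fragment holds the preceding text followed by an LRM reference, 
the span, and a closing LRM reference.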

3. General comment on changing the dir semantics in the HTML standard 
itself. In HTML5 dir actually duplicates BDI (which seems to be 
redundant), and there is no markup to get back to the old behavior 
(LRE/RLE ... PDF equivalents). The use case as above, but when the 
concatenated fragments are expected to flow from right to left:

What you'd expect to see:
the concatenation of tokens is: TXET-DETCEJNI-LTR * TXET-LTR

4. "dynamic use cases..."

Comment: Again, a comment on the standard itself. A non-negligible 
"dynamic" case is editable text. Currently, the standard doesn't address 
inline formatting (bidirectional embeds, isolates, overrides) in editable 

5. "There are some situations where you may not be able to use the markup 
described in the previous section. In HTML these include the title element 
and any attribute value.
In these situations you have to use the invisible Unicode characters that 
produce the same results..."

Comment: For perfect isolation, one should enclose an embedded phrase in 2 
pairs of characters (unless RLI, LRI, FSI, PDI are supported). The first 
pair of characters consists of one of U+200E LEFT-TO-RIGHT MARK (LRM) or 
U+200F RIGHT-TO-LEFT MARK (RLM) [choose the one consistent with the base 
text direction] AND one of U+202B RIGHT-TO-LEFT EMBEDDING (RLE) or U+202A 
LEFT-TO-RIGHT EMBEDDING (LRE) [choose the one to match the desired 
embedded phrase direction]. This corresponds to the markup <span 
dir="rtl"> or <span dir="ltr">. The second pair of characters consists of 
U+202C POP DIRECTIONAL FORMATTING (PDF) AND one of U+200E LEFT-TO-RIGHT 
MARK (LRM) or U+200F RIGHT-TO-LEFT MARK (RLM) again. That being said, 
LRM/RLM may not be mandatory in certain contexts (which also applies to 
LRE/RLE or the corresponding markup, BTW).
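
As a rough sketch of this wrapping for plain-text contexts such as a title 
or an attribute value, the following Python builds the two pairs around a 
phrase. The function name wrap_embedded_phrase and its parameters are 
hypothetical, and the sketch assumes the closing pair is U+202C PDF 
followed by the same mark that opened the sequence.

```python
LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def wrap_embedded_phrase(phrase: str, phrase_dir: str, base_dir: str) -> str:
    """Return `phrase` enclosed as mark + embedding ... PDF + mark.

    Hypothetical helper: the mark follows the base text direction and
    the embedding follows the desired direction of the phrase.
    """
    if phrase_dir not in ("ltr", "rtl") or base_dir not in ("ltr", "rtl"):
        raise ValueError("directions must be 'ltr' or 'rtl'")
    mark = LRM if base_dir == "ltr" else RLM
    embed = LRE if phrase_dir == "ltr" else RLE
    return f"{mark}{embed}{phrase}{PDF}{mark}"


# An RTL (Hebrew) phrase placed inside an LTR title string.
title_text = "Title: " + wrap_embedded_phrase("\u05E9\u05DC\u05D5\u05DD", "rtl", "ltr") + " (1)"
```

Where the isolate characters (RLI, LRI, FSI, PDI) are supported, a single 
isolate pair would replace both pairs built here.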

Lina Kemmel
  Bidi architect
Received on Wednesday, 5 February 2014 09:44:24 UTC
