Re: [inline bidi update] - Some comments from Richard Ishida on 2014-02-05 (www-international@w3.org from January to March 2014)

From: Richard Ishida <ishida@w3.org>
Date: Wed, 05 Feb 2014 13:41:28 +0000
To: Lina Kemmel <LKEMMEL@il.ibm.com>, www-international@w3.org
Message-ID: <52F23F88.8050005@w3.org>

Lina,

Thank you for taking the time to send these comments.

On 04/02/2014 13:09, Lina Kemmel wrote:
> Hello Richard, Aharon et al,
>
> Please find below some comments on the article.
>
> 1. "If the text to be marked up is tightly wrapped by a non-inline element
> you would usually add the dir attribute to that element. In some cases
> this can lead to the block of text being aligned on the page in a way that
> is not desirable. To avoid this, you can add an inline element immediately
> inside the tags of the existing markup ..."
>
> Comment: Alternatively, mismatching direction and alignment in a
> block-level element can be achieved by specifying both dir and align
> attributes, for example: <p dir=rtl align=left>ABC...</p>.
> This would cause the paragraph content to have right-to-left base
> direction and be aligned to the left.

Yes

> 2. "if the tightly-wrapped phrase in the previous step is followed inline
> (possibly after some intervening neutral characters) by a number or a
> logically separate opposite-direction phrase, then add a directional mark
> (RLM or LRM) immediately after the markup of that phrase. " [referring to
> HTML4]
>
> Comment: It can be necessary to add a directional mark also before the
> markup of the phrase to be isolated.
> For example, in an LTR paragraph, an RTL phrase to be isolated is dropped
> from a database with directional markup added, but the preceding RTL
> phrase doesn't contain such markup. If the relative order of the 2
> successive RTL phrases should be preserved in display (to follow the LTR
> base text direction), an LRM character should be inserted before the
> injected phrase.
>
> <p>the concatenation of tokens is: RTL-TEXT * <span dir=rtl>
> RTL-INJECTED-TEXT</span></p>
>
> What you'd see without LRM is:
> the concatenation of tokens is: TXET-DETCEJNI-LTR * TXET-LTR
>
> What you'd expect to see:
> the concatenation of tokens is: TXET-LTR * TXET-DETCEJNI-LTR
>
> This is accomplished by adding an LRM before the injected phrase:
> <p>the concatenation of tokens is: RTL-TEXT * &lrm;<span dir=rtl>
> RTL-INJECTED-TEXT</span></p>

In these cases, the directional mark is still being added after 
something - just not the thing that was inserted. I think that if you 
have a problem, you should be able to figure this out from the general 
rule given, and so it's best to keep the rule simple.
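
To make the case in point 2 concrete, here is a minimal Python sketch that 
builds the markup with an LRM placed before the injected span. The function 
name and its parameters are hypothetical, chosen only for illustration, and 
the sketch assumes an LTR base direction as in the quoted example.

```python
from html import escape

LRM_ENTITY = "&lrm;"  # HTML character reference for U+200E LEFT-TO-RIGHT MARK


def concat_with_injected_rtl(preceding_html, injected_text):
    """Append an RTL phrase to LTR-paragraph content (hypothetical helper).

    preceding_html is assumed to be trusted markup that may end in an RTL
    run plus neutral characters; injected_text is plain text to be escaped.
    """
    injected = f'<span dir="rtl">{escape(injected_text)}</span>'
    # The LRM sits after the preceding content, so the two RTL runs are
    # ordered according to the LTR base direction.
    return f"{preceding_html}{LRM_ENTITY}{injected}"


print(concat_with_injected_rtl("RTL-TEXT * ", "RTL-INJECTED-TEXT"))
```

Note that the mark still comes "after something" here, namely the preceding 
RTL text and the neutral characters that follow it.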

> ============
> 3. General comment on changing the dir semantics in the HTML standard
> itself. In HTML5 dir actually duplicates BDI (which seems to be
> redundant), and there is no markup to get back to the old behavior
> (LRE/RLE ... PDF equivalents). The use case as above, but when the
> concatenated fragments are expected to flow from right to left:
>
> What you'd expect to see:
> the concatenation of tokens is: TXET-DETCEJNI-LTR * TXET-LTR

bdi is useful for text that is inserted into content where you don't 
know the direction of the inserted text, since it guesses that direction 
for you. It can be convenient when you need to add markup, since it's 
simpler to write <bdi> than <span dir=auto>.
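
As a rough illustration of that convenience, the following Python sketch 
wraps inserted text of unknown direction in either form. The helper name and 
the use_bdi flag are assumptions made for this example only.

```python
from html import escape


def wrap_unknown_direction(text, use_bdi=True):
    """Wrap inserted plain text whose direction is not known in advance.

    Hypothetical helper: both forms ask the browser to guess the direction
    of the wrapped text from its content.
    """
    safe = escape(text)  # escape &, <, > and quotes in the inserted text
    if use_bdi:
        return f"<bdi>{safe}</bdi>"
    return f'<span dir="auto">{safe}</span>'


print(wrap_unknown_direction("user-supplied name"))
print(wrap_unknown_direction("user-supplied name", use_bdi=False))
```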

> ============
>
>
> 4. "dynamic use cases..."
>
> Comment: Again, a comment on the standard itself. A non-negligible
> "dynamic" case is editable text. Currently, the standard doesn't address
> inline formatting (bidirectional embeds, isolates, overrides) in editable
> text.

You should consider raising a bug for HTML5 about that.

> 5. "There are some situations where you may not be able to use the markup
> described in the previous section. In HTML these include the title element
> and any attribute value.
> In these situations you have to use the invisible Unicode characters that
> produce the same results..."
>
> Comment: For perfect isolation, one should enclose an embedded phrase in 2
> pairs of characters (unless RLI, LRI, FSI, PDI are supported). The first
> pair of characters consists of one of U+200E LEFT-TO-RIGHT MARK (LRM) or
> U+200F RIGHT-TO-LEFT MARK (RLM) [choose the one consistent with the base
> text direction] AND one of U+202B RIGHT-TO-LEFT EMBEDDING (RLE) or U+202A
> LEFT-TO-RIGHT EMBEDDING (LRE) [choose the one to match the desired
> embedded phrase direction]. This corresponds to the markup <span
> dir="rtl"> or <span dir="ltr">. The second pair of characters consists of
> U+202C POP DIRECTIONAL FORMATTING (PDF) AND one of U+200E LEFT-TO-RIGHT
> MARK (LRM) or U+200F RIGHT-TO-LEFT MARK (RLM) again. That being said,
> LRM/RLM may not be mandatory in certain contexts (which also applies to
> LRE/RLE or the corresponding markup, BTW).

I added a paragraph as follows:

"If isolation is necessary, either within the text or when the text is 
used with surrounding content, in addition to RLE/LRE...PDF, you may 
also need to add the LRM or RLM marks as described in the section about 
legacy browser support."
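
For places where markup is unavailable (the title element, attribute 
values), the two pairs of characters described in point 5 could be applied 
along these lines. This is only a sketch: the function name and argument 
names are hypothetical, and it assumes the isolate characters (RLI, LRI, FSI, 
PDI) are not being used.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def embed_with_marks(phrase, phrase_dir, base_dir):
    """Wrap phrase in <mark><embedding> ... <PDF><mark> (hypothetical helper).

    The mark follows the base direction of the surrounding text; the
    embedding character follows the direction wanted for the phrase.
    """
    for value in (phrase_dir, base_dir):
        if value not in ("ltr", "rtl"):
            raise ValueError(f"direction must be 'ltr' or 'rtl', got {value!r}")
    mark = LRM if base_dir == "ltr" else RLM
    embedding = LRE if phrase_dir == "ltr" else RLE
    return f"{mark}{embedding}{phrase}{PDF}{mark}"


# An RTL phrase placed in an LTR attribute value or title.
wrapped = embed_with_marks("RTL-TEXT", phrase_dir="rtl", base_dir="ltr")
print([hex(ord(ch)) for ch in wrapped if ord(ch) > 0x2000])
```

As noted in the quoted comment, the marks are not always required, so a 
caller might reasonably omit them in contexts where the embedding alone 
gives the right result.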

Cheers,
RI
Received on Wednesday, 5 February 2014 13:41:57 UTC