Re: Rework of Bidi inline article from Aharon (Vladimir) Lanin on 2012-02-07 (public-i18n-core@w3.org from January to March 2012)

From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Tue, 7 Feb 2012 15:20:23 +0200
To: Richard Ishida <ishida@w3.org>
Cc: Matitiahu Allouche <matial@il.ibm.com>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <CA+FsOYb3=Wb2ssYAJheR7vpUa55hearT+4GyyJ8_osvLsYsN2g@mail.gmail.com>
Perfectly well -> quite well; typically -> often

addresses -> street addresses, email addresses

also user names, file names and paths

Whenever an opposite-direction phrase occurs: before this paragraph, I
think the following paragraph needs to be inserted:
The problem is exacerbated in applications that drop text into a page, say
from a database. The application often does not know a priori if such text
is (or perhaps contains) an opposite-direction phrase, and has to estimate
its direction at run-time by checking the Unicode ranges of its
characters. HTML5 introduces a feature for doing so in the browser.

is followed by -> is followed inline by (Twice. If you think that you need
to explain what "inline" means, you can say something like "By inline, we
mean without a line break forced between them by things like <br> or the
start or end of a block element.")

contains one or more nested phrases whose base direction is opposite to
that of the surrounding text -> itself contains one or more nested
opposite-direction phrases - opposite to it

in the current generation of browsers, and in HTML5 -> in the current
generation of browsers, and when browsers complete the implementation of
some bidi features added in HTML5

in Hebrew -> on the page (twice)

Neutrals between same directional runs can sometimes be misinterpreted ->
just remove this sentence.

boundary of two directional runs in Arabic -> boundary between two
logically separate Arabic phrases

PURPLE PIZZA roma -> PIZZA pizza (it's a much more likely name)

handling mixed direction phrases in html4: We have spoken of
"opposite-direction phrases", not "mixed direction phrases". Thus, this
should be "handling opposite-direction phrases in html4".

Note also how the cite tag falls inside the quote marks: cite isn't used,
just a span. cite would probably be better.

that happens to follow it -> that happens to follow it inline

Putting it all together in HTML4: Here is an updated text for this section,
plus a new section preceding it:
--- start ---
Handling text of unknown direction

When an application needs to insert text data into a page, but does not
know the text's base direction, it must estimate it by checking the Unicode
ranges of its characters. This task is beyond the scope of this article,
but there are open-source libraries that include such functionality.

Obviously, if the text's estimated overall direction is the opposite of the
context's, the application must treat the text as an opposite-direction
phrase and wrap it in an element with the dir attribute set to its
direction, as well as optionally appending an LRM or RLM to match the
context as discussed above.

However, if the text is judged to be of the same overall direction as the
context, it does not mean that it is safe to insert it into the page as is.
It could still contain a nested opposite-direction phrase, and in
particular end in one. For example, let's say the context is a Hebrew page,
and the text is a book title that happens to be "AN INTRODUCTION TO java",
where the uppercase words are actually Hebrew. If the whole text were
inserted into the page without any extra precautions, the LTR word "java"
at the end would cause the usual troubles with numbers and other
opposite-direction phrases following our book title inline.

The best way to deal with this is to wrap an unknown-direction text with an
element with the dir attribute set to its estimated direction even if that
direction is judged to be the same as that of the context when the text
contains *any* characters of the opposite direction.

Putting it all together in HTML4

To summarize, in HTML4, to make sure that a phrase that contains any
opposite-direction characters is displayed correctly, up to two steps are
necessary:

1. Tightly wrap the phrase in an element that uses the dir attribute to set
the direction of the phrase. This is not always necessary, but never does
any harm. If the phrase is already tightly wrapped in an element, you can
use the existing element for this purpose. However, if the existing element
has non-inline display (e.g. a <p> or <div> element), setting its dir
attribute may also set its alignment. If this happens to be undesirable,
wrap the phrase in a <span> inside the non-inline element and set the dir
attribute on the span.
2. If the phrase is of the opposite direction to the context and is
followed inline (possibly after some intervening neutral characters) by a
number or a logically separate opposite-direction phrase, separate the two
with a directional mark matching the direction of the context. If you do
not want to check whether this is actually the case, you can add a
directional mark matching the context's direction after every
opposite-direction phrase. You never have to do this when the element on which
you set the dir attribute has non-inline display (since nothing follows it
inline).
--- end ---
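
As a rough illustration of the proposed text above, here is a minimal Python
sketch of an application-side helper. It is only a sketch under stated
assumptions: the function names (strong_direction, estimate_direction,
wrap_html4) are hypothetical, the majority-vote estimator is a deliberately
naive stand-in for a real direction-estimation library, and step 2 uses the
"do not want to check" variant that always appends the mark after an
opposite-direction phrase.

```python
import unicodedata
from html import escape


def strong_direction(ch):
    """Return 'ltr' or 'rtl' for a strongly typed character, else None."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "ltr"
    if bidi_class in ("R", "AL"):
        return "rtl"
    return None  # neutral or weak (digits, punctuation, spaces, ...)


def estimate_direction(text, default):
    """Hypothetical estimator (an assumption): majority of strong characters."""
    ltr = sum(1 for ch in text if strong_direction(ch) == "ltr")
    rtl = sum(1 for ch in text if strong_direction(ch) == "rtl")
    if ltr == rtl:
        return default
    return "ltr" if ltr > rtl else "rtl"


def wrap_html4(text, context_dir):
    """Prepare text for inline insertion into a context of direction context_dir."""
    other_dir = "rtl" if context_dir == "ltr" else "ltr"
    body = escape(text)
    # No characters of the direction opposite to the context: insert as is.
    if not any(strong_direction(ch) == other_dir for ch in text):
        return body
    # Step 1: tightly wrap, even if the overall direction matches the context.
    text_dir = estimate_direction(text, default=context_dir)
    wrapped = f'<span dir="{text_dir}">{body}</span>'
    # Step 2 (unconditional variant): follow an opposite-direction phrase
    # with a directional mark matching the context's direction.
    if text_dir != context_dir:
        wrapped += "&lrm;" if context_dir == "ltr" else "&rlm;"
    return wrapped
```

For example, wrap_html4("مصر", "ltr") returns
<span dir="rtl">مصر</span>&lrm; while a Hebrew title ending in "java",
inserted into an RTL context, gets a <span dir="rtl"> wrapper and no mark.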

To summarize, in HTML4, to make sure that an opposite-direction phrase is
displayed correctly -> To summarize, in HTML4, to make sure that a phrase
that contains any opposite-direction characters is displayed correctly

Tightly wrap the opposite-direction phrase in an element that uses the dir
attribute to set the direction of the phrase -> Tightly wrap the phrase in
an element that uses the dir attribute to set the direction of the phrase

If the opposite-direction phrase is followed -> If the phrase is of the
opposite direction to the context and is followed inline

If it's, say, a Latin character, the direction will be ltr -> Otherwise,
the direction will be ltr

inside a bdi element or an element with a dir tag of its own -> inside an
embedded bdi element or an embedded element with a dir tag of its own
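
For reference, the first-strong rule behind dir="auto" that the two
corrections above refer to can be sketched roughly as follows. This is a
minimal Python illustration, not the browser's implementation: the function
name first_strong_direction is hypothetical, it works on plain text only (so
it cannot skip embedded bdi elements or embedded elements with their own dir
attribute), and returning None when no strong character is found is my own
choice for the sketch.

```python
import unicodedata


def first_strong_direction(text):
    """Return the direction implied by the first strongly typed character.

    Neutral and weak characters (spaces, punctuation, digits) are skipped.
    Returns None if the text contains no strong character at all.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # e.g. Hebrew or Arabic letters
            return "rtl"
        if bidi_class == "L":  # any other strong character
            return "ltr"
    return None
```

Note that a phrase such as "C++ مدخل إلى", if stored with the "C" first,
would be estimated as ltr by this rule, which is one of the corner cases
where an application that knows the true direction should say so explicitly.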

Putting it all together in HTML5:

Regarding your alternatives, I do not want the page (or application) author
to have to figure out if the phrase could be followed inline by something
problematic. And I do not want an application author to have to figure out
whether dir=auto will result in the direction that the application knows is
the right direction. In other words, I do not want an application to use
dir=auto (or <bdi> without dir) if the true direction is available to it.
Thus, I want an application to wrap the content in <bdi dir=...> if it
knows the direction. And, as explained above, it is important to do this
even if that direction is the same as the context's, if the phrase might
contain any opposite-direction characters.

Here is an updated text for this section (hopefully, with the problematic
parts clarified):
--- start ---
To summarize, in HTML5, to make sure that a phrase that contains (or, in an
application, might contain) any opposite-direction characters is displayed
correctly, do the following:

1. If the phrase is already tightly wrapped in an element that has
non-inline display (e.g. a <p> or <div> element), and setting this
element's alignment to the "start" of the phrase's direction happens to be
desirable, set the existing element's dir attribute. If you do not know the
phrase's direction, set the dir attribute to "auto". If the phrase is known
to have the same direction as the context, it is not necessary to set it.
2. Otherwise, if you know the phrase's overall direction (or have a better
way of determining it than the method used by dir="auto"), wrap the phrase
in <bdi dir="ltr"> or <bdi dir="rtl">, as appropriate. Since the phrase
contains (or might contain) some opposite-direction characters, do this
even when the phrase's overall direction is the same as the context.
3. (Optional) Otherwise, if the phrase is already tightly wrapped by an
element with inline display, add dir="auto" to the element.
4. Otherwise, wrap the phrase in <bdi>. Without an explicit dir value,
dir="auto" is implied.
--- end ---
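
The four steps above can be read as a simple decision procedure. Here is a
minimal Python sketch of it, assuming an application that generates the
markup itself. The function name wrap_html5 and its parameters are
hypothetical: known_dir is the phrase's direction if the application has it
(else None), block_tag names an existing non-inline element tightly wrapping
the phrase whose alignment may follow the phrase's direction, and inline_tag
names an existing inline element tightly wrapping the phrase.

```python
from html import escape


def wrap_html5(text, known_dir=None, context_dir="ltr",
               block_tag=None, inline_tag=None):
    """Return markup for a phrase that may contain opposite-direction text."""
    body = escape(text)
    # Step 1: reuse an existing non-inline wrapper when aligning it to the
    # phrase's direction is desirable.
    if block_tag is not None:
        if known_dir is None:
            return f'<{block_tag} dir="auto">{body}</{block_tag}>'
        if known_dir == context_dir:
            return f'<{block_tag}>{body}</{block_tag}>'  # dir not needed
        return f'<{block_tag} dir="{known_dir}">{body}</{block_tag}>'
    # Step 2: direction is known, so state it explicitly on a <bdi>, even
    # when it is the same as the context's direction.
    if known_dir is not None:
        return f'<bdi dir="{known_dir}">{body}</bdi>'
    # Step 3 (optional): reuse an existing inline wrapper with dir="auto".
    if inline_tag is not None:
        return f'<{inline_tag} dir="auto">{body}</{inline_tag}>'
    # Step 4: plain <bdi>, where dir="auto" is implied.
    return f'<bdi>{body}</bdi>'
```

The ordering of the checks mirrors the "Otherwise" chain in the text: a known
direction always wins over dir="auto" once step 1 does not apply.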

ED: NOT IF THIS WILL PRODUCE THE WRONG DIRECTION: Huh? How can it, given
that the "Otherwise" means you don't know its direction?

Using bdi and dir="auto" for our use cases:
The way you have it for the third algorithm is pretty close to what I want.
The difference is that I would like you to add a case where the
application happens to know the phrase's direction, as a separate datum.

If there is suitable markup ... add dir="auto" to that element -> If there
is suitable markup ... you can just add dir="auto" to that element

By: You may want to add me as a second author at this point. Up to you.

On Thu, Feb 2, 2012 at 6:04 PM, Richard Ishida <ishida@w3.org> wrote:

> Hi Aharon,
>
> Given that Firefox now also supports bdi, I've been working on the bidi
> articles again.
>
> I began integrating your ideas at
> http://www.w3.org/International/tutorials/new-bidi-xhtml/Overview-inline.en.php#where
>
> (The old version is still accessible at
> http://www.w3.org/International/tutorials/new-bidi-xhtml/Overview-inline-pre-AL.en.php
> )
>
> However I had some trouble with your summary of how to handle markup in
> HTML5.  I proposed a restructuring of what I thought you were saying, that
> takes into account opposite-direction phrases that are followed by a number
> or by another, but logically separate phrase.
>
> I also suggest a third, alternative approach, which I think is far easier
> for the content author to work with (use bdi and auto pretty much all the
> time).
>
> Below that algorithmic work, you'll see some worked examples for the new
> versions.
>
> Please let me know what you think.  Bear in mind that this is still very
> drafty.
>
> Cheers,
> RI
>
>
>
> On 22/11/2011 14:17, Aharon (Vladimir) Lanin wrote:
>
>> Sorry that it took me a while to answer. However, most of the time was
>> spent on formulating my suggestions below.
>>
>> Re https://plus.google.com/103190014606131822578/posts/MgiwWfu3Rrt, you
>> are correct that people leave a gap and write the number from the
>> biggest digit to the smallest, since that is the order in which they
>> have the number in their head. You are also correct about the order in
>> which math is written in Hebrew and (supposed to be) written in Arabic.
>> I do not know Arabic and have never tried writing math right-to-left, so
>> I don't have a clue about the hand movements there. I would expect that
>> they are indeed a challenge.
>>
>> Re the article, I think that the sections looking at solutions for the
>> five problematic cases ("Neutral ...", "Weak ...", "Nesting ...",
>> "Adjacent ...", and "Handling unknown") taken together are quite long
>> and tiring. I believe that this stems from several issues:
>>
>>  1. The sections look like an arbitrary collection of cases, where for
>>    each you try a bunch of techniques for fixing, and for some one
>>    technique works best, and for some, another. The user is left with
>>    the impression that in order to figure out how to deal with the case
>>    giving him trouble, he will have to figure out which of these cases
>>    his most resembles - a difficult mission for most users. And what if
>>    his case resembles more than one of these cases? And what if the
>>    user is faced with plopping an arbitrary, unpredictable piece of
>>    text into his page? I believe that what is necessary is a clear
>>    statement that it is the occurrence of /opposite-direction phrases/
>>    that causes all problems, with a concise statement of how to handle
>>    an opposite-direction phrase (whatever it may be) to make sure that
>>    no problems arise. If this were so, the various cases would be just
>>    examples of applying the general approach - and could be safely
>>    skipped by a reader that understood the general approach. (There
>>    are, of course, two general approaches: one for HTML4, and one for
>>    HTML5.)
>>  2. Applying each of the HTML4 mark-up, HTML5 mark-up, and LRM/RLM
>>    techniques to each of the cases is needlessly repetitive. Let's say
>>    that a given technique works for one set of cases, and does not work
>>    for another. Within each set, it works (or does not work) exactly
>>    the same way for all cases, so mentioning it again and again for
>>    each case becomes repetitive.
>>  3. The case definitions are needlessly fuzzy. For example, "weak
>>    directional characters that appear at the wrong side of a
>>    directional run" is a conflation of two very different cases: one
>>    where an opposite-direction phrase starts with a number (that is
>>    part of it), and one where an opposite-direction phrase is followed
>>    by a number (that is not part of it). In HTML4, the two have
>>    completely different solutions.
>>  4. There may be simply too many cases / examples.
>>
>>
>>
>> So, here is my attempt at an alternative way of presenting the material,
>> starting from the beginning of "Where the algorithm needs help". I am
>> reusing your copy in many places, but watch out where I may have made
>> changes.
>>
>>    The bidi algorithm will handle text perfectly well in many
>>    situations, and often no special markup or other device is needed
>>    other than to set the overall direction for the document. However,
>>    the more a document mixes text of both directions, the higher the
>>    chances that some of it will be displayed not as intended. When this
>>    happens, extra mark-up or other devices have to be added to the
>>    document to untangle the bidirectional text.
>>
>>    We will examine specific examples of what can go wrong, why it goes
>>    wrong, and what fixes it in the sections below. Nevertheless, it is
>>    important to realize that basically, the problems all occur when a
>>    text (e.g. a document) in one direction has to include a phrase in
>>    the opposite direction. Common examples of such "phrases" include
>>    quotations, formatted numbers (e.g. phone numbers and MAC
>>    addresses), addresses, and various names, such as brand names,
>>    acronyms, part numbers, site names, article titles, place names,
>>    etc. Whenever an opposite-direction phrase occurs, things can go
>>    wrong. That is, something will go wrong if the text includes,
>>    without any special "wrapping", an opposite-direction phrase that:
>>
>>      * begins or ends with neutral characters
>>      * begins with a number
>>      * is followed by a number
>>      * is followed by another, logically separate opposite-direction
>>        phrase
>>      * contains one or more nested phrases whose direction is opposite
>>        to /it/
>>
>>
>>    Although this list seems daunting, there is no need to determine
>>    which, if any, of these cases applies to a particular phrase. There
>>    are canonical ways of "wrapping" opposite-direction phrases that
>>    will prevent problems in all of the cases above, and do no harm when
>>    none of them apply. We now describe how such wrapping is done in the
>>    current generation of browsers, and in HTML5.
>>
>>    Wrapping opposite-direction phrases in HTML4
>>
>>    The dir attribute
>>
>>    In principle, the right thing to do for /every/ opposite-direction
>>    phrase is to set its base direction by using the dir attribute on an
>>    element tightly wrapping the phrase. (By "tightly wrapping", we mean
>>    that the element contains the entire opposite-direction phrase, and
>>    nothing but the opposite-direction phrase.) When none of the cases
>>    above apply, this will not have any visible effect. But when one of
>>    them does apply, the dir attribute is the right solution.
>>
>>    We can see dir in action in the following example, which tries (in
>>    the LTR context of this page) to say "an introduction to C++" in
>>    Arabic, which should look like "C++ مدخل إلى":
>>
>>    ... C++: مدخل إلى C++
>>
>>    <span dir="rtl">... C++</span>: ++C مدخل إلى
>>
>>    <span dir="rtl">... <span dir="ltr">C++</span></span>: C++ مدخل إلى
>>
>>    The first attempt fails with the last word of the phrase, "C++",
>>    appearing in the wrong place. This is because our RTL phrase is of
>>    opposite direction to the (LTR) context, and contains a nested
>>    phrase of the original LTR direction ("C++") inside it. The bidi
>>    algorithm, of course, has no way of knowing that the "C++" is part
>>    of the RTL phrase, not of the LTR context, and thus displays it as
>>    the latter: to the right of the Arabic words instead of to their
>>    left. To fix this, we need to wrap the whole phrase in a
>>    <span dir=rtl>.
>>
>>    That is our second attempt, and it still fails with the "C++" coming
>>    out as "++C" instead. This happens because the "C++" is an LTR
>>    phrase ending in neutral characters being displayed in the context
>>    of our RTL phrase. The bidi algorithm has no way of knowing that the
>>    plus signs are part of the LTR phrase, not of the RTL context, and
>>    thus displays them as part of the context: to the left of the "C"
>>    instead of to its right.
>>
>>    Our third attempt finally succeeds. It wraps the overall RTL phrase
>>    in a <span dir="rtl">, and the LTR phrase nested inside it in its
>>    own <span dir="ltr">.
>>
>>    LRM/RLM
>>
>>    In addition to the dir attribute, the visual order in which text is
>>    displayed can also be modified by using two invisible Unicode
>>    control characters: LRM (LEFT-TO-RIGHT-MARK, U+200E, &lrm; as a
>>    named entity), and RLM (RIGHT-TO-LEFT-MARK, U+200F, &rlm;). Each has
>>    the strong type indicated by its name, but is invisible, like an
>>    invisible A and an invisible א.
>>
>>    One use of LRM and RLM is to /extend/ a directional run through
>>    neutral or weak characters at the start or end of an
>>    opposite-direction phrase, by putting a mark of the same direction
>>    as the phrase on the other side of the neutral or weak characters.
>>    For example, in our Arabic "Introduction to C++" example above,
>>    instead of wrapping the "C++" in a <span dir="ltr">, we could add an
>>    &lrm; after the second plus:
>>
>>    <span dir="rtl">... C++&lrm;</span>: C++ مدخل إلى
>>
>>    Being strongly LTR, the LRM extended the LTR run through the neutral
>>    pluses.
>>
>>    Used this way, however, LRM and RLM are a bit like gotos in
>>    programming languages: a quick hack that, unlike the dir attribute,
>>    says nothing about the structure of the text. And they simply cannot
>>    be used to deal with an opposite-direction phrase that happens to
>>    contain a nested phrase in the original direction, like our complete
>>    "Introduction to C++" example above. That may seem like an esoteric
>>    case, but it is surprisingly common when displaying RTL data in an
>>    LTR page, because the use of LTR words (like "C++") is not uncommon
>>    in RTL text. So, if you don't want to analyze whether LRM and RLM
>>    can replace the use of the dir attribute in /your/ case, just use
>>    the dir attribute.
>>
>>    Nevertheless, it turns out that LRM and RLM do have an essential
>>    function dealing with opposite-direction phrases in HTML4:
>>    /separating/ an opposite-direction phrase from a number or from a
>>    separate opposite-direction phrase that happens to follow it, by
>>    putting between them a mark of the same direction as the /context/.
>>
>>    When used this way, LRM and RLM do not replace the use of the dir
>>    attribute, but augment it.
>>
>>    <example for number, e.g. the restaurant example>
>>    <example for two separate phrases, e.g. your use case 6>
>>
>>    Putting it all together in HTML4
>>
>>    To summarize, in HTML4, to make sure that an opposite-direction
>>    phrase is displayed correctly, up to two steps are necessary:
>>
>>     1. Tightly wrap the opposite-direction phrase in an element that
>>        uses the dir attribute to set the direction of the phrase. This
>>        is not always necessary, but never does any harm.
>>
>>     2. If the opposite-direction phrase is followed (possibly after
>>        some intervening neutral characters) by a number or a logically
>>        separate opposite-direction phrase, separate the two with a
>>        directional mark matching the direction of the context. If you
>>        do not want to check whether this is actually the case, you can
>>        add a directional mark matching the context's direction after
>>        every opposite-direction phrase.
>>
>>
>>    Wrapping opposite-direction phrases in HTML5
>>
>>    note! This section describes features that are being introduced by
>>    HTML5. At the time of writing, these features are not yet widely
>>    supported in browsers, but the expectation is that they will be
>>    supported soon. In the meantime, you should use these with extreme
>>    caution.
>>
>>    <bdi>
>>
>>    HTML5 introduces a new element, <bdi>, expressly for the purpose of
>>    wrapping opposite-direction phrases. It is just like a <span>,
>>    but directionally /isolates/ its content from the surrounding text;
>>    bdi stands for "bi-directional isolate". The effect of the isolation
>>    is that you do not need to use LRM and RLM to separate an
>>    opposite-direction phrase wrapped in <bdi> from a number or a
>>    logically separate opposite-direction phrase that happens to follow
>>    it. Since it is actually quite rare /not/ to want to isolate
>>    an embedded phrase from its surroundings, <bdi> (when the browser
>>    supports it) should be used instead of a <span> for bidi-wrapping,
>>    while the use of LRMs and RLMs can be completely avoided.
>>
>>    Please note that <bdi> also comes with the dir attribute set to the
>>    new "auto" value by default (see below).
>>
>>    dir="auto"
>>
>>    HTML5 also addresses another need: text dropped into a page, say
>>    from a database, when you don't know its base direction. Before
>>    HTML5, you could only set the dir attribute to "ltr" or "rtl", and
>>    had to somehow determine which of them was appropriate yourself.
>>    HTML5 provides a new value for the dir attribute: "auto". The "auto"
>>    value tells the browser to look at the first strongly typed
>>    character in the element. If it's a right-to-left typed character
>>    such as a Hebrew or Arabic letter, the element will get a direction
>>    of "rtl". If it's, say, a Latin character, the direction will be "ltr".
>>
>>    There are corner cases where this may not give the desired outcome,
>>    but it should usually produce the desired result.
>>
>>    Note that the browser ignores any neutral or weak characters at the
>>    beginning of the text when looking for the first strong character.
>>    It also ignores anything inside a bdi element or an element with a
>>    dir tag of its own, including auto.
>>
>>
>>    Furthermore, dir=auto on any element also directionally isolates its
>>    element from its surroundings as if it were a <bdi>. Thus, if you
>>    already have an element like <a> or <cite> wrapping a phrase of
>>    unknown direction, all your bidi wrapping needs are accomplished by
>>    adding a dir="auto" on the existing element.
>>
>>    Not to be outdone, the bdi element behaves as if it has dir=auto by
>>    default (i.e. unless an explicit dir="ltr" or dir="rtl" is specified).
>>
>>    The choice of whether to attach dir="auto" on an existing element or
>>    to wrap the phrase in a <bdi> depends on whether you already have an
>>    element tightly wrapping the potentially opposite-direction phrase,
>>    and whether you happen to know the phrase's direction (or can guess
>>    at it better than the browser's dir="auto" logic).
>>
>>    dir="auto" on <textarea> and <pre>
>>
>>    When used on the <textarea> and <pre> elements, dir="auto" does its
>>    direction estimation for each paragraph of text in the element
>>    separately. If one paragraph starts with an RTL character, and
>>    another with an LTR character, the first will be displayed RTL, and
>>    the second in LTR. This follows the Unicode standard for plain text,
>>    in the elements usually used to enter and display plain text
>>    content. When displaying plain text in a different element, e.g.
>>    a <div> with the "white-space" style set to "pre" or "pre-wrap", the
>>    same effect can be achieved by setting its "unicode-bidi" style to
>>    "plaintext". [I am not sure if the last sentence is appropriate in
>>    an article that mostly ignores CSS]
>>
>>    Putting it all together for HTML5
>>
>>    To summarize, in HTML5, to make sure that a phrase that may have the
>>    opposite direction is displayed correctly, do the following:
>>
>>     1. If you know the phrase's direction (or have a better way of
>>        determining it than the method used by dir=auto), wrap the
>>        phrase in <bdi dir="ltr"> or <bdi dir="rtl">, as appropriate. Do
>>        this even if the phrase has the same direction as the context,
>>        just in case it happens to end in strongly typed characters of
>>        the opposite direction, and happens to be followed by a number
>>        or a separate opposite-direction phrase.
>>
>>     2. Otherwise, if the phrase is already tightly wrapped by an
>>        element, add dir="auto" to the element.
>>
>>     3. Otherwise, wrap the phrase in <bdi>. Without an explicit dir
>>        value, dir="auto" is implied.
>>
>> What would follow is an "Additional examples" section, for whatever
>> cases you think are most illustrative. Each would be entitled by a
>> simple name describing the example, e.g. "The MAC address", not by
>> complicated typology. Each would give the recommended solution in HTML4
>> and HTML5. If you feel it is necessary, point out the examples that can
>> be fixed by LRM/RLM alone, when discussing the HTML4 solution. Do not do
>> that for HTML5, and I am not sure it is worth doing at all.
>>
>> Aharon
>>
>> On Tue, Nov 15, 2011 at 5:01 PM, Richard Ishida
>> <ishida@w3.org> wrote:
>>
>>    Hi Aharon, Mati,
>>
>>    I have just done a first draft pass over the document
>>    http://www.w3.org/International/tutorials/new-bidi-xhtml/Overview-inline.en.php
>>    (in particular from here down:
>>    http://www.w3.org/International/tutorials/new-bidi-xhtml/Overview-inline.en.php#where).
>>
>>
>>    Bearing in mind that this is a first pass, would you mind scanning
>>    it and letting me know whether you think i'm on the right track?
>>      Hopefully it responds to the structure related comments below.
>>    (I'm still planning to revisit some of the more detailed comments.)
>>
>>    btw, I haven't yet decided what to do with the section entitled
>>    "More examples".
>>
>>    Thanks!
>>    RI
>>
>>
>>    PS: Any thoughts on this:
>>    https://plus.google.com/103190014606131822578/posts/MgiwWfu3Rrt ?
>>
>>
>>
>>    On 28/10/2011 21:09, Richard Ishida wrote:
>>
>>        I have begun a substantial reorganization and rewrite of the
>>        following
>>        section:
>>
>>        http://www.w3.org/International/tutorials/new-bidi-xhtml/Overview-inline.en.php#where
>>
>>
>>
>>        RI
>>
>>
>>
>>        On 25/10/2011 17:58, Richard Ishida wrote:
>>
>>            Hi Aharon, and thanks for your comments. I was hoping to
>>            discuss with
>>            you at the Unicode conf, but that wasn't to be, so here is a
>>            quick dash
>>            at my thoughts (since I have to go out soon).
>>
>>            I actually agree with pretty much everything you say, but
>>            the concern I
>>            had was to do with Martin's previous post about the fact
>>            that these
>>            things are not yet supported widely, and how to manage
>>            expectations in
>>            that regard.
>>
>>            Even where implementation is there (eg. for dir=auto on
>>            Chrome (although
>>            not <bdi> afaict!)) it will be some time before the new
>>            constructs can
>>            be relied upon on their own, due to legacy browser usage
>>            (esp. IE8).
>>
>>            My original thought was to 'cordon off' the new stuff into
>>            its own
>>            section with a big disclaimer, so that it is clear that this
>>            stuff
>>            doesn't work quite yet, and then merge it in to the
>>            mainstream gradually
>>            as support increases.
>>
>>            However, I think you might be right that we should integrate
>>            from the
>>            start. The challenge will be to do so in a way that makes it
>>            clear to
>>            the reader what currently works and what doesn't.
>>
>>            That said, I'm still a little worried about the legacy
>>            aspect of this.
>>
>>            I've seen a few places in my own pages where I'm inclined to
>>            add dir=auto or bdi right now, but I know that i will still
>>            need to also use the rlm/lrm for at least a couple of years
>>            to cater for the IE8 corporate legacy.
>>
>>            Using both will be messy, for explanation as well as for
>>            content authoring.
>>
>>            I'm wondering whether a way around this is to use CSS. For
>>            example, in a
>>            LTR page or context, the CSS rule
>>
>>            bdi:before { content: '\200E '; }
>>
>>            will cause
>>
>>            <p>The names of these states in Arabic are <bdi>مصر</bdi>,
>>            <bdi>البحرين</bdi> and <bdi>الكويت</bdi> respectively.</p>
>>
>>            to display as expected, even if bdi is not supported.
>>
>>            I suspect we may need to distinguish between cases, such as
>>            input
>>            fields, where the rlm/lrm is not appropriate (because it
>>            doesn't help),
>>            and situations like the example above, where it can help
>>            (either for bdi
>>            or dir=auto).
>>
>>            Actually, the CSS should probably be genericised to say
>>            something like,
>>            if the direction of the parent element is RTL use rlm, and
>>            vice versa,
>>            but I think that that capability too is only now being
>>            introduced.
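
The fallback idea above (a directional mark matching the surrounding context, placed before each <bdi>) could also be applied when the markup is generated rather than in CSS. This is only a minimal sketch under that assumption; the names context_mark and bdi_with_fallback are hypothetical and come from no library or from the article.

```python
from html import escape

LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK


def context_mark(context_dir):
    """Return the mark that matches the direction of the surrounding text."""
    if context_dir == "ltr":
        return LRM
    if context_dir == "rtl":
        return RLM
    raise ValueError("context_dir must be 'ltr' or 'rtl'")


def bdi_with_fallback(text, context_dir):
    """Wrap text in <bdi>, preceded by a mark for browsers that ignore <bdi>.

    The mark goes before the element, mirroring the bdi:before rule above.
    """
    return f"{context_mark(context_dir)}<bdi>{escape(text)}</bdi>"


if __name__ == "__main__":
    names = ["مصر", "البحرين", "الكويت"]
    print(", ".join(bdi_with_fallback(n, "ltr") for n in names))
```

As noted above, this kind of mark does not help for input fields, so a helper like this would only be useful for text dropped into running content.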
>>
>>            What do you think?
>>
>>            RI
>>
>>
>>
>>            On 14/10/2011 13:04, Aharon (Vladimir) Lanin wrote:
>>
>>                I think that the bdi element and the idea of isolation
>>                should appear
>>                much earlier in the article, long before unknown
>>                direction. Basically,
>>                when you introduce <span dir=...> in "A simple solution"
>>                (after "Nesting
>>                base direction"), you should also mention that HTML5
>>                defines a new
>>                element, <bdi>, that should be preferred over <span> for
>>                this purpose,
>>                once browsers start to support it, because it also
>>                isolates the nested
>>                phrase from its surroundings, thus preventing it
>>                influencing their
>>                display. You can say that there are examples coming up.
>>
>>                "Adjacent, same-direction directional runs that are
>>                incorrectly ordered"
>>                is an excellent example for the use of <bdi>. I think
>>                you should take
>>                out the sentence "Putting markup around the comma is a
>>                bit like cracking
>>                an egg with a hammer in this case." I think that mark-up
>>                generally is
>>                the preferred solution, when it states something that
>>                makes sense. As I
>>                will explain below, enclosing the comma in a <span
>>                dir=ltr> makes no
>>                sense, and should not even be mentioned, since it will
>>                not work. On the
>>                other hand, enclosing each of the RTL items in the list
>>                (but not the
>>                commas or spaces between them) in a <bdi dir=rtl> makes
>>                perfect sense,
>>                i.e.:
>>
>>                The names of these states in Arabic are <bdi
>>                dir="rtl">مصر</bdi>, <bdi
>>                dir="rtl">البحرين</bdi> and <bdi dir="rtl">الكويت</bdi>
>>                respectively.
>>
>>                You can say that in this example, the dir="rtl"s
>>                actually don't change
>>                anything, and in fact that just the first <bdi> is
>>                sufficient to fix the
>>                problem, but there is nothing wrong with marking every
>>                embedded
>>                opposite-direction phrase in a <bdi> - it won't hurt,
>>                and will often
>>                prevent problems.
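
For an application that drops such names into a sentence from data, the markup above could be produced along these lines. This is a rough sketch only: the function names bdi and list_sentence are made up for illustration, and the joining with ", " and " and " simply follows the English sentence in the example.

```python
from html import escape


def bdi(text, direction=None):
    """Wrap one phrase in <bdi>, optionally with an explicit dir attribute."""
    if direction not in (None, "ltr", "rtl", "auto"):
        raise ValueError("direction must be 'ltr', 'rtl', 'auto' or None")
    attr = f' dir="{direction}"' if direction else ""
    return f"<bdi{attr}>{escape(text)}</bdi>"


def list_sentence(prefix, items, suffix, direction="rtl"):
    """Wrap each item, but not the commas or spaces between items."""
    wrapped = [bdi(item, direction) for item in items]
    if len(wrapped) > 1:
        joined = ", ".join(wrapped[:-1]) + " and " + wrapped[-1]
    else:
        joined = "".join(wrapped)
    return f"{prefix} {joined} {suffix}"


if __name__ == "__main__":
    print(list_sentence(
        "The names of these states in Arabic are",
        ["مصر", "البحرين", "الكويت"],
        "respectively.",
    ))
```

The separators stay outside the <bdi> elements, so they remain part of the enclosing LTR sentence.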
>>
>>                As I said before, putting a <span dir=ltr> around the
>>                comma does not
>>                make sense, and should not be mentioned at all. Why
>>                specifically the
>>                comma, and not, say the space next to it? Furthermore, a
>>                <span dir=...>
>>                is an /embedding/ - which is not really true for the
>>                comma: it's a part
>>                of the enclosing LTR sentence, not a piece of LTR
>>                embedded within - i.e.
>>                a part of - some RTL. In fact, putting the <span
>>                dir=ltr> around the
>>                comma puts the comma in the wrong place when there is no
>>                space between
>>                it and the RTL text preceding it.
>>
>>                In "More examples", the Hebrew "W3C ... ERCIM" examples
>>                should really
>>                start with "ה-" immediately before the "W3C", i.e. the
>>                desired output
>>                should be:
>>                ה-W3C‏ (World Wide Web Consortium) מעביר את שירותי הארחה
>>                באירופה ל -
>>                ERCIM.
>>
>>                This too is actually a great place to use <bdi>:
>>
>>                ה-<bdi dir="ltr">W3C</bdi> (<bdi dir="ltr">World Wide Web
>>                Consortium</bdi>) מעביר את שירותי הארחה באירופה ל-<bdi
>>                dir="ltr">ERCIM</bdi>.
>>
>>                Once again, you don't actually need the dir="ltr" on any
>>                of these, and
>>                just the first or second <bdi> will be sufficient alone
>>                to fix the
>>                problem, but in principle the safe way to write this
>>                sentence is as
>>                above.
>>
>>                I think that the <bdi> solution - once it is available
>>                in browsers - is
>>                preferable to using &rlm;, because it makes intuitive
>>                sense. You simply
>>                mark the embedded opposite-direction phrases, each one
>>                on its own. Until
>>                someone actually understands the UBA - which very few
>>                people do - using
>>                LRM and RLM seems like voodoo. Few people know when they
>>                should use LRM
>>                and when they should use RLM, and where exactly they
>>                should put it.
>>
>>                IMO, the same applies to all the other examples in this
>>                section. The
>>                best way to deal with them, when it becomes available,
>>                is <bdi dir=ltr>
>>                (or just <bdi>, because of dir=auto, but we don't have
>>                to mention that
>>                yet), not an LRM, and not <span dir=ltr>.
>>
>>                In "Handling unknown text", if you are looking for a
>>                real RTL book title
>>                that contains some LTR word(s), but does not begin with
>>                them (so that
>>                dir=auto will work well with it), there is
>>                <http://books.google.com/books?id=05syOwAACAAJ>:
>>
>>
>>                מבוא לתכנות בסביבת אינטרנט - מבוא ו- HTML
>>
>>                Please note that the Google Books page has a bug: the
>>                title as displayed
>>                at the top of the page is always in the direction of the
>>                UI. However,
>>                the title displayed near the bottom of the page, after
>>                "Title:" is
>>                displayed using the word-count direction estimation
>>                algorithm. It gets
>>                this book title right.
>>
>>                Furthermore, please note that when I used Google Books'
>>                Advanced Search
>>                to look for Hebrew-language books containing one of the
>>                words HTML, CSS,
>>                and JavaScript, the majority of the book titles I found
>>                /began/ with the
>>                LTR word, so dir=auto's first-strong algorithm does not
>>                work well on
>>                them. I had tried to push through word-count for
>>                dir=auto, but failed to
>>                convince people. Examples:
>>
>>                http://books.google.com/books?id=IU83OgAACAAJ
>>                http://books.google.com/books?id=_qAlOgAACAAJ
>>                http://books.google.com/books?id=_-gSKQEACAAJ
>>
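
To make the difference between the two estimation approaches concrete, here is a simplified sketch. It is not the HTML5 dir=auto algorithm as specified, nor Google Books' actual code: it only looks at the Unicode bidi class of each character, and the simple-majority rule in word_count_direction is an assumption made for illustration. The second sample title is invented; the first is the book title quoted above.

```python
import unicodedata


def char_direction(ch):
    """Return 'rtl' or 'ltr' for a strong character, None for anything else."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):
        return "rtl"
    if bidi_class == "L":
        return "ltr"
    return None


def first_strong_direction(text, default="ltr"):
    """Direction of the first strong character, or default if there is none."""
    for ch in text:
        direction = char_direction(ch)
        if direction:
            return direction
    return default


def word_count_direction(text, default="ltr"):
    """Direction held by most words; each word is judged by its first strong char.

    The simple-majority rule is an assumption made for this sketch.
    """
    counts = {"ltr": 0, "rtl": 0}
    for word in text.split():
        direction = first_strong_direction(word, default=None)
        if direction:
            counts[direction] += 1
    if counts["rtl"] == counts["ltr"]:
        return default
    return "rtl" if counts["rtl"] > counts["ltr"] else "ltr"


if __name__ == "__main__":
    titles = [
        "מבוא לתכנות בסביבת אינטרנט - מבוא ו- HTML",  # title quoted above
        "HTML למתחילים בעברית",  # invented title that begins with an LTR word
    ]
    for title in titles:
        print(first_strong_direction(title), word_count_direction(title))
```

In this sketch both methods agree on the first title, while a title that begins with an LTR word is the kind of case where only the word count gives RTL.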
>>                For this reason, I think it is worthwhile to tone down
>>                the statement
>>                that "There are some rare corner cases where this may
>>                not give the
>>                desired outcome, but in the majority of cases it should
>>                produce the
>>                expected result." I would take out the words "some
>>                rare", and you could
>>                also add on "particularly when the embedded text does
>>                not mix LTR and
>>                RTL words and the problem is limited to things like
>>                trailing punctuation, leading numbers, and phone numbers."
>>
>>                On Thu, Oct 13, 2011 at 8:09 PM, Richard Ishida
>>                <ishida@w3.org> wrote:
>>
>>                On 19/09/2011 16:04, [Mati] wrote:
>>
>>                http://www.w3.org/International/tutorials/new-bidi-xhtml/qa-html-dir.php
>>
>>                11) In section "Using dir="auto" with the input
>>                element", the first Hebrew word of the example is not
>>                known to me and is probably a typo. I can't even guess
>>                what the intended word was.
>>
>>
>>                On 20/09/2011 09:38, [Mati] wrote:
>>
>>                http://www.w3.org/International/tutorials/new-bidi-xhtml/Overview-inline.en.php
>>
>>
>>                DON'T show email on public list.
>>
>>                Name: Matitiahu Allouche
>>                Email: matial@il.ibm.com
>>
>>
>>
>>                Comments:
>>                This is the continuation of comments that I sent in a
>>                previous
>>                submission.
>>
>>                18) In section "Second use case", the first Hebrew word
>>                of the
>>                book title differs between its mention in the body of
>>                the text
>>                and its mention in the message. The form in the message
>>                is the
>>                correct one.
>>
>>
>>
>>                I think I was trying to use the title of the article at
>>                http://www.w3.org/International/questions/qa-css-charset.he.php
>>                (though why that's different, I'm not sure). But at the
>>                time I only grabbed that quickly because I was in a hurry.
>>
>>                Would you or Aharon be able to provide me with a real
>>                book title
>>                that has similar properties? (i.e. ending with CSS or
>>                some such).
>>                (Maybe one of these?
>>                <http://www.google.com/search?q=CSS3&btnG=Search+Books&tbm=bks&tbo=1>)
>>
>>                Cheers,
>>
>>                RI
>>
>>
>>
>>
>>
>>                --
>>                Richard Ishida
>>                Internationalization Activity Lead
>>                W3C (World Wide Web Consortium)
>>
>>                http://www.w3.org/International/
>>                http://rishida.net/
>>
>>
>>
>>
>>
>>    --
>>    Richard Ishida
>>    Internationalization Activity Lead
>>    W3C (World Wide Web Consortium)
>>
>>    http://www.w3.org/International/
>>    http://rishida.net/
>>
>>
>>
> --
> Richard Ishida
> Internationalization Activity Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/International/
> http://rishida.net/
>
Received on Tuesday, 7 February 2012 13:25:04 UTC