Re: Rework of Bidi inline article

Hi Richard,

My response was not stream-of-consciousness, but carefully worked out
suggested changes, in the order in which they appear in the document.
However, I forgot to put in an explanation of what I was trying to do
before going to the specific changes. Here is that explanation:

After I re-read the article and tried to understand what was unclear in the
bit that you marked as "ED: THE NEXT BIT ISN'T CLEAR", I realized that the
article glossed over three important issues.

I'll start with the simplest one (although it has nothing to do with the
bit that wasn't clear). In several locations, the article says that when an
opposite-direction phrase is already tightly wrapped in some element, the
bidi can be handled by putting a dir attribute on that element (and, in
HTML4, inserting an LRM/RLM after it). This overlooks a very common
possibility, that the element in question has non-inline display (e.g. a
<p>, <div>, or <td>). When this is the case, setting the dir attribute on
that element usually also changes its alignment, which may or may not be
desirable. If it is desirable, that's fine, but it should be pointed out
that the LRM/RLM is no longer necessary (since nothing can follow the
phrase *inline*, given that there is a paragraph break at the end of the
non-inline element). If it is undesirable, the easiest way to deal with it
is to ignore the existing element and wrap the phrase in a new span or bdi.

The second issue is that the article only discusses how to handle the
scenario where (unpredictable) text data is dropped into the page at
runtime in one context - HTML5. People today have to deal with it in the
context of HTML4, and since IE9 is going to be around for years to come,
will have to continue to deal with it in HTML4 for years to come. Thus, the
article has to give some guidance on it in the HTML4 section, which is what
the new "Handling text of unknown direction" subsection is about.

The third issue arises only in the "data dropped into the page" scenario.
In this scenario, what constitutes a "phrase" is not up to the application.
It is a text datum, e.g. the name of a restaurant as read from a database,
not more (e.g. a phrase synthesized by the application, such as "reviews of
<a restaurant name>"), and it is not less (e.g. the second word in the
restaurant name). As a result, we cannot limit the problem to
"opposite-direction phrases". That a phrase, as a whole, is of the same
direction as the context does not mean that it does not contain nested
phrases of the opposite direction, e.g. the RTL restaurant name "PIZZA
pizza" in an RTL page. We cannot say that the second word is an
opposite-direction phrase that we want to handle using our algorithm (as
<span dir=ltr>pizza</span>&rlm; or as <bdi>pizza</bdi> or whatever). We are
not about to do surgery on the insides of the phrase we got from the
database, "PIZZA pizza". The phrase is "PIZZA pizza", and it is *not*
opposite-direction.
But that does not mean that we will not have a problem displaying it: when
followed by ": 3 REVIEWS", it will be displayed as "SWEIVER pizza: 3
AZZIP", which is a big problem. We handle this by no longer only dealing
with "opposite-direction phrases" but with "phrases that contain any
opposite-direction characters"; surrounding the whole "PIZZA pizza" in a
<span dir=rtl>, even though the context is already RTL, takes care of the
problem.
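
As a rough illustration of the check described above (a sketch only, not
something taken from the article): the helper below uses Python's
unicodedata module to test whether a text datum contains any strong
character of the direction opposite to the datum's own overall direction,
and wraps the whole datum if so. The function names are made up for this
example. The uppercase-for-RTL convention used in this thread cannot be
detected by software, so the sample uses real Hebrew letters. The trailing
LRM/RLM needed in HTML4 for opposite-direction phrases is not covered here.

```python
import html
import unicodedata

# Bidi classes of strongly directional characters.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def has_opposite_chars(text: str, phrase_dir: str) -> bool:
    """Return True if text contains a strong character whose direction
    differs from phrase_dir ("ltr" or "rtl")."""
    opposite = STRONG_LTR if phrase_dir == "rtl" else STRONG_RTL
    return any(unicodedata.bidirectional(ch) in opposite for ch in text)


def wrap_if_needed(text: str, phrase_dir: str) -> str:
    """Wrap the whole datum in <span dir=...> when it contains any
    opposite-direction characters, whatever the context's direction is."""
    escaped = html.escape(text)
    if has_opposite_chars(text, phrase_dir):
        return f'<span dir="{phrase_dir}">{escaped}</span>'
    return escaped


# Hebrew "pizza" followed by Latin "pizza": the "PIZZA pizza" case.
name = "\u05e4\u05d9\u05e6\u05d4 pizza"
print(wrap_if_needed(name, "rtl"))
```

Note that the datum is wrapped as a whole; nothing inside it is touched.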

> > > And, as explained
> > > above, it is important to do this even if that direction is the same as
> > > the context's, if the phrase might contain any opposite-direction
> > > characters.

> > I still don't understand what you mean by this. Can you give me an
> > example?

> I'm guessing that you are referring here to the case where the phrase's
> direction is unknown, and the text inserted is likely to be often in the same
> direction as the surrounding content.  If so, I propose to add a para just
> after point 2 to say that.

This is the "PIZZA pizza" case above. The key thing about it is *not* that
the phrase's direction is unknown (let's assume the database explicitly
tells me that it is RTL, i.e. should be displayed as "pizza AZZIP", not as
"AZZIP pizza", where the two words don't even want to look at one another
:-) ), but that the identity of the phrase is not up to the application, and
that a phrase that contains any opposite-direction characters has to be
given special handling.

Now, I would like to quote once again the HTML4 and HTML5 "putting it all
together" instructions and hear from you what it is that is still unclear
or problematic:

Putting it all together in HTML4

To summarize, in HTML4, to make sure that a phrase that contains any
opposite-direction characters is displayed correctly, up to two steps are
necessary:

1. Tightly wrap the phrase in an element that uses the dir attribute to set
the direction of the phrase. This is not always necessary, but never does
any harm. If the phrase is already tightly wrapped in an element, you can
use the existing element for this purpose. However, if the existing element
has non-inline display (e.g. a <p> or <div> element), setting its dir
attribute may also set its alignment. If this happens to be undesirable,
wrap the phrase in a <span> inside the non-inline element and set the dir
attribute on the span.
2. If the phrase is of the opposite direction to the context and is
followed inline (possibly after some intervening neutral characters) by a
number or a logically separate opposite-direction phrase, separate the two
with a directional mark matching the direction of the context. If you do
not want to check whether this is actually the case, you can add a
directional mark matching the context's direction after every
opposite-direction phrase. You never have to do this when the element on
which you set the dir attribute has non-inline display (since nothing
follows it inline).
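
The two HTML4 steps above could be sketched roughly as follows. This is
only an illustration under simplifying assumptions: wrap_html4 and its
parameters are hypothetical names, the non-inline element is assumed to be
a <div>, and step 2 uses the "do not want to check" variant, i.e. it adds
the mark after every opposite-direction phrase.

```python
import html


def wrap_html4(phrase: str, phrase_dir: str, context_dir: str,
               block_level: bool = False) -> str:
    """Apply the two HTML4 steps to a single phrase.

    phrase_dir and context_dir are "ltr" or "rtl". block_level says
    whether dir is being set on a non-inline element (assumed <div>).
    """
    # Step 1: tightly wrap the phrase and set its direction.
    tag = "div" if block_level else "span"
    wrapped = f'<{tag} dir="{phrase_dir}">{html.escape(phrase)}</{tag}>'

    # Step 2: after an opposite-direction phrase, add a directional mark
    # matching the context. Nothing follows a non-inline element inline,
    # so no mark is needed in that case.
    if phrase_dir != context_dir and not block_level:
        wrapped += "&lrm;" if context_dir == "ltr" else "&rlm;"
    return wrapped


print(wrap_html4("pizza", "ltr", "rtl"))
```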

Putting it all together in HTML5:

To summarize, in HTML5, to make sure that a phrase that contains any
opposite-direction characters is displayed correctly, do the following:

1. If the phrase is already tightly wrapped in an element that has
non-inline display (e.g. a <p> or <div> element), and setting this
element's alignment to the "start" of the phrase's direction happens to be
desirable, set the existing element's dir attribute. If you do not know the
phrase's direction, set the dir attribute to "auto". If the phrase is known
to have the same direction as the context, it is not necessary to set it.
2. Otherwise, if you know the phrase's overall direction (or have a better
way of determining it than the method used by dir="auto"), wrap the phrase
in <bdi dir="ltr"> or <bdi dir="rtl">, as appropriate. Since the phrase
contains some opposite-direction characters, do this even when the phrase's
overall direction is the same as the context.
3. (Optional) Otherwise, if the phrase is already tightly wrapped by an
element with inline display, add dir="auto" to the element.
4. Otherwise, wrap the phrase in <bdi>. Without an explicit dir value,
dir="auto" is implied.
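
For the inline cases (steps 2 and 4 above), a minimal sketch might look
like this. The names wrap_html5 and first_strong_direction are invented
for the example. The second function only approximates what dir="auto"
does (it looks for the first strongly directional character) and is
included to show why a direction known to the application is preferable
to the guess.

```python
import html
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Approximate dir="auto": direction of the first strong character,
    or None if the text has no strong characters at all."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def wrap_html5(phrase: str, known_dir: Optional[str] = None) -> str:
    """Step 2: use <bdi dir=...> when the direction is known.
    Step 4: otherwise use bare <bdi>, which implies dir="auto"."""
    escaped = html.escape(phrase)
    if known_dir in ("ltr", "rtl"):
        return f'<bdi dir="{known_dir}">{escaped}</bdi>'
    return f"<bdi>{escaped}</bdi>"


# An RTL name that happens to start with a Latin word: the first-strong
# guess says "ltr", so the application should pass the known direction.
name = "pizza \u05e4\u05d9\u05e6\u05d4"
print(first_strong_direction(name))
print(wrap_html5(name, "rtl"))
```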

Actually, I am thinking of getting rid of step 3 in the HTML5 version (the
one marked Optional). <cite dir="auto">foo</cite> is not significantly
shorter than <cite><bdi>foo</bdi></cite>, and has no other advantages.

Aharon

On Wed, Feb 8, 2012 at 12:29 PM, Richard Ishida <ishida@w3.org> wrote:

> After reading another part of your comments, I think I can better guess
> the answer to some of the questions below...
>
>
> On 08/02/2012 09:10, Richard Ishida wrote:
>
>> Aharon,
>>
>> The key area where I'm trying to ensure clarity is the section "Putting
>> it all together in HTML5". I'm not sure whether your comments around
>> that section are stream-of-consciousness notes written as you work
>> through the text, or a stand-back commentary on all the alternatives
>> proposed. I suspect the former. I included an intact section of your
>> last email at the bottom of this email. Immediately below I'll address
>> certain points slightly out of the original order.
>>
>> Given
>>
>>  > The way you have it for the third algorithm is pretty close to what I
>>  > want.
>>
>> and
>>
>>  > I do not want
>>  > an application to use dir=auto (or <bdi> without dir) if the true
>>  > direction is available to it. Thus, I want an application to wrap the
>>  > content in <bdi dir=...> if it knows the direction.
>>
>> I have the following new version of the algorithm:
>>
>> =================
>> 1. If you know the phrase's direction, then
>> (a) if the phrase is not wrapped by another element, wrap the phrase in
>> <bdi dir="ltr"> or <bdi dir="rtl">. (If the phrase is wrapped by a span
>> element, replace the span with this).
>> (b) if the phrase is already tightly wrapped by an element (other than
>> span), add dir="auto" to that element
>> (c) in the rare cases where step (b) fails because the first letter of
>> the phrase is not strongly typed the right way, put an extra bdi element
>> around the phrase with dir="rtl" or dir="ltr" as appropriate (is it
>> worth mentioning this?)
>>
>> 2. If you don't know the phrase's direction, ie. the text will be
>> inserted at run time, then
>> (a) if the phrase is not wrapped by another element, wrap the phrase in
>> bdi. (Without an explicit dir value, dir="auto" is implied.)
>> (b) if the phrase is already tightly wrapped by an element, add
>> dir="auto" to the element.
>> ==================
>>
>>
>>  > And, as explained
>>  > above, it is important to do this even if that direction is the same as
>>  > the context's, if the phrase might contain any opposite-direction
>>  > characters.
>>
>> I still don't understand what you mean by this. Can you give me an
>> example?
>>
>
>
> I'm guessing that you are referring here to the case where the phrase's
> direction is unknown, and the text inserted is likely to be often in the
> same direction as the surrounding content.  If so, I propose to add a para
> just after point 2 to say that.
>
>
>
>
>> Wrt the above algorithm:
>>  > The differences is that I would like you to add a case where the
>>  > application happens to know the phrase's direction, as a separate
>> datum.
>>
>> Please expand.
>>
>
> I think you mean that you want a point 3 which deals with the case where
> an application has a way of deducing the direction of the inserted text.
>  There are a number of other suggestions earlier in your note that refer to
> such as case.
>
> I think that this is an unusual case, and I'm currently inclined to have a
> separate section that addresses this scenario, rather than further
> complicate the normal scenario with extra points.
>
>
> RI
>
>
>
>>
>> Cheers,
>> RI
>>
>>
>> On 07/02/2012 13:20, Aharon (Vladimir) Lanin wrote:
>>
>>> Putting it all together in HTML5:
>>>
>>> Regarding your alternatives, I do not want the page (or application)
>>> author to have to figure out if the phrase could be followed inline by
>>> something problematic. And I do not want an application author to have
>>> to figure out whether dir=auto will result in the direction that the
>>> application knows is the right direction. In other words, I do not want
>>> an application to use dir=auto (or <bdi> without dir) if the true
>>> direction is available to it. Thus, I want an application to wrap the
>>> content in <bdi dir=...> if it knows the direction. And, as explained
>>> above, it is important to do this even if that direction is the same as
>>> the context's, if the phrase might contain any opposite-direction
>>> characters.
>>>
>>> Here is an updated text for this section (hopefully, with the
>>> problematic parts clarified):
>>> --- start ---
>>> To summarize, in HTML5, to make sure that a phrase that contains (or, in
>>> an application, might contain) any opposite-direction characters is
>>> displayed correctly, do the following:
>>>
>>> 1. If the phrase is already tightly wrapped in an element that has
>>> non-inline display (e.g. a <p> or <div> element), and setting this
>>> element's alignment to the "start" of the phrase's direction happens to
>>> be desirable, set the existing element's dir attribute. If you do not
>>> know the phrase's direction, set the dir attribute to "auto". If the
>>> phrase is known to have the same direction as the context, it is not
>>> necessary to set it.
>>> 2. Otherwise, if you know the phrase's overall direction (or have a
>>> better way of determining it than the method used by dir="auto"), wrap
>>> the phrase in <bdi dir="ltr"> or <bdi dir="rtl">, as appropriate. Since
>>> the phrase contains (or might contain) some opposite-direction
>>> characters, do this even when the phrase's overall direction is the same
>>> as the context.
>>> 3. (Optional) Otherwise, if the phrase is already tightly wrapped by an
>>> element with inline display, add dir="auto" to the element.
>>> 4. Otherwise, wrap the phrase in <bdi>. Without an explicit dir value,
>>> dir="auto" is implied.
>>> --- end ---
>>>
>>>
>>> Using bdi and dir="auto" for our use cases:
>>> The way you have it for the third algorithm is pretty close to what I
>>> want. The differences is that I would like you to add a case where the
>>> application happens to know the phrase's direction, as a separate datum.
>>>
>>
>>
> --
> Richard Ishida
> Internationalization Activity Lead
> W3C (World Wide Web Consortium)
>
> http://www.w3.org/International/
> http://rishida.net/
>
>

Received on Wednesday, 8 February 2012 13:19:44 UTC