Re: Rework of Bidi inline article

From: Matitiahu Allouche <matial@il.ibm.com>
Date: Tue, 22 Nov 2011 12:31:14 +0200
To: Richard Ishida <ishida@w3.org>
Cc: "Aharon (Vladimir) Lanin" <aharon@google.com>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <OF3879024C.871FB3FB-ONC2257950.0027FA2B-C2257950.0039CF66@il.ibm.com>
   Hello, Richard!

In general, I find the article useful and understandable. The distinction 
between what is supported in HTML 4 and what in HTML 5 is very clear and 
should not be a problem.

I was somewhat annoyed by repetitions, for instance about the usage of 
LRM/RLM, but this is minor, and maybe it is on purpose, to make sure the 
point is taken.

I have a few comments on the contents.

1) In section "Numbers", I find: "Because it is weakly typed, the number 
is seen as part of the Arabic text, so the two Arabic words that surround 
the number are treated as part of the same directional run"
Strictly speaking, this is not true. The number is a separate LTR run and 
the Arabic words on each side are two different RTL runs, so there are 4 
runs in this example.
Imagine the logical string "ABC 123def" in RTL base direction. It will be 
displayed as "123def CBA". Do you still think that the number is seen as 
part of the Arabic text?
I understand that the number is "embedded" in the Arabic text, so you see 
it as "part" of the Arabic text, but then you should introduce the concept 
of embedding levels, and not blur the concept of directional runs.
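
The "ABC 123def" example can be checked mechanically. What follows is a minimal Python sketch, not the article's or any browser's implementation. It assumes the usual convention that uppercase letters stand for RTL letters. The embedding levels are written in by hand rather than computed: level 1 for the RTL letters and the space, level 2 for the digits and the Latin letters, which is what the UBA should resolve for this string in an RTL paragraph. Only the reordering rule L2 of UAX #9 is applied. The function names level_runs and reorder are made up for the illustration.

```python
from itertools import groupby


def level_runs(text, levels):
    """Split text into maximal runs of characters sharing one embedding level."""
    runs = []
    pos = 0
    for level, group in groupby(levels):
        length = len(list(group))
        runs.append((level, text[pos:pos + length]))
        pos += length
    return runs


def reorder(text, levels):
    """Apply UAX #9 rule L2: from the highest level down to the lowest odd
    level, reverse every contiguous sequence at that level or higher."""
    chars = list(text)
    lv = list(levels)
    odd_levels = [level for level in lv if level % 2 == 1]
    if not odd_levels:
        return text  # purely LTR: nothing to reverse
    for level in range(max(lv), min(odd_levels) - 1, -1):
        i = 0
        while i < len(chars):
            if lv[i] >= level:
                j = i
                while j < len(chars) and lv[j] >= level:
                    j += 1
                # Reverse the characters and keep their levels attached to them.
                chars[i:j] = chars[i:j][::-1]
                lv[i:j] = lv[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars)


# Logical string in an RTL paragraph; uppercase = RTL letters (assumed convention).
logical = "ABC 123def"
# Hand-assigned levels (an assumption, not computed): "ABC " -> 1, "123def" -> 2.
levels = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]

print(level_runs(logical, levels))  # [(1, 'ABC '), (2, '123def')]
print(reorder(logical, levels))     # 123def CBA
```

The number ends up in the level-2 run together with "def", not in the level-1 run of the RTL letters, which is the point made above.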

2) In use case 2, section "Using HTML4 markup", the wrong solution is 
shown as:
<p>W3C employee, Richard Ishida (<span dir="rtl" lang="he">ריצ'רד 
אישידה</span>),&#x200E; 35, of Tel Aviv said yesterday that ...</p>
The code uses &#x200E;, which is a possible solution and is not mentioned 
in the explanations.

3) In section "Using RLM/LRM", we find "For use case 3, put an RLM 
character between the Hebrew text and the start of the Mac address."
I doubt it (what would the RLM do that the Hebrew letters did not?). 
Please verify.

4) In "Use case 4", we find "Note how the text 'W3C' appears to the right 
of the Hebrew text,"
It seems that "right" should be "left".

5) In "Use case 5", we find "In the previous section the neutral character 
thought it was part of the directional context established by the base 
direction, but wasn't; in this section the neutral character thinks it is 
part of the directional run, when it is really part of the overall 
I am not keen on personifying the characters and letting them "think". 
Since this article is for the naïve user of bidi, I prefer to make clear 
that the considerations are those of the presentation system of the 
software. It may be a matter of taste.

6) Editorial: "it's" => "its" at one location;
"repace" => "replace"

Shalom (Regards),  Mati
       Bidi Architect
       Globalization Center Of Competency - Bidirectional Scripts
       IBM Israel
       Mobile: +972 52 2554160

From:   Richard Ishida <ishida@w3.org>
To:     "Aharon (Vladimir) Lanin" <aharon@google.com>, Matitiahu 
Cc:     "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Date:   15/11/2011 17:01
Subject:        Rework of Bidi inline article

Hi Aharon, Mati,

I have just done a first draft pass over the document 

(in particular from here down: 


Bearing in mind that this is a first pass, would you mind scanning it 
and letting me know whether you think I'm on the right track?  Hopefully 
it responds to the structure related comments below. (I'm still planning 
to revisit some of the more detailed comments.)

btw, I haven't yet decided what to do with the section entitled "More 


PS: Any thoughts on this: 
https://plus.google.com/103190014606131822578/posts/MgiwWfu3Rrt ?

On 28/10/2011 21:09, Richard Ishida wrote:
> I have begun a substantial reorganization and rewrite of the following
> section:

> RI
> On 25/10/2011 17:58, Richard Ishida wrote:
>> Hi Aharon, and thanks for your comments. I was hoping to discuss with
>> you at the Unicode conf, but that wasn't to be, so here is a quick dash
>> at my thoughts (since I have to go out soon).
>> I actually agree with pretty much everything you say, but the concern I
>> had was to do with Martin's previous post about the fact that these
>> things are not yet supported widely, and how to manage expectations in
>> that regard.
>> Even where implementation is there (eg. for dir=auto on Chrome 
>> not <bdi> afaict!)) it will be some time before the new constructs can
>> be relied upon on their own, due to legacy browser usage (esp. IE8).
>> My original thought was to 'cordon off' the new stuff into its own
>> section with a big disclaimer, so that it is clear that this stuff
>> doesn't work quite yet, and then merge it in to the mainstream 
>> as support increases.
>> However, I think you might be right that we should integrate from the
>> start. The challenge will be to do so in a way that makes it clear to
>> the reader what currently works and what doesn't.
>> That said, I'm still a little worried about the legacy aspect of this.
>> I've seen a few places in my own pages where I'm inclined to add
>> dir=auto or bdi right now, but I know that I will still need to also 
>> the rlm/lrm for at least a couple of years to cater for the IE8
>> corporate legacy.
>> Using both will be messy, for explanation as well as for content
>> authoring.
>> I'm wondering whether a way around this is to use CSS. For example, in 
>> LTR page or context, the CSS rule
>> bdi:before { content: '\200E '; }
>> will cause
>> <p>The names of these states in Arabic are <bdi>مصر</bdi>,
>> <bdi>البحرين</bdi> and <bdi>الكويت</bdi> respectively.</p>
>> to display as expected, even if bdi is not supported.
>> I suspect we may need to distinguish between cases, such as input
>> fields, where the rlm/lrm is not appropriate (because it doesn't help),
>> and situations like the example above, where it can help (either for 
>> or dir=auto).
>> Actually, the CSS should probably be genericised to say something like,
>> if the direction of the parent element is RTL use rlm, and vice versa,
>> but I think that that capability too is only now being introduced.
>> What do you think?
>> RI
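
The rule of thumb above (LRM in an LTR context, RLM in an RTL context) can be sketched in Python for content that is generated rather than hand-authored. This is only an illustration under assumptions: the helper names mark_for and prefix_with_mark are hypothetical, and the mark is placed before each phrase to mirror the bdi:before rule quoted above.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def mark_for(base_direction):
    """Pick the mark that matches the direction of the surrounding context."""
    if base_direction == "ltr":
        return LRM
    if base_direction == "rtl":
        return RLM
    raise ValueError("base_direction must be 'ltr' or 'rtl'")


def prefix_with_mark(phrase, base_direction):
    """Put the context's mark before an embedded phrase, like bdi:before does."""
    return mark_for(base_direction) + phrase


names = ["مصر", "البحرين", "الكويت"]
marked = [prefix_with_mark(name, "ltr") for name in names]
sentence = "The names of these states in Arabic are {}, {} and {} respectively.".format(*marked)
print(sentence.count(LRM))  # 3
```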
>> On 14/10/2011 13:04, Aharon (Vladimir) Lanin wrote:
>>> I think that the bdi element and the idea of isolation should appear
>>> much earlier in the article, long before unknown direction. Basically,
>>> when you introduce <span dir=...> in "A simple solution" (after 
>>> base direction"), you should also mention that HTML5 defines a new
>>> element, <bdi>, that should be preferred over <span> for this purpose,
>>> once browsers start to support it, because it also isolates the nested
>>> phrase from its surroundings, thus preventing it influencing their
>>> display. You can say that there are examples coming up.
>>> "Adjacent, same-direction directional runs that are incorrectly 
>>> is an excellent example for the use of <bdi>. I think you should take
>>> out the sentence "Putting markup around the comma is a bit like 
>>> an egg with a hammer in this case." I think that mark-up generally is
>>> the preferred solution, when it states something that makes sense. As 
>>> will explain below, enclosing the comma in a <span dir=ltr> makes no
>>> sense, and should not even be mentioned, since it will not work. On 
>>> other hand, enclosing each of the RTL items in the list (but not the
>>> commas or spaces between them) in a <bdi dir=rtl> makes perfect sense,
>>> i.e.:
>>> The names of these states in Arabic are <bdi dir="rtl">مصر</bdi>, <bdi
>>> dir="rtl">البحرين</bdi> and <bdi dir="rtl">الكويت</bdi> respectively.
>>> You can say that in this example, the dir="rtl"s actually don't change
>>> anything, and in fact that just the first <bdi> is sufficient to fix 
>>> problem, but there is nothing wrong with marking every embedded
>>> opposite-direction phrase in a <bdi> - it won't hurt, and will often
>>> prevent problems.
>>> As I said before, putting a <span dir=ltr> around the comma does not
>>> make sense, and should not be mentioned at all. Why specifically the
>>> comma, and not, say the space next to it? Furthermore, a <span 
>>> is an /embedding/ - which is not really true for the comma: it's a 
>>> of the enclosing LTR sentence, not a piece of LTR embedded within - 
>>> a part of - some RTL. In fact, putting the <span dir=ltr> around the
>>> comma puts the comma in the wrong place when there is no space between
>>> it and the RTL text preceding it.
>>> In "More examples", the Hebrew "W3C ... ERCIM" examples should really
>>> start with "ה-" immediately before the "W3C", i.e. the desired output
>>> should be:
>>> ה-W3C‏ (World Wide Web Consortium) מעביר את שירותי הארחה באירופה ל -
>>> ERCIM.
>>> This too is actually a great place to use <bdi>:
>>> ה-<bdi dir="ltr">W3C</bdi> (<bdi dir="ltr">World Wide Web
>>> Consortium</bdi>) מעביר את שירותי הארחה באירופה ל-<bdi
>>> dir="ltr">ERCIM</bdi>.
>>> Once again, you don't actually need the dir="ltr" on any of these, and
>>> just the first or second <bdi> will be sufficient alone to fix the
>>> problem, but in principle the safe way to write this sentence is as
>>> above.
>>> I think that the <bdi> solution - once it is available in browsers - 
>>> preferable to using &rlm;, because it makes intuitive sense. You 
>>> mark the embedded opposite-direction phrases, each one on its own. 
>>> someone actually understands the UBA - which very few people do - 
>>> LRM and RLM seems like voodoo. Few people know when they should use 
>>> and when they should use RLM, and where exactly they should put it.
>>> IMO, the same applies to all the other examples in this section. The
>>> best way to deal with them, when it becomes available, is <bdi 
>>> (or just <bdi>, because of dir=auto, but we don't have to mention that
>>> yet), not an LRM, and not <span dir=ltr>.
>>> In "Handling unknown text", if you are looking for a real RTL book 
>>> that contains some LTR word(s), but does not begin with them (so that
>>> dir=auto will work well with it), there is
>>> http://books.google.com/books?id=05syOwAACAAJ:

>>> מבוא לתכנות בסביבת אינטרנט - מבוא ו- HTML
>>> Please note that the Google Books page has a bug: the title as 
>>> at the top of the page is always in the direction of the UI. However,
>>> the title displayed near the bottom of the page, after "Title:" is
>>> displayed using the word-count direction estimation algorithm. It gets
>>> this book title right.
>>> Furthermore, please note that when I used Google Books' Advanced 
>>> to look for Hebrew-language books containing one of the words HTML, 
>>> and JavaScript, the majority of the book titles I found /began /with 
>>> LTR word, so dir=auto's first string algorithm does not work well on
>>> them. I had tried to push through word-count for dir=auto, but failed 
>>> convince people. Examples:
>>> http://books.google.com/books?id=IU83OgAACAAJ

>>> http://books.google.com/books?id=_qAlOgAACAAJ

>>> http://books.google.com/books?id=_-gSKQEACAAJ

>>> For this reason, I think it is worthwhile to tone down the statement
>>> that "There are some rare corner cases where this may not give the
>>> desired outcome, but in the majority of cases it should produce the
>>> expected result." I would take out the words "some rare", and you 
>>> also add on "particularly when the embedded text does not mix LTR and
>>> RTL words and the problem is limited to things like trailing
>>> punctuation, leading numbers, and phone numbers."
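
The gap between the first-strong rule and a word count can be illustrated with a short Python sketch. It is a simplification under stated assumptions. first_strong_direction only looks for the first character with bidi class L, R or AL, whereas the real dir=auto algorithm has further rules. word_count_direction is a naive majority vote, not the estimation algorithm Google Books actually uses. The second title is a made-up example of a title that begins with an LTR word.

```python
import unicodedata


def first_strong_direction(text, default="ltr"):
    """Return the direction of the first strong character, or default if none."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default


def word_count_direction(text, default="ltr"):
    """Naive majority vote over whitespace-separated words (an assumption)."""
    ltr = rtl = 0
    for word in text.split():
        direction = first_strong_direction(word, default=None)
        if direction == "ltr":
            ltr += 1
        elif direction == "rtl":
            rtl += 1
    if rtl > ltr:
        return "rtl"
    if ltr > rtl:
        return "ltr"
    return default


# Title quoted in the message above: starts with Hebrew, ends with "HTML".
real_title = "מבוא לתכנות בסביבת אינטרנט - מבוא ו- HTML"
# Hypothetical title that begins with an LTR word.
made_up_title = "HTML למתחילים - מדריך מעשי"

for title in (real_title, made_up_title):
    print(first_strong_direction(title), word_count_direction(title))
```

For the made-up title the two functions disagree: the first strong character is Latin, while most of the words are Hebrew.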
>>> On Thu, Oct 13, 2011 at 8:09 PM, Richard Ishida
>>> <ishida@w3.org> wrote:
>>> On 19/09/2011 16:04, [Mati] wrote:

>>> 11) In section "Using dir="auto" with the input element", the first
>>> Hebrew word of the example is not known to me and is probably a
>>> typo. I don't even guess what was the intended word.
>>> On 20/09/2011 09:38, [Mati] wrote:


>>> DON'T show email on public list.
>>> Name: Matitiahu Allouche
>>> Email: matial@il.ibm.com
>>> Comments:
>>> This is the continuation of comments that I sent in a previous
>>> submission.
>>> 18) In section "Second use case", the first Hebrew word of the
>>> book title differs between its mention in the body of the text
>>> and its mention in the message. The form in the message is the
>>> correct one.
>>> I think I was trying to use the title of the article at
>>> http://www.w3.org/International/questions/qa-css-charset.he.php
>>> (though why that's different, I'm not sure). But at the time I only
>>> grabbed that quickly because I was in a hurry.
>>> Would you or Aharon be able to provide me with a real book title
>>> that has similar properties? (ie. ending with CSS or some such).
>>> (Maybe one of these?

>>> <http://www.google.com/search?q=CSS3&btnG=Search+Books&tbm=bks&tbo=1>)
>>> Cheers,
>>> RI
>>> --
>>> Richard Ishida
>>> Internationalization Activity Lead
>>> W3C (World Wide Web Consortium)
>>> http://www.w3.org/International/
>>> http://rishida.net/


Received on Tuesday, 22 November 2011 11:45:34 UTC
