Re: Styling of embedded right-to-left text in source code visualization

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Mon, 10 Sep 2007 17:49:14 +0900
Message-Id: <>
To: Najib Tounsi <ntounsi@emi.ac.ma>, Felix Sasaki <fsasaki@w3.org>
Cc: Richard Ishida <ishida@w3.org>, public-i18n-its@w3.org

Hello Felix, Najib, others,

For some work on this issue, please also see my IUC28 paper at
and the simulation page at
and some additional info at

At 04:17 07/09/08, Najib Tounsi wrote:
>Hi Felix,
>Felix Sasaki wrote:
>> Hi Najib and Richard,
>> at least with Najib we discussed styling of "right-to-left" text in source code visualization a while ago, 
>I've noted that rendering bidi text in source code is editor (or tool) dependent, and have suggested that users should set a preference: override the bidi algorithm, Yes or No, so that, if Yes, punctuation in the markup and normal text can't interfere and produce unexpected rendering.

My guess is that almost always, in the above sense, the user would
choose "override yes". Just blindly applying the Unicode bidi algorithm
to something it's not designed to handle virtually always results in
chaos that shows as garbage.
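
As a rough illustration of what "override yes" could mean in practice,
the sketch below wraps every source line in the Unicode characters
LEFT-TO-RIGHT OVERRIDE (U+202D) and POP DIRECTIONAL FORMATTING (U+202C),
so that a bidi-aware viewer displays the line strictly left to right.
The function names are made up for this example, and this is only one
possible way to get such an override, not what any particular editor does.

```python
LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def force_left_to_right(source: str) -> str:
    """Wrap each line in LRO ... PDF (hypothetical helper).

    Wrapping is done per line because explicit directional formatting
    does not carry over from one paragraph to the next.
    """
    return "\n".join(LRO + line + PDF for line in source.split("\n"))


def strip_overrides(text: str) -> str:
    """Remove the override characters again, e.g. before saving."""
    return text.replace(LRO, "").replace(PDF, "")
```

The override characters are for display only; a tool doing this would
have to strip them again (as strip_overrides does) before the source is
saved or parsed.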

But that's not the main problem. Even when we agree that the Unicode
Bidi algorithm as such isn't suited for source display (be that HTML/
XML or some programming language source or something else), there
are many ways to do a better job, and different users may prefer
different ways depending on their background and on the documents
at hand.

>Richard had already discussed this problem,
> http://www.w3.org/International/geo/html-tech/tech-bidi.html#d2e277
>but there is no satisfactory solution yet.
>Editing source code is not a usual activity, so among the next three lines
><p title="ATTRIBUTE">CONTENT</p> (normal styling)
><p title="TNETNOC<"ETUBIRTTA</p> (some browser styling)
><p title="ETUBIRTTA">TNETNOC</p> (memory order)
>I prefer the third, which is much safer when, for example, inserting a space.

Assuming the usual "upper case is RTL" convention, I fully
understand why you prefer the third, but I don't understand
why you label it as "memory order". The first is in memory
order, isn't it? The third is what we would like to see, but,
as far as I understand, no editor currently does that.
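
To make the difference between the second and third line concrete, here
is a small sketch using the "upper case is RTL" convention. It is NOT
the Unicode bidi algorithm, only a toy model under these assumptions:
the base direction is LTR, lower-case letters and digits count as strong
LTR, everything else is neutral, and neutrals between two RTL letters
join the RTL run. The names naive_bidi and markup_aware are invented
for this example.

```python
import re

# Characters that are displayed mirrored inside an RTL run.
MIRROR = str.maketrans("<>()[]{}", "><)(][}{")

# Naive model: an RTL run starts and ends with an upper-case letter and
# may contain anything except lower-case letters, digits and newlines.
NAIVE_RUN = re.compile(r"[A-Z](?:[^a-z0-9\n]*[A-Z])?")

# Markup-aware model: an RTL run consists of upper-case letters and
# spaces only, so quotes and angle brackets always end the run.
ISOLATED_RUN = re.compile(r"[A-Z](?:[A-Z ]*[A-Z])?")


def naive_bidi(line: str) -> str:
    """Reverse (and mirror) each run as a plain bidi display would."""
    return NAIVE_RUN.sub(lambda m: m.group()[::-1].translate(MIRROR), line)


def markup_aware(line: str) -> str:
    """Reverse RTL text without letting it reach across markup."""
    return ISOLATED_RUN.sub(lambda m: m.group()[::-1], line)
```

For the first line above, naive_bidi('<p title="ATTRIBUTE">CONTENT</p>')
gives '<p title="TNETNOC<"ETUBIRTTA</p>' (the second line), and
markup_aware on the same input gives '<p title="ETUBIRTTA">TNETNOC</p>'
(the third line).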

>> but now we are not sure what to do about this example:
>> 1) http://www.w3.org/International/its/techniques/its-techniques.html#AuthDir (example 33, W3C on the right of the Hebrew text)

I see W3C on the left, not on the right, here. That seems okay.
I'm not sure why you see this differently.
An example with W3C (or something similar) in the middle would
even be better, except that then the reader has to look at
(and hopefully even understand) the Hebrew text on both sides.

Just looking at Best Practice 16, I see that it says:
"By default the text directionality in an XML document is assumed to be left-to-right."
Where did you get that? Wouldn't it be possible for some spec to
say that the default for them is RTL?

For "Why do this", please add that the list of languages that may
be written RTL is actually quite long.

"has values that indicate that the normal directionality should be overridden"
is a bit difficult to understand, or easy to misunderstand. I'd add
"in addition to values to indicate the base directionality".

>> versus this example:
>> 2) http://www.w3.org/TR/its/#directionality-implementation (example 33, "W3C" on the left of the Hebrew text)
>> we thought that the ITS 1.0 spec would be right, but have discovered now the visualization at
>> 3) http://www.w3.org/International/questions/qa-bidi-controls (example below "The HTML4 standard introduced markup to produce exactly the same effects as these Unicode characters.". "W3C" is on the right of the Hebrew text)

That also has W3C on the left, and is also okay.

>> So we need to discuss again what the appropriate visualization for "right-to-left" source code is. 1) and 3) visualize as if the outcome of the BIDI algorithm is overridden via the "dir" attribute, 2) is the other way round.

I still don't see the differences between the three examples, but I think
I know now what you mean. Your question is whether it's okay to use something
like the <span dir="rtl"> below (Hebrew is ???? due to my mailer) to tweak
source code display or not.

<span class="attribute">its:dir</span>="rtl"&gt;</span>
<span dir="rtl">?????? ???????, W3C</span>
<span class="element">&lt;/quote</span>&gt;

(btw, the span coverage for the closing element is really a bit strange)

>In my opinion, assume that editors are doing correct bidi-rendering, i.e. W3C is on the left of the text, and maybe add the same note as Richard (at the top right of http://www.w3.org/International/questions/qa-bidi-controls):
>"Note also that the examples of source text assume a sophisticated editor that resolves directionality of the source text correctly. This is to ensure that you understand the concepts being described. Many editors are not yet this sophisticated."

I agree that having such a note is a good thing. "Many" above is an understatement
as far as I understand, but it's probably difficult to be more explicit
(e.g.: "We currently don't know of any editor that is that sophisticated.").
Our simulation script isn't an editor, and so doesn't count here.

The problem with that Note is that it doesn't apply to the text immediately
on the right (which is final rendering, not source code), and where it
would apply (the box that says "Using XHTML, the earlier example would be
coded as:"), it actually DOESN'T apply (because W3C is on the right).

>The question remains: For source text, is correct bidi-rendering desirable for a given user for a given need?

What do you mean by "correct bidi rendering"? If you mean
"fully apply the Unicode bidi algorithm and nothing else",
then I'd clearly say NO. If you mean "something better
than just the Unicode bidi algorithm", my answer would be
clearly YES, but depending on the material and on personal
preferences, there may be several ways.

Although there are several ways (an Arabic user who only
occasionally views/reads Latin would want to view an
XML document with mostly Arabic content and mostly Arabic
element/attribute names in overall RTL mode,...), for the
example above, what has been done e.g. at
seems perfectly reasonable to me, although it requires a
highly sophisticated editor that not only parses tags,
but the whole element structure/nesting, and understands
its:dir. You can see how that would work at
if you change "its:dir" to "dir" and select markup language
xhtml (rather than xml).
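
As a minimal sketch of the "understands its:dir" part: the code below
walks the element tree and works out, for each element, the direction
that applies to its text, letting its:dir on an element carry over to
its descendants. It handles only local its:dir attributes (no global
rules), the function name resolve_directions is invented here, and the
default direction is a parameter rather than hard-wired.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS 1.0 namespace
ITS_DIR = "{" + ITS_NS + "}dir"


def resolve_directions(xml_source: str, default: str = "ltr"):
    """Return (tag, direction) pairs in document order (hypothetical helper).

    An element without its:dir takes the direction of its parent; the
    root element falls back to `default`.
    """
    result = []

    def walk(elem, inherited):
        direction = elem.get(ITS_DIR, inherited)
        result.append((elem.tag, direction))
        for child in elem:
            walk(child, direction)

    walk(ET.fromstring(xml_source), default)
    return result
```

An editor could then use the resolved direction of each element to lay
out its text content, while keeping the tags themselves in LTR order.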

Regards,     Martin.

>Regards, Najib
>> Many thanks for your input in advance,
>> Felix
>Najib TOUNSI (mailto:tounsi @ w3.org)
>Bureau W3C au Maroc (http://www.w3c.org.ma/)
>Ecole Mohammadia d'Ingenieurs, BP 765 Agdal-RABAT Maroc (Morocco)
>Phone : +212 (0) 37 68 71 50 (P1711)  Fax : +212 (0) 37 77 88 53
>Mobile: +212 (0) 61 22 00 30 

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Monday, 10 September 2007 08:49:51 UTC
