Re: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.

Sorry for the delay in writing this answer.

On 2012/03/30 2:09, Shawn Steele wrote:
>> Also within this subset of URLs it is possible to have browsers draw these in the URL bar
>> right-to-left and right aligned. But I do not know if this document is the place for such a definition.
>
> The document explicitly prohibits alternate renderings, like LKJ/IHG.FED.CBA//:http

Yes, it currently does. I personally don't necessarily think we need to 
keep it that strict. But we need to be very sure of what the trade-offs 
are, and there are definitely very strong trade-offs.

One thing that may be possible to remove is the condition that the 
embedding be LTR, thus also allowing RTL embedding. But I understand 
that wouldn't yet make you happy.


> I find the current behavior very bad since it treats the IRI like unstructured text.

Indeed IRIs are treated like unstructured text, but that may not 
necessarily be bad.


> However there is a structure; there's an order to the labels.

Yes. Some people are very aware of that structure, others aren't.


> If we'd never heard of the BIDI algorithm, our first attempt, from a clean slate, to solve this problem would not allow the ordering of the labels to be exchanged.

I think that was indeed the case, until we realized that in order to do
that, one of two things is needed:
1) You have to insert Bidi marks into the IRI, which means it's no
longer the same IRI, or
2) You end up with different displays in places that "know" there's
an IRI (e.g. the browser address bar) and in places that don't.

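To make option 1) concrete, here is a minimal Python sketch. The helper with_marks and the example IRI are made up for illustration, and urllib.parse.quote stands in for the RFC 3987 IRI-to-URI mapping (UTF-8, then percent-encoding), which it matches only for simple inputs like this one. Inserting a LEFT-TO-RIGHT MARK (U+200E) after each dot leaves extra octets in the resulting URI, so it really is a different identifier.

```python
from urllib.parse import quote

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK, a bidi formatting character


def with_marks(iri: str) -> str:
    """Hypothetical helper: put an LRM after every '.' so that a bidi
    renderer would keep the host labels in left-to-right order."""
    return iri.replace(".", "." + LRM)


iri = "http://example.com/path"
marked = with_marks(iri)

# quote() encodes to UTF-8 and percent-encodes everything outside the
# unreserved set and the given 'safe' characters; for this simple input
# that matches the IRI-to-URI mapping of RFC 3987.
print(quote(iri, safe=":/"))     # http://example.com/path
print(quote(marked, safe=":/"))  # http://example.%E2%80%8Ecom/path
print(marked == iri)             # False: no longer the same IRI
```

RFC 3987 also says that IRIs MUST NOT contain bidi formatting characters such as LRM, so option 1) would conflict with the existing spec as well.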

> The only reason we're considering that is because we've seen what the Bidi Algorithm does to other text in completely different contexts.

Actually, the current solution was proposed by Mati Alluche, and he 
argued that it would be possible for people to understand the ordering 
because of the heuristics they use when reading mixed text:

Read some text in the main direction; if you meet text in the other
direction, jump to the end of that run of text and read "backwards",
then continue with the text in the main direction. That's a different
heuristic from the one behind the equivalent you used, namely the list
(which the Unicode Bidi Algorithm would actually also "mess up", so that
sequential RTL items would be ordered RTL overall; I'm not sure what people
usually do in these cases, whether they fix it up or not).

Mati said that this would not necessarily help URI/IRI experts, but
might actually be quite easy for non-experts, potentially the easiest
solution for them (easier than strict logical order of components). I'm
not in a location where I have enough average bidi users who aren't
IRI experts around me to test this.

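A toy Python sketch of what the Unicode Bidi Algorithm does with such an IRI, assuming the usual convention that uppercase ASCII letters stand in for RTL letters. It implements only the rules that matter here (a neutral run takes the direction of its neighbours if both agree, otherwise the embedding direction; then runs are reversed level by level) and ignores numbers, explicit embeddings and mirroring, so it is an illustration, not an implementation of UAX #9. base="L" is the LTR embedding the draft currently requires; base="R" is the RTL embedding mentioned earlier.

```python
def classify(ch: str) -> str:
    """Toy bidi classes: uppercase ASCII = RTL ('R'), lowercase = LTR ('L'),
    everything else (':', '/', '.', ',', ' ', ...) = neutral ('N')."""
    if "A" <= ch <= "Z":
        return "R"
    if "a" <= ch <= "z":
        return "L"
    return "N"


def resolve_neutrals(types: list[str], base: str) -> list[str]:
    """A neutral run takes the direction of its neighbours if both agree,
    otherwise the embedding direction; paragraph edges count as `base`."""
    out = list(types)
    i = 0
    while i < len(types):
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < len(types) and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else base
        after = types[j] if j < len(types) else base
        direction = before if before == after else base
        out[i:j] = [direction] * (j - i)
        i = j
    return out


def display(text: str, base: str = "L") -> str:
    """Visual (left-to-right on screen) order of `text` in a paragraph
    whose embedding direction is `base` ("L" or "R")."""
    resolved = resolve_neutrals([classify(c) for c in text], base)
    # Implicit levels: LTR paragraph = level 0, RTL characters go to 1;
    # RTL paragraph = level 1, LTR characters go to 2.
    if base == "L":
        levels = [1 if t == "R" else 0 for t in resolved]
    else:
        levels = [2 if t == "L" else 1 for t in resolved]
    chars = list(text)
    # From the highest level down to 1, reverse every maximal run of
    # characters whose level is at least k.
    for k in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] < k:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= k:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


iri = "http://ABC.DEF.GHI/JKL"
print(display(iri, "L"))          # http://LKJ/IHG.FED.CBA
print(display(iri, "R"))          # LKJ/IHG.FED.CBA//:http
print(display("a: ONE, TWO"))     # a: OWT ,ENO
```

In the LTR case the whole RTL part comes out as one reversed run, which is what the reading heuristic above asks people to decode. With base "R" the same string comes out as LKJ/IHG.FED.CBA//:http, the rendering quoted at the top. The last line shows the list effect: the RTL items end up ordered right-to-left overall.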

> My requirements are:
> 1) The logical order of the parts MUST be preserved.

That sounds like a very logical requirement :-). As always in the IETF,
any arguments/data to support that would be very much appreciated (your
list equivalent certainly counts towards that).


> 2) There MUST be a way for mostly Arabic, etc. IRIs to be rendered right to left.
>  * So the corollary of 1 & 2 is that the protocol has to go on the right

By protocol, do you mean the scheme name (such as ftp:, mailto:, http:, 
https:,...)?


> 3) I'd really like a MAY that allows some flexibility for 2; when it's LTR and when it's RTL.

You mean some flexibility depending on context? We could also make that 
"MUST respect context". But then there's the problem that the context of 
a side of a bus is rather vague :-).


> I don't think we're going to get it perfect in our first pass.

We are already at the second pass. The first pass was RFC 3987.


> At a minimum, I'd suggest that any RTL characters in the domain or email local parts should force 2).

Personally, I think that might be overkill. I'm not sure I'd
want everything turned around just because of a few RTL characters. But
if that's what everybody agrees on, I won't stand in the way.

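The suggested rule itself is easy to state in code. A minimal Python sketch, assuming that "RTL characters" means characters whose Unicode bidi class is R or AL (as reported by unicodedata.bidirectional), and that urlsplit is good enough to find the host, or the address of a mailto: IRI; the function names has_rtl and forces_rtl are hypothetical.

```python
import unicodedata
from urllib.parse import urlsplit


def has_rtl(text: str) -> bool:
    """True if any character has a strong right-to-left bidi class
    (R, e.g. Hebrew, or AL, e.g. Arabic)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def forces_rtl(iri: str) -> bool:
    """Hypothetical check for the proposed rule: RTL characters in the
    domain name or the e-mail local part force right-to-left display.
    Path, query and fragment are deliberately not inspected."""
    parts = urlsplit(iri)
    if parts.scheme == "mailto":
        # For mailto: the whole address ends up in the path component.
        local, _, domain = parts.path.partition("@")
        return has_rtl(local) or has_rtl(domain)
    return has_rtl(parts.hostname or "")


print(forces_rtl("http://example.com/שלום"))     # False: RTL only in the path
print(forces_rtl("http://דוגמה.example/"))       # True
print(forces_rtl("mailto:مستخدم@example.com"))  # True
```

Note that a single RTL character anywhere in the host flips the whole display, which is exactly the concern about turning everything around.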

The really tough problem for anything that reorders by component (what 
you call 'logical order of parts') is that it may be easy to write a 
standard that says so, but it's difficult to implement. Any thoughts 
about that?

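To illustrate the "easy to write, hard to implement" point, a deliberately naive Python sketch of right-to-left component order. The function name is hypothetical, and uppercase ASCII stands in for RTL letters. It only works once a string is already known to be an IRI, and it assumes ':', '/', '.', '?' and '#' always separate components; ports, userinfo, percent-encoded delimiters and IRIs embedded in running text are all ignored, which is where a real implementation gets hard.

```python
import re

# Characters treated as component delimiters in this toy (an assumption:
# real IRI syntax is richer, e.g. '@', ports, percent-encoded delimiters).
DELIMS = re.compile(r"([:/.?#])")


def component_order_rtl(iri: str) -> str:
    """Lay out the components of `iri` right-to-left, keeping each
    component's own characters in their natural direction."""
    tokens = DELIMS.split(iri)  # the capture group keeps the delimiters
    shown = []
    for tok in reversed(tokens):
        # Toy rule: an all-uppercase component counts as "RTL" and is
        # displayed reversed; anything else (e.g. the scheme) as-is.
        shown.append(tok[::-1] if tok.isupper() else tok)
    return "".join(shown)


print(component_order_rtl("http://ABC.DEF.GHI/JKL"))  # LKJ/IHG.FED.CBA//:http
print(component_order_rtl("http://abc.DEF/GHI"))      # IHG/FED.abc//:http
```

For the all-RTL IRI this coincides with what the Bidi Algorithm gives under an RTL embedding. For a mixed IRI like http://abc.DEF/GHI the two differ: the Bidi Algorithm keeps "http://abc" together as one LTR run, while component order splits the scheme from the first label.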

Regards,    Martin.

Received on Monday, 2 April 2012 02:32:51 UTC