
RE: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.

From: Shawn Steele <Shawn.Steele@microsoft.com>
Date: Mon, 2 Apr 2012 23:59:07 +0000
To: Adil Allawi <adil@diwan.com>
CC: "public-iri@w3.org" <public-iri@w3.org>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Message-ID: <E14011F8737B524BB564B05FF748464A5B1CB525@TK5EX14MBXC139.redmond.corp.microsoft.com>
I'm not sure what "reorder unexpectedly" means.  

Presumably an Arabic speaker that went to an internet site:  LABEL2.LABEL1 would definitely NOT expect LABEL1.LABEL2/index.html just because we now have added "index.html" to it.  (And LABEL2.LABEL1/index.html is far worse from our investigations).

Your suggestion might make sense for a user that normally only sees LTR text (like me), but for a user that normally sees RTL text, you could argue the opposite: that unless there's strong left-dominant context (e.g., it IS embedded in a line of Latin text), it should be ordered RTL.
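
To make the component-wise ("list-like") ordering concrete, here is a rough sketch, not a proposal for the spec. It assumes a deliberately simplified IRI shape (optional `scheme://`, dot-separated host labels, slash-separated path segments); the function names are hypothetical. Each component keeps its internal character order, and only the order of the components is reversed for an RTL display.

```python
def logical_components(iri: str) -> list[str]:
    """Split a simplified IRI into components and delimiters, in logical order."""
    scheme, sep, rest = iri.partition("://")
    if not sep:
        # No "://" found: treat the whole string as host (and optional path).
        scheme, rest = "", iri
    host, slash, path = rest.partition("/")
    parts = [scheme, "://"] if scheme else []
    for i, label in enumerate(host.split(".")):
        if i:
            parts.append(".")
        parts.append(label)
    if slash:
        for segment in path.split("/"):
            parts += ["/", segment]
    return parts


def visual_components(iri: str, rtl: bool) -> list[str]:
    """Return components in left-to-right visual order.

    For RTL the component list is reversed as a whole, so the logical
    order of the parts is preserved when read from right to left.
    """
    parts = logical_components(iri)
    return parts[::-1] if rtl else parts
```

Under these assumptions, `visual_components("LABEL1.LABEL2/index.html", rtl=True)` lists the path segment first (leftmost) and LABEL1 last (rightmost), so adding a path does not change the relative order of the labels.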

-Shawn

-----Original Message-----
From: Adil Allawi [mailto:adil@diwan.com] 
Sent: Monday, April 2, 2012 3:31 PM
To: Shawn Steele
Cc: public-iri@w3.org; "Martin J. Dürst"
Subject: Re: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.

OK. How about saying that a Bidi IRI is only allowed to be ordered RTL if it is either
 - drawn with the protocol (e.g. http://) and in a right-dominant context (i.e. it is not embedded in a line of Latin text)
 - or that the IRI only contains either neutrals or strong right-to-left characters.

This way we can be sure that the IRI would not reorder unexpectedly.
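
The two conditions above can be sketched as a small predicate. This is only an illustration under stated assumptions: the function names are hypothetical, "drawn with the protocol" is approximated by the presence of `://`, and "only neutrals or strong right-to-left characters" is read loosely as "no strong left-to-right character" (so weak types such as digits are allowed, which may not be what is intended). Character classes come from Python's standard `unicodedata.bidirectional`.

```python
import unicodedata


def has_strong_ltr(text: str) -> bool:
    """True if any character has the strong left-to-right bidi class 'L'."""
    return any(unicodedata.bidirectional(ch) == "L" for ch in text)


def may_order_rtl(iri: str, right_dominant_context: bool) -> bool:
    """Sketch of the proposed rule for when RTL ordering is allowed.

    Condition 1: the IRI is shown with its protocol (assumed here to mean
    it contains "://") AND the surrounding context is right-dominant.
    Condition 2: the IRI contains no strong LTR characters (assumption:
    weak and neutral characters are all acceptable).
    """
    shown_with_protocol = "://" in iri
    if shown_with_protocol and right_dominant_context:
        return True
    return not has_strong_ltr(iri)
```

For example, an all-Arabic bare domain passes condition 2 regardless of context, while an `http://` IRI embedded in a line of Latin text passes neither condition because the scheme itself consists of strong LTR characters.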

The cut and paste is an interesting issue. If we force a single direction then it would be OK - but that would not solve your problem.

Adil

On Mon Apr  2 15:19:20 2012, Shawn Steele wrote:
> I don't see that helps very much.  A bare domain name by itself that was entirely Arabic would reasonably be ordered from right to left in an Arabic document, even if it didn't have an http.  Clearly it'd be a helpful indicator that this was an IRI though.
>
> I think that following the document's context is reasonable if you're missing other indicators, but I don't think it's possible to completely avoid confusion, if for no other reason than cut & paste from a compliant app to an older app will likely cause differences in display for the same binary representation.
>
> -Shawn
>
> -----Original Message-----
> From: Adil Allawi [mailto:adil@diwan.com]
> Sent: ,  02,  2012 15:10
> To: Shawn Steele
> Cc: public-iri@w3.org; "Martin J. Dürst"
> Subject: Re: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.
>
> With regard to Shawn's comments: would it be acceptable to say that a Bidi IRI is only allowed to be ordered RTL if it is drawn with the protocol (e.g. http://) and in a right-dominant context (i.e. it is not embedded in a line of Latin text)?
>
> In this way we can allow the RTL alignment, with the caveat that the user needs to be educated on the directional issues; but we would avoid confusion over the order in which the elements appear, since the "http://" will act as a visible direction guide.
>
> Adil
>
> On Mon Apr  2 10:55:22 2012, Shawn Steele wrote:
>>>> 2) You end up with different displays between places that "know"
>>>> there's an IRI (e.g. browser address bar) and places that don't
>>
>> That's unavoidable.  People will follow this RFC or they won't.  The 
>> Unicode Bidi Algorithm doesn't include this guidance, so plain text 
>> will also fail, though some apps may try to be "smarter".  For years 
>> people will have different browser versions with different behaviors, 
>> etc.  The UBA is also inconsistently applied, and at inconsistent 
>> revisions, so I think it's a bit presumptuous of us to think that 
>> anything we specify here could cause consistent rendering by our 
>> guidance :)
>>
>> IMO: There's a more general "list" problem with the UBA, and having the UBA address that might be interesting.
>>
>>> Actually, the current solution was proposed by Mati Allouche, and he 
>>> argued that it would be possible for people to understand the 
>>> ordering because of the heuristics they use when reading mixed text:
>>
>> That doesn't match our investigation.  That presumes that people read it as trained by the UBA; however, when encountering list-like structures, people don't typically apply the UBA.  Unfortunately, regardless of the approach, some training of the user community is likely required.
>>
>>>> My requirements are:
>>>> 1) The logical order of the parts MUST be preserved.
>>
>>> That sounds like a very logical requirement :-). As always in the 
>>> IETF, any arguments/data to support that would be very much 
>>> appreciated (your list equivalent is certainly counting towards that).
>>
>> I don't have a formal white paper or user study.  This comes from discussions with native bidi speakers, technical, non-technical, and in-between.  Also from feedback from the community.  This is how we realized that IRIs are best treated like the "list" analogy.
>>
>> Fortunately 90% of the most common cases are probably a loose domain, like the side of a bus, and those are probably all same-script IRIs.
>>
>>>> 2) There MUST be a way for mostly Arabic, etc. IRIs to be rendered right to left.
>>>> 	* So the corollary of 1 & 2 is that the protocol has to go on the right
>>
>>> By protocol, do you mean the scheme name (such as ftp:, mailto:, 
>>> http:, https:,...)?
>>
>>
>>>> 3) I'd really like a MAY that allows some flexibility for 2; when it's LTR and when it's RTL.
>>
>>> You mean some flexibility depending on context? We could also make 
>>> that "MUST respect context". But then there's the problem that the 
>>> context of a side of a bus is rather vague :-).
>>
>> Not if it's a bus in Cairo, or a bus in Washington DC.  Though either is probably going to be a single script.
>>
>>>> At a minimum, I'd suggest that any RTL characters in the domain or email local parts should force 2).
>>
>>> In my personal view, I think that might be overkill. I'm not sure 
>>> I'd want everything turned around just because of a few RTL characters.
>>> But if that's what everybody agrees on, I won't stand in the way.
>>
>> IMO this is mostly a user preference.  "I" would probably prefer the LTR ordering, even for an entirely Arabic IRI, because then I'd be able to understand the parts.  E.g., if the ordering were consistent, I could chomp off a subdomain to get to a parent domain, or remove the path part to get to the home page.  If that changes in the middle, I'd be unsuccessful.
>>
>>> The really tough problem for anything that reorders by component 
>>> (what you call 'logical order of parts') is that it may be easy to 
>>> write a standard that says so, but it's difficult to implement. Any 
>>> thoughts about that?
>>
>> Yes :)  I'd be much happier coming up with a behavior that's understandable by 90% of the humans and have problems implementing it, than causing ambiguity for 50% of the population just because it was easy to implement.
>>
>> We also came up with a couple practical observations:
>> Many paths are "long".  They are also likely mostly ASCII for the foreseeable future.  If I render a path with http:// on the left, and an Arabic domain name, then a path on the right, an RTL user with an RTL address bar will have a hard time discovering the domain, which is the most important part of the IRI, because it won't be near the right side of the textbox.
>>
>> Worse, if the path/query gets long enough, then you have 2 really bad options:  Either allow the host name to be cropped from the left of the address bar, or clip the path on the RIGHT side, like an LTR textbox, impacting the usability of the RTL app.
>>
>>
>

Received on Monday, 2 April 2012 23:59:43 GMT
