RE: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.

AFAIK, the scheme is always ASCII (though some cultures want native script schemes so they don't have to do keyboard switching, but that seems like a different problem).  

For http, domain names are Unicode, though a restricted subset, and the path is often currently ASCII or % encoded, but could presumably be bigger than that.
For mail, the domain names are also an IDN subset of Unicode; however, the local part can (now) legally be anything > 0x7f, though the < 0x80 set is restricted.

I think the legal characters are defined by the schemes?  Though EAI mail is clearly overly permissive, and I wouldn't mind disallowing BIDI marks in IRIs if it helped display.

The schemes have delimiters (@, ., /, etc.), which are all bidi character type ON.  As far as implementation goes, probably treating those all like L or R (depending on the mode) might "solve" the problem, as that would force the sections into separate units, which would then maintain the same order as the underlying binary representation (label 1 first, label 2 next, etc).  That doesn't seem terribly difficult from an implementation perspective.
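
A rough sketch of that idea in Python, purely as an illustration: the function names and the delimiter set are hypothetical, and this is neither the UBA nor anything a draft specifies. It keeps the components in logical order and lays the whole sequence out in a single direction. Uppercase stands in for RTL text, as in the examples in this thread; characters inside each component would still be shaped by the normal bidi algorithm.

```python
import re

# Hypothetical delimiter set for illustration; real schemes define their own.
DELIMS = re.compile(r"([@./:?#])")

def split_components(iri):
    """Split an IRI into components and single-character delimiters, in logical order."""
    return [part for part in DELIMS.split(iri) if part]

def display_order(iri, mode="ltr"):
    """Return the pieces in left-to-right visual order for the given mode.

    Every delimiter is treated as if it were a strong character of the
    mode's direction, so neighbouring components never merge into one bidi
    run. In "rtl" mode the sequence is simply laid out from the right.
    """
    parts = split_components(iri)
    return parts if mode == "ltr" else parts[::-1]

print("".join(display_order("http://WWW.ARABIC.TLD", "ltr")))
print("".join(display_order("http://WWW.ARABIC.TLD", "rtl")))
```

Either way the logical sequence (label 1 first, label 2 next, etc.) is untouched; only the side the layout starts from changes.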

One big thing that I'm willing to give up is the idea that you and I (or an Arabic speaker) would "see" the same IRI identically at all times.  E.g.: http://WWW.ARABIC.TLD vs TLD.ARABIC.WWW//:http.  However, if any of us read it over the phone, we'd all read the same thing (assuming we knew how to read Arabic at all).

As the discussion continues, I'm also getting more entrenched in my position that this is very likely a user preference.  Particularly WRT the usability of the address bar.  I'll see if I can find time to make some screen shots that demonstrate the problem.

-Shawn

-----Original Message-----
From: Larry Masinter [mailto:masinter@adobe.com] 
Sent: ,  03,  2012 2:23
To: Shawn Steele; Adil Allawi
Cc: public-iri@w3.org; "Martin J. Dürst"
Subject: RE: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.

I'm really having trouble understanding this discussion.

" Bidi IRI is only allowed to be ordered RTL if it is drawn with the protocol (e.g. http://) and in a right-dominant context (i.e. it is not embedded in a line of latin text)."

I don't know what "allowed" means here.

I have an  IRI which, in logical order, starts with a (ASCII) scheme, includes a RTL domain name, and a path, with RTL, LTR, or mixed components.

Who would be "allowed" to do what? In what circumstances? What would be the consequence of them not doing this?

I don't understand if you're talking about restrictions on allowed characters in IRIs, guidelines for software displaying IRIs, guidelines for encoding IRIs in "plain" LTR or RTL text, or something else ....

Some examples would help enormously.

My fear is that we'll once again get to a set of requirements that you're happy with but which can't be implemented, which won't help us.


-----Original Message-----
From: Shawn Steele [mailto:Shawn.Steele@microsoft.com]
Sent: Tuesday, April 03, 2012 1:59 AM
To: Adil Allawi
Cc: public-iri@w3.org; "Martin J. Dürst"
Subject: RE: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.

I'm not sure what "reorder unexpectedly" means.  

Presumably an Arabic speaker that went to an internet site:  LABEL2.LABEL1 would definitely NOT expect LABEL1.LABEL2/index.html just because we now have added "index.html" to it.  (And LABEL2.LABEL1/index.html is far worse from our investigations).

Your suggestion might make sense for a user that normally only sees LTR text (like me), but for a user that normally sees RTL text, you could argue the opposite:  that unless there's strong left-dominant context (e.g., it IS embedded in a line of Latin text), it should be ordered RTL.

-Shawn

-----Original Message-----
From: Adil Allawi [mailto:adil@diwan.com]
Sent: Monday, April 2, 2012 3:31 PM
To: Shawn Steele
Cc: public-iri@w3.org; "Martin J. Dürst"
Subject: Re: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.

OK. How about saying that a Bidi IRI is only allowed to be ordered RTL if it is either
 - drawn with the protocol (e.g. http://) and in a right-dominant context (i.e. it is not embedded in a line of Latin text)
 - or that the IRI only contains either neutrals or strong right-to-left characters.

This way we can be sure that the IRI would not reorder unexpectedly.
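
As a minimal sketch of the two conditions above, assuming a hypothetical helper `may_order_rtl` and reading "neutrals" loosely as "anything that is not a strong left-to-right character" (so weak types such as digits and separators are tolerated), the test could look something like this in Python:

```python
import unicodedata

STRONG_RTL = {"R", "AL"}   # Hebrew-type and Arabic-type letters
STRONG_LTR = {"L"}

def may_order_rtl(text, has_scheme, rtl_context):
    """Return True if the displayed IRI text may be ordered right to left.

    has_scheme and rtl_context are supplied by the caller; how an
    application decides them is outside this sketch.
    """
    classes = {unicodedata.bidirectional(ch) for ch in text}
    # Assumption: require at least one strong RTL character, so a string
    # made up only of punctuation does not qualify.
    only_rtl_or_neutral = bool(classes & STRONG_RTL) and not (classes & STRONG_LTR)
    return (has_scheme and rtl_context) or only_rtl_or_neutral
```

A bare all-Arabic domain would pass on the second condition alone, while anything with an ASCII scheme or path would depend on the first.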

The cut and paste is an interesting issue. If we force a single direction then it would be OK - but that would not solve your problem.

Adil

On Mon Apr  2 15:19:20 2012, Shawn Steele wrote:
> I don't see that helps very much.  A bare domain name by itself that was entirely Arabic would reasonably be ordered from right to left in an Arabic document, even if it didn't have an http.  Clearly it'd be a helpful indicator that this was an IRI though.
>
> I think that following the document's context is reasonable if you're missing other indicators, but I don't think it's possible to completely avoid confusion, if for no other reason than cut & paste from a compliant app to an older app will likely cause differences in display for the same binary representation.
>
> -Shawn
>
> -----Original Message-----
> From: Adil Allawi [mailto:adil@diwan.com]
> Sent: ,  02,  2012 15:10
> To: Shawn Steele
> Cc: public-iri@w3.org; "Martin J. Dürst"
> Subject: Re: [iri] #121: BIDI: Some users are requiring right-to-left label ordering.
>
> With regard to Shawn's comments: would it be acceptable to say that a Bidi IRI is only allowed to be ordered RTL if it is drawn with the protocol (e.g. http://) and in a right-dominant context (i.e. it is not embedded in a line of Latin text)?
>
> In this way we can allow the RTL alignment with the caveat that the user needs to be educated on the directional issues; but we would not have the confusion of the order that the elements are appearing as the "http://" will act as a visible direction guide.
>
> Adil
>
> On Mon Apr  2 10:55:22 2012, Shawn Steele wrote:
>>>> 2) You end up with different displays between places that "know"
>>>> there's an IRI (e.g. browser address bar) and places that don't
>>
>> That's unavoidable.  People will follow this RFC or they won't.  The 
>> Unicode Bidi Algorithm doesn't include this guidance, so plain text 
>> will also fail, though some apps may try to be "smarter".  For years 
>> people will have different browser versions with different behaviors, 
>> etc.  The UBA is also inconsistently applied, and at inconsistent 
>> revisions, so I think it's a bit presumptuous of us to think that 
>> anything we specify here could cause consistent rendering by our 
>> guidance :)
>>
>> IMO: There's a more general "list" problem with the UBA, and having the UBA address that might be interesting.
>>
>>> Actually, the current solution was proposed by Mati Alluche, and he 
>>> argued that it would be possible for people to understand the 
>>> ordering because of the heuristics they use when reading mixed text:
>>
>> That doesn't match our investigation.  That presumes that people read it as trained by the UBA, however when encountering list-like structures, people don't typically apply the UBA.  Unfortunately, regardless of the approach, some training of the user community is likely required.
>>
>>>> My requirements are:
>>>> 1) The logical order of the parts MUST be preserved.
>>
>>> That sounds like a very logical requirement :-). As always in the 
>>> IETF, any arguments/data to support that would be very much 
>>> appreciated (your list equivalent is certainly counting towards that).
>>
>> I don't have a formal white paper user study.  This comes from discussions with native bidi speakers, technical, non-technical, and in-between.  Also from feedback from the community.  This is how we realized that IRI's are best treated like the "list" analogy.
>>
>> Fortunately 90% of the most common cases are probably a loose domain, like the side of a bus, and those are probably all same-script IRIs.
>>
>>>> 2) There MUST be a way for mostly Arabic, etc. IRIs to be rendered right to left.
>>>>  * So the corollary of 1 & 2 is that the protocol has to go on the right
>>
>>> By protocol, do you mean the scheme name (such as ftp:, mailto:, 
>>> http:, https:,...)?
>>
>>
>>>> 3) I'd really like a MAY that allows some flexibility for 2; when it's LTR and when it's RTL.
>>
>>> You mean some flexibility depending on context? We could also make 
>>> that "MUST respect context". But then there's the problem that the 
>>> context of a side of a bus is rather vague :-).
>>
>> Not if it's a bus in Cairo, or a bus in Washington DC.  Though either is probably going to be a single script.
>>
>>>> At a minimum, I'd suggest that any RTL characters in the domain or email local parts should force 2).
>>
>>> In my personal view, I think that might be overkill. I'm not sure 
>>> I'd want everything turned around just because of a few RTL characters.
>>> But if that's what everybody agrees on, I won't stand in the way.
>>
>> IMO this is mostly a user preference.  "I" would probably prefer the LTR ordering, even for an entirely Arabic IRI, because then I'd be able to understand the parts.  Eg: If the ordering were consistent, I could chomp off a subdomain to get to a parent domain, or remove the path part to get to the home page.  If that changes in the middle, I'd be unsuccessful.
>>
>>> The really tough problem for anything that reorders by component 
>>> (what you call 'logical order of parts') is that it may be easy to 
>>> write a standard that says so, but it's difficult to implement. Any 
>>> thoughts about that?
>>
>> Yes :)  I'd be much happier coming up with a behavior that's understandable by 90% of the humans and have problems implementing it, than causing ambiguity for 50% of the population just because it was easy to implement.
>>
>> We also came up with a couple practical observations:
>> Many paths are "long".  They are also likely mostly ASCII for the foreseeable future.  If I render a path with http:// on the left, and an Arabic domain name, then a path on the right, an RTL user with an RTL address bar will have a hard time discovering the domain, which is the most important part of the IRI, because it won't be near the right side of the textbox.
>>
>> Worse, if the path/query gets long enough, then you have 2 really bad options:  Either allow the host name to be cropped from the left of the address bar, or clip the path on the RIGHT side, like an LTR textbox, impacting the usability of the RTL app.
>>
>>
>

Received on Tuesday, 3 April 2012 16:29:52 UTC