
Re: [bidi] Re: Special ordering for BIDI URLs

From: Matitiahu Allouche <matial@il.ibm.com>
Date: Fri, 28 May 2010 16:23:25 +0300
To: Mark Davis ☕ <mark@macchiato.com>
Cc: Adil Allawi <adil@diwan.com>, "aharon@google.com" <aharon@google.com>, "bidi@unicode.org" <bidi@unicode.org>, bidi-bounce@unicode.org, Murray Sargent <murrays@exchange.microsoft.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>, "public-iri@w3.org" <public-iri@w3.org>, Shawn Steele <Shawn.Steele@microsoft.com>
Message-ID: <OF79956580.571E2860-ONC2257731.00470364-C2257731.00498E39@il.ibm.com>

Mark Davis wrote:
<quote>
2. Specialized BIDI. Force a consistent order on URLs, using a 
higher-level protocol on top of the UBA.

. . .

B. There are actually two variants of this: 
a. have the consistent order be LTR.
b. have the consistent order be the paragraph direction.

(a) is a simpler approach technically, since the generated plaintext can 
have a single direction associated with the label separators. It can be 
implemented in display and cut/paste by having LRMs around each label that 
contains an RTL character or no LTR characters.
</quote>

I think that adding LRMs around RTL labels would not be enough if the 
context is RTL.  Assume the following URL (uppercase letters stand for 
RTL characters):
   http://12-34.ABC.DEF.567

and let us represent LRM by @.
Mark's variant (a) results in adding LRMs as follows:
   http://@12-34@.@ABC@.@DEF@.@567@
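
As an illustration of variant (a) as quoted above, here is a minimal Python 
sketch that wraps each label containing an RTL character, or containing no 
LTR characters, in LRMs. The names wrap_labels and needs_lrm, the "mark" 
parameter, and the "pseudo" switch (uppercase ASCII stands for RTL, as in the 
example URL) are assumptions made for this sketch, not part of any proposal. 
It only handles a scheme followed by dot-separated labels, with no path or 
query.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def _has_rtl(label, pseudo):
    # Pseudo-notation (assumption): uppercase ASCII letters stand for RTL.
    if pseudo:
        return any(c.isascii() and c.isupper() for c in label)
    return any(unicodedata.bidirectional(c) in ("R", "AL") for c in label)


def _has_ltr(label, pseudo):
    # Pseudo-notation (assumption): lowercase ASCII letters stand for LTR.
    if pseudo:
        return any(c.isascii() and c.islower() for c in label)
    return any(unicodedata.bidirectional(c) == "L" for c in label)


def needs_lrm(label, pseudo=False):
    # Variant (a): a label that contains an RTL character or no LTR characters.
    return _has_rtl(label, pseudo) or not _has_ltr(label, pseudo)


def wrap_labels(url, mark=LRM, pseudo=False):
    # Split off the scheme, then treat the rest as dot-separated labels.
    scheme, sep, rest = url.partition("://")
    if not sep:
        scheme, rest = "", url
    wrapped = [
        mark + label + mark if needs_lrm(label, pseudo) else label
        for label in rest.split(".")
    ]
    return scheme + sep + ".".join(wrapped)


# Reproduces the example, with "@" standing in for LRM.
print(wrap_labels("http://12-34.ABC.DEF.567", mark="@", pseudo=True))
# -> http://@12-34@.@ABC@.@DEF@.@567@
```

With pseudo=False the classification uses the real Unicode bidi classes 
(R and AL for RTL, L for LTR) and the real LRM character.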

In an RTL context, this will be displayed as:
   @567@.@FED@.@CBAhttp://@12-34@.@


In order to get the consistent LTR display order, we need to add LRE/PDF 
around the URL as follows (where [ represents LRE and ^ represents PDF):
   [http://@12-34@.@ABC@.@DEF@.@567@^

which will be displayed as follows, independently of the context:
   [http://@12-34@.@CBA@.@FED@.@567@^


We can see that this can be simplified: an LRM is needed only before (not 
around) a label, and only when that label follows an RTL label and either 
is itself RTL or contains no LTR characters, as follows:
   [http://12-34.ABC.@DEF.@567^
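
The simplified rule can be sketched in Python as below. This is not the code 
mentioned in this message, only an illustration under stated assumptions: 
the name mark_url, its parameters, and the "pseudo" switch (uppercase ASCII 
stands for RTL, as in the example) are hypothetical; an "RTL label" is taken 
to mean a label containing at least one RTL character; and only a scheme 
followed by dot-separated labels is handled.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def has_rtl(label, pseudo=False):
    # Pseudo-notation (assumption): uppercase ASCII letters stand for RTL.
    if pseudo:
        return any(c.isascii() and c.isupper() for c in label)
    return any(unicodedata.bidirectional(c) in ("R", "AL") for c in label)


def has_ltr(label, pseudo=False):
    # Pseudo-notation (assumption): lowercase ASCII letters stand for LTR.
    if pseudo:
        return any(c.isascii() and c.islower() for c in label)
    return any(unicodedata.bidirectional(c) == "L" for c in label)


def mark_url(url, lrm=LRM, lre=LRE, pdf=PDF, pseudo=False):
    scheme, sep, rest = url.partition("://")
    if not sep:
        scheme, rest = "", url
    out = []
    prev_rtl = False  # whether the previous label contained an RTL character
    for label in rest.split("."):
        # Candidate: the label is RTL or contains no LTR characters.
        candidate = has_rtl(label, pseudo) or not has_ltr(label, pseudo)
        # The LRM goes before the label, and only after an RTL label.
        out.append((lrm if candidate and prev_rtl else "") + label)
        prev_rtl = has_rtl(label, pseudo)
    # LRE/PDF around the whole URL make the result context-independent.
    return lre + scheme + sep + ".".join(out) + pdf


# Reproduces the example, with "@" for LRM, "[" for LRE and "^" for PDF.
print(mark_url("http://12-34.ABC.DEF.567", lrm="@", lre="[", pdf="^", pseudo=True))
# -> [http://12-34.ABC.@DEF.@567^
```

Tracking only whether the previous label was RTL keeps this a single pass 
over the labels.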


This is not overly complex to do.  I know; I have written code for it.


Shalom (Regards),  Mati
           Bidi Architect
           Globalization Center Of Competency - Bidirectional Scripts
           IBM Israel
           Phone: +972 2 5888802    Fax: +972 2 5870333
           Mobile: +972 52 2554160




From: Mark Davis ☕ <mark@macchiato.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>, Adil Allawi <adil@diwan.com>, "public-iri@w3.org" <public-iri@w3.org>, "bidi@unicode.org" <bidi@unicode.org>, Murray Sargent <murrays@exchange.microsoft.com>, "aharon@google.com" <aharon@google.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>
Date: 28/05/2010 01:25
Subject: [bidi] Re: Special ordering for BIDI URLs
Sent by: bidi-bounce@unicode.org



A few comments on various issues.

1. Market Forces. Make it possible for URLs (actually IRIs) to be 
completely RTL.

A. Shawn raised the issue of .html. As I think about it, there are a 
couple of ways to deal with this. First, even today servers don't need 
to use those suffixes: http://unicode.org/reports/ doesn't contain a 
.html. Second, we could establish equivalences for some Hebrew and 
Arabic-script suffixes to take the place of those.

2. Specialized BIDI. Force a consistent order on URLs, using a 
higher-level protocol on top of the UBA.

A. The proponents of specialized reordering really need to come up with a 
good story for how to deal with the security and interoperability issues 
presented by plaintext applications and non-new-URL-ordering applications.

B. There are actually two variants of this: 
a. have the consistent order be LTR.
b. have the consistent order be the paragraph direction.

(a) is a simpler approach technically, since the generated plaintext can 
have a single direction associated with the label separators. It can be 
implemented in display and cut/paste by having LRMs around each label that 
contains an RTL character or no LTR characters.

While for users this may not be quite as natural, the most important 
feature is having a predictable ordering (the ordering of labels in URLs 
is already somewhat screwy, since the domain name is Little-Endian, and 
the rest is Big-Endian).

3. New Characters (Adil's proposal).

While an interesting proposal, the problems would be: 
a. introducing security risks with the new characters.
b. a significant change to the UBA - and even extremely minor changes have 
caused enough problems that the UTC has grown quite leery of rocking the 
boat.
c. it takes at least a couple of years to get characters accepted by both 
Unicode and ISO.
d. none of the old URL-aware software would handle the new URLs (a problem 
also for the LRM approach).

Mark


Received on Friday, 28 May 2010 13:24:04 GMT
