Re: Special ordering for BIDI URLs

From: Matitiahu Allouche <matial@il.ibm.com>
Date: Sun, 30 May 2010 19:27:46 +0300
To: Adil Allawi <adil@diwan.com>
Cc: "aharon@google.com" <aharon@google.com>, "bidi@unicode.org" <bidi@unicode.org>, bidi-bounce@unicode.org, Mark Davis ☕ <mark@macchiato.com>, Murray Sargent <murrays@exchange.microsoft.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>, "public-iri@w3.org" <public-iri@w3.org>, Shawn Steele <Shawn.Steele@microsoft.com>
Message-ID: <OFBDD61005.A0EAED0A-ONC2257731.00526C61-C2257733.005A6F0C@il.ibm.com>

It seems clear that there is no ideal solution for this issue. If there 
were one, I think that somebody would have come forward with it already. 
So, any solution must be a compromise which favors the considerations that 
the author sees as most important and sets aside those considered 
secondary.

For what it's worth, I will write below my own preferences.  They are 
based on the following premises.

a) Pure RTL URLs are not practical currently, because of the scheme (http 
etc.) and the extension (html, asp, php etc.).  Localizing them on the 
client side would be a vast effort with hard issues of coordination, 
education and likely also politics.

b) Adding duplicates of URL delimiters with special Bidi properties 
(Adil's proposal) raises its own problems which Mark Davis has enumerated 
in his note dated May 28th.
Note also that it assumes using Unicode, while many Hebrew and Arabic 
pages use windows-1255 and windows-1256 charsets.  This is also a 
constraint in my proposed solution below.

c) My main consideration is that a person reading a URL from a bus side or 
a napkin must be able to unequivocally understand the intended order of 
the different parts of the URL.
Consequently, the parts must be laid out in a uniform direction, although 
each part will be displayed according to the Unicode Bidi Algorithm (UBA). 
For congruity with non-Bidi URLs, the uniform direction will be LTR.

Given the above, the technical proposal is as follows:

1) For presentation, a Bidi URL must be preceded by LRE and followed by 
PDF, unless 
  1.1 it starts with an LTR character AND contains no RTL character AND 
ends with an LTR character or a digit
  1.2 the context (e.g. paragraph direction) is LTR.

2) For presentation, a part of a URL will be preceded by LRM if 
  2.1 there is a preceding part which contains RTL characters
  2.2 the current part contains RTL characters OR has digits before any 
strong LTR character.

3) All such formatting characters (LRE, PDF and LRMs) will be stripped 
before sending to the server side.

4) From the registration point of view, only the stripped version of the 
URL needs to be registered.  Versions including formatting characters are 
not allowed for registration.

5) Bidi-URL-aware user agents should facilitate user entry of URLs by 
adding the proper formatting characters while typing, or at least when the 
user confirms the data (by pressing Enter or a similar action).

6) All user agents must remove formatting characters from URLs before 
sending on the wire.
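
To make points 1, 2, 3 and 6 concrete, here is a minimal Python sketch of 
one possible reading of them.  It is an illustration, not a reference 
implementation.  Several things in it are assumptions and not part of the 
proposal: the function names, the set of delimiters that separates 
"parts", treating 1.1 and 1.2 as alternatives, treating 2.1 and 2.2 as 
jointly required, reading "a preceding part" as any earlier part, and 
counting both European and Arabic-Indic digits as "digits".

```python
import re
import unicodedata

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
LRM = "\u200E"  # LEFT-TO-RIGHT MARK

# Assumption: a "part" is any run of text between these delimiters.
DELIMITERS = re.compile(r"([/:.?#&=@]+)")


def has_rtl(text):
    """True if the text contains a strong RTL character (class R or AL)."""
    return any(unicodedata.bidirectional(c) in ("R", "AL") for c in text)


def digits_before_strong_ltr(text):
    """True if a digit occurs before the first strong LTR character."""
    for c in text:
        bidi_class = unicodedata.bidirectional(c)
        if bidi_class == "L":
            return False
        if bidi_class in ("EN", "AN"):
            return True
    return False


def is_plain_ltr(url):
    """Condition 1.1: LTR start, no RTL character, LTR or digit at the end."""
    if not url or has_rtl(url):
        return False
    first = unicodedata.bidirectional(url[0])
    last = unicodedata.bidirectional(url[-1])
    return first == "L" and last in ("L", "EN", "AN")


def strip_marks(url):
    """Points 3 and 6: remove LRE, PDF and LRM before sending on the wire."""
    return url.translate({ord(LRE): None, ord(PDF): None, ord(LRM): None})


def for_display(url, context_is_ltr):
    """Points 1 and 2: add formatting characters for presentation only."""
    url = strip_marks(url)
    # With a capturing group, re.split alternates part, delimiter, part, ...
    pieces = DELIMITERS.split(url)
    out = []
    seen_rtl = False
    for index, piece in enumerate(pieces):
        if index % 2 == 0 and piece:  # even indexes are parts
            needs_mark = has_rtl(piece) or digits_before_strong_ltr(piece)
            if seen_rtl and needs_mark:
                out.append(LRM)
            seen_rtl = seen_rtl or has_rtl(piece)
        out.append(piece)
    body = "".join(out)
    if context_is_ltr or is_plain_ltr(url):
        return body
    return LRE + body + PDF
```

Under these assumptions, strip_marks(for_display(url, ...)) gives back the 
original URL, so only the stripped form ever reaches the server or the 
registry, as points 3, 4 and 6 require.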

And yes, I am conscious that the transition period will be, 
euphemistically speaking, challenging.  But this is true for any proposed 
change, and it is better to suffer while getting to a good place than 
while staying in a bad one.

Shalom (Regards),  Mati
           Bidi Architect
           Globalization Center Of Competency - Bidirectional Scripts
           IBM Israel
           Phone: +972 2 5888802    Fax: +972 2 5870333    Mobile: +972 52 
Received on Sunday, 30 May 2010 16:28:27 UTC