- From: Matitiahu Allouche <matial@il.ibm.com>
- Date: Sun, 30 May 2010 19:27:46 +0300
- To: Adil Allawi <adil@diwan.com>
- Cc: "aharon@google.com" <aharon@google.com>, "bidi@unicode.org" <bidi@unicode.org>, bidi-bounce@unicode.org, Mark Davis ☕ <mark@macchiato.com>, Murray Sargent <murrays@exchange.microsoft.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>, "public-iri@w3.org" <public-iri@w3.org>, Shawn Steele <Shawn.Steele@microsoft.com>
- Message-ID: <OFBDD61005.A0EAED0A-ONC2257731.00526C61-C2257733.005A6F0C@il.ibm.com>
It seems clear that there is no ideal solution for this issue. If there
were one, I think that somebody would have come forward with it already.
So, any solution must be a compromise which favors the considerations that
the author sees as most important and somehow shoves aside those
considered secondary.
For what it's worth, I will write below my own preferences. They are
based on the following premises.
a) Pure RTL URLs are not practical currently, because of the scheme (http,
etc.) and the extension (html, asp, php, etc.). Localizing them on the
client side would be a vast effort with hard issues of coordination,
education and likely also politics.
b) Adding duplicates of URL delimiters with special Bidi properties
(Adil's proposal) raises its own problems which Mark Davis has enumerated
in his note dated May 28th.
Note also that it assumes the use of Unicode, while many Hebrew and Arabic
pages use the windows-1255 and windows-1256 charsets. This constraint also
applies to my proposed solution below.
c) My main consideration is that a person reading a URL from a bus side or
a napkin must be able to unequivocally understand the intended order of
the different parts of the URL.
Consequently, the parts must be laid out in a uniform direction, although
each part will be displayed according to the Unicode Bidi Algorithm (UBA).
For congruity with non-Bidi URLs, the uniform direction will be LTR.
Given the above, the technical proposal is as follows:
1) For presentation, a Bidi URL must be preceded by LRE and followed by
PDF, unless
   1.1 it starts with an LTR character AND contains no RTL character AND
       ends with an LTR character or a digit
   OR
   1.2 the context (e.g. paragraph direction) is LTR.
2) For presentation, a part of a URL will be preceded by LRM if
   2.1 there is a preceding part which contains RTL characters
   AND
   2.2 the current part contains RTL characters OR has digits before any
       strong LTR character.
3) All such formatting characters (LRE, PDF and LRMs) will be stripped
before sending to the server side.
4) From the registration point of view, only the stripped version of the
URL needs to be registered. Versions including formatting characters are
not allowed for registration.
5) Bidi-URL-aware user agents should facilitate user entry of URLs by
adding the proper formatting characters while typing, or at least when the
user confirms the data (by pressing Enter or a similar action).
6) All user agents must remove formatting characters from URLs before
sending on the wire.
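To make rules 1) to 3) concrete, here is a rough Python sketch, not a
reference implementation. Several things in it are assumptions that the
proposal itself does not settle: a "part" is taken to be any run of
characters between the delimiters listed in DELIMS, "RTL character" is
taken to mean Bidi class R or AL, "digit" is taken to mean class EN or AN,
and rule 2.2 is read as "a digit is met before the first strong LTR
character of the part". The names format_for_display, strip_formatting and
DELIMS are made up for the illustration.

```python
import re
import unicodedata

LRE, PDF, LRM = "\u202A", "\u202C", "\u200E"
RTL_CLASSES = {"R", "AL"}      # assumption: strong RTL means class R or AL
DIGIT_CLASSES = {"EN", "AN"}   # assumption: European and Arabic-Indic digits

# Assumption: the proposal does not define "part"; here parts are the runs
# of characters between these delimiters.
DELIMS = re.compile(r"([/:.?#&=@]+)")


def has_rtl(text):
    """True if the text contains at least one strong RTL character."""
    return any(unicodedata.bidirectional(c) in RTL_CLASSES for c in text)


def digit_before_strong_ltr(part):
    """True if a digit occurs before any strong LTR character (rule 2.2)."""
    for c in part:
        cls = unicodedata.bidirectional(c)
        if cls == "L":
            return False
        if cls in DIGIT_CLASSES:
            return True
    return False


def needs_embedding(url, context_is_ltr):
    """Rule 1: LRE/PDF are required unless exception 1.1 or 1.2 applies."""
    if context_is_ltr or not url:                      # exception 1.2
        return False
    first = unicodedata.bidirectional(url[0])
    last = unicodedata.bidirectional(url[-1])
    plain_ltr = (first == "L" and not has_rtl(url)
                 and (last == "L" or last in DIGIT_CLASSES))
    return not plain_ltr                               # exception 1.1


def format_for_display(url, context_is_ltr=False):
    """Add LRMs (rule 2) and, if needed, LRE/PDF (rule 1) for presentation."""
    out = []
    seen_rtl_part = False
    # With a capturing group, re.split alternates part, delimiter, part, ...
    for i, piece in enumerate(DELIMS.split(url)):
        if i % 2 == 0 and piece:                       # a part, not a delimiter
            if seen_rtl_part and (has_rtl(piece) or digit_before_strong_ltr(piece)):
                out.append(LRM)
            if has_rtl(piece):
                seen_rtl_part = True
        out.append(piece)
    body = "".join(out)
    return LRE + body + PDF if needs_embedding(url, context_is_ltr) else body


def strip_formatting(url):
    """Rules 3 and 6: drop LRE, PDF and LRM before the URL goes on the wire."""
    return url.translate({ord(c): None for c in (LRE, PDF, LRM)})
```

Under these assumptions, strip_formatting(format_for_display(url)) gives
back the original URL for any URL that contains none of the three
formatting characters, which is what rule 4) relies on.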
And yes, I am conscious that the transition period will be,
euphemistically speaking, challenging. But this is true for any proposed
change, and it is better to suffer while getting to a good place than
while staying in a bad one.
Shalom (Regards), Mati
Bidi Architect
Globalization Center Of Competency - Bidirectional Scripts
IBM Israel
Phone: +972 2 5888802 Fax: +972 2 5870333 Mobile: +972 52 2554160
Received on Sunday, 30 May 2010 16:28:27 UTC