- From: Matitiahu Allouche <matial@il.ibm.com>
- Date: Sun, 30 May 2010 19:27:46 +0300
- To: Adil Allawi <adil@diwan.com>
- Cc: "aharon@google.com" <aharon@google.com>, "bidi@unicode.org" <bidi@unicode.org>, bidi-bounce@unicode.org, Mark Davis ☕ <mark@macchiato.com>, Murray Sargent <murrays@exchange.microsoft.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>, "public-iri@w3.org" <public-iri@w3.org>, Shawn Steele <Shawn.Steele@microsoft.com>
- Message-ID: <OFBDD61005.A0EAED0A-ONC2257731.00526C61-C2257733.005A6F0C@il.ibm.com>
It seems clear that there is no ideal solution for this issue. If there were one, I think that somebody would have come forward with it already. So, any solution must be a compromise which favors the considerations that the author sees as most important and somehow shoves aside those considered secondary.

For what it's worth, I will write below my own preferences. They are based on the following premises.

a) Pure RTL URLs are not practical currently, because of the scheme (http etc.) and the extension (html, asp, php etc.). Localizing them on the client side would be a vast effort with hard issues of coordination, education and likely also politics.

b) Adding duplicates of URL delimiters with special Bidi properties (Adil's proposal) raises its own problems, which Mark Davis has enumerated in his note dated May 28th. Note also that it assumes using Unicode, while many Hebrew and Arabic pages use windows-1255 and windows-1256 charsets. This is also a constraint in my proposed solution below.

c) My main consideration is that a person reading a URL from a bus side or a napkin must be able to unequivocally understand the intended order of the different parts of the URL. Consequently, the parts must be laid out in a uniform direction, although each part will be displayed according to the Unicode Bidi Algorithm (UBA). For congruity with non-Bidi URLs, the uniform direction will be LTR.

Given the above, the technical proposal is as follows:

1) For presentation, a Bidi URL must be preceded by LRE and followed by PDF, unless
   1.1 it starts with an LTR character AND contains no RTL character AND ends with an LTR character or a digit
   OR
   1.2 the context (e.g. paragraph direction) is LTR.
2) For presentation, a part of a URL will be preceded by LRM if
   2.1 there is a preceding part which contains RTL characters
   AND
   2.2 the current part contains RTL characters OR has digits before any strong LTR character.
3) All such formatting characters (LRE, PDF and LRMs) will be stripped before sending to the server side.
4) From the registration point of view, only the stripped version of the URL needs to be registered. Versions including formatting characters are not allowed for registration.
5) Bidi-URL-aware user agents should facilitate user entry of URLs by adding the proper formatting characters while typing, or at least when the user confirms the data (by pressing Enter or a similar action).
6) All user agents must remove formatting characters from URLs before sending on the wire.

And yes, I am conscious that the transition period will be, euphemistically speaking, challenging. But this is true for any proposed change, and it is better to suffer while getting to a good place than while staying in a bad one.

Shalom (Regards),
Mati
Bidi Architect
Globalization Center Of Competency - Bidirectional Scripts
IBM Israel
Phone: +972 2 5888802
Fax: +972 2 5870333
Mobile: +972 52 2554160
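
As an illustration only, rules 1, 2, 3 and 6 of the proposal above might be sketched in Python as follows. The message does not define what a "part" of a URL is, so the delimiter set below is an assumption, as is reading "a preceding part" as "any earlier part". The function names are hypothetical, and character classes are taken from Python's `unicodedata.bidirectional`.

```python
import re
import unicodedata

LRE, PDF, LRM = "\u202a", "\u202c", "\u200e"

# Assumption: "parts" are the runs between these common URL delimiters.
DELIMS = re.compile(r"([:/?#&=.@]+)")


def is_rtl(ch):
    return unicodedata.bidirectional(ch) in ("R", "AL")


def is_ltr(ch):
    return unicodedata.bidirectional(ch) == "L"


def is_digit(ch):
    # European and Arabic-Indic digits, by Bidi class.
    return unicodedata.bidirectional(ch) in ("EN", "AN")


def has_rtl(text):
    return any(is_rtl(ch) for ch in text)


def digits_before_strong_ltr(part):
    # True if a digit occurs before the first strong LTR character.
    for ch in part:
        if is_ltr(ch):
            return False
        if is_digit(ch):
            return True
    return False


def needs_embedding(url, context_is_ltr):
    # Rule 1: wrap in LRE ... PDF unless exception 1.1 or 1.2 applies.
    if context_is_ltr:  # 1.2
        return False
    if (url and is_ltr(url[0]) and not has_rtl(url)
            and (is_ltr(url[-1]) or is_digit(url[-1]))):  # 1.1
        return False
    return True


def decorate(url, context_is_ltr=False):
    # re.split with a capturing group alternates part, delimiter, part, ...
    pieces = DELIMS.split(url)
    out = []
    seen_rtl_part = False
    for i, piece in enumerate(pieces):
        if i % 2 == 0 and piece:  # even indices are parts
            # Rule 2: LRM before a part if an earlier part had RTL characters
            # (2.1) and this part has RTL characters or leading digits (2.2).
            if seen_rtl_part and (has_rtl(piece)
                                  or digits_before_strong_ltr(piece)):
                out.append(LRM)
            seen_rtl_part = seen_rtl_part or has_rtl(piece)
        out.append(piece)
    body = "".join(out)
    if needs_embedding(url, context_is_ltr):
        return LRE + body + PDF
    return body


def strip_formatting(url):
    # Rules 3 and 6: remove LRE, PDF and LRM before the URL goes on the wire.
    return url.translate({ord(c): None for c in (LRE, PDF, LRM)})
```

Under these assumptions, `strip_formatting(decorate(url))` returns the original `url`, which is the property rules 3, 4 and 6 rely on: only the stripped form is ever registered or sent to the server.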
Received on Sunday, 30 May 2010 16:28:27 UTC