- From: Matitiahu Allouche <matial@il.ibm.com>
- Date: Fri, 28 May 2010 16:44:07 +0300
- To: Mark Davis ☕ <mark@macchiato.com>
- Cc: Adil Allawi <adil@diwan.com>, "aharon@google.com" <aharon@google.com>, "bidi@unicode.org" <bidi@unicode.org>, bidi-bounce@unicode.org, Murray Sargent <murrays@exchange.microsoft.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>, "public-iri@w3.org" <public-iri@w3.org>, Shawn Steele <Shawn.Steele@microsoft.com>
- Message-ID: <OF2B5B2205.FD0D103E-ONC2257731.004993F1-C2257731.004B731A@il.ibm.com>
Mark Davis wrote:
<quote>
3. New Characters (Adil's proposal). While an interesting proposal, the problems would be:
   - introducing security risks with the new characters.
   - a significant change to the UBA - and even extremely minor changes have caused enough problems that the UTC has grown quite leery of rocking the boat.
</quote>

This would be an addition to the UBA rather than a change to existing behavior. It would entail defining a new Bidi class for characters that behave like L if the current explicit level is even and like R if the current explicit level is odd. Let us call it EL, for Explicit Level separator (a better name would be called for if the idea gets any traction). By inserting EL characters between the contents of successive cells, one would make the full line ready for display in either an LTR or an RTL paragraph direction.

Independently of URLs, I think that such a Bidi class would be useful for separating a line into cells, for example to display text in columns. Currently we have the S class (segment separator), which mostly includes the Tab character, but IMHO it is not quite satisfactory, since text in adjacent segments still interacts.

If we add this new class, the UBA would not have to change at all for existing classes and their characters; only the new behavior, which is not terribly complex, would have to be added for the new characters defined for this class.

By the way, this does not mean that I support Adil's proposal.
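The proposed EL behavior boils down to one resolution rule. What follows is a minimal Python sketch, assuming the explicit levels have already been computed by the existing explicit rules of the UBA. `EL` is the hypothetical class name from the message and is not part of the UBA, and `resolve_el` / `resolve_line` are illustrative names only. The sketch models just how EL would resolve to a strong type, not the rest of the algorithm.

```python
from typing import List

# "EL" is the HYPOTHETICAL "Explicit Level separator" class proposed in this
# message; it is NOT a Bidi class defined by the Unicode Bidirectional Algorithm.
HYPOTHETICAL_EL = "EL"


def resolve_el(bidi_class: str, explicit_level: int) -> str:
    """Map EL to L on an even explicit level and to R on an odd one."""
    if bidi_class != HYPOTHETICAL_EL:
        return bidi_class  # existing classes are left untouched
    return "L" if explicit_level % 2 == 0 else "R"


def resolve_line(classes: List[str], explicit_levels: List[int]) -> List[str]:
    """Apply resolve_el to every character of a line."""
    if len(classes) != len(explicit_levels):
        raise ValueError("classes and explicit_levels must have the same length")
    return [resolve_el(c, lvl) for c, lvl in zip(classes, explicit_levels)]


# Two cells, "RR" and "LL", separated by one EL character.
cells = ["R", "R", HYPOTHETICAL_EL, "L", "L"]
print(resolve_line(cells, [0] * len(cells)))  # LTR paragraph: ['R', 'R', 'L', 'L', 'L']
print(resolve_line(cells, [1] * len(cells)))  # RTL paragraph: ['R', 'R', 'R', 'L', 'L']
```

Whatever the strong types inside the cells, the separator takes the direction of the level it sits on: L in an LTR paragraph and R in an RTL one. That is the property the proposal relies on to make the same line usable in either paragraph direction.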
Shalom (Regards),
Mati
Bidi Architect
Globalization Center Of Competency - Bidirectional Scripts
IBM Israel
Phone: +972 2 5888802  Fax: +972 2 5870333  Mobile: +972 52 2554160

From: Mark Davis ☕ <mark@macchiato.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>, Adil Allawi <adil@diwan.com>, "public-iri@w3.org" <public-iri@w3.org>, "bidi@unicode.org" <bidi@unicode.org>, Murray Sargent <murrays@exchange.microsoft.com>, "aharon@google.com" <aharon@google.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>
Date: 28/05/2010 01:25
Subject: [bidi] Re: Special ordering for BIDI URLs
Sent by: bidi-bounce@unicode.org

A few comments on various issues.

1. Market Forces. Make it possible for URLs (actually IRIs) to be completely RTL.
   A. Shawn raised the issue of .html. As I think about it, there are a couple of ways to deal with this. First, even currently servers don't need to use those suffixes: http://unicode.org/reports/ doesn't contain a .html. Secondly, we could establish equivalences for some Hebrew and Arabic-script suffixes to take the place of those.

2. Specialized BIDI. Force a consistent order on URLs, using a higher-level protocol on top of the UBA.
   A. The proponents of specialized reordering really need to come up with a good story for how to deal with the security and interoperability issues presented by plaintext applications and non-new-URL-ordering applications.
   B. There are actually two variants of this:
      a. have the consistent order be LTR.
      b. have the consistent order be the paragraph direction.
      (a) is a simpler approach technically, since the generated plaintext can have a single direction associated with the label separators. It can be implemented in display and cut/paste by having LRMs around each label that contains an RTL character or no LTR characters. While for users this may not be quite as natural, the most important feature is having a predictable ordering (the ordering of labels in URLs is already somewhat screwy, since the domain name is Little-Endian, and the rest is Big-Endian).

3. New Characters (Adil's proposal). While an interesting proposal, the problems would be:
   - introducing security risks with the new characters.
   - a significant change to the UBA - and even extremely minor changes have caused enough problems that the UTC has grown quite leery of rocking the boat.
   - it takes at least a couple of years to get characters accepted by both Unicode and ISO.
   - none of the old URL-aware software would handle the new URLs (a problem also for the LRM approach).

Mark
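Variant (a) of item 2, wrapping labels in LRMs, can be sketched in a few lines of Python. It is a minimal sketch under stated assumptions. Labels are taken to be the runs between '.' and '/' characters, which is a simplification since real IRI parsing is more involved. An "RTL character" is taken to mean Bidi class R or AL. The names `needs_lrm`, `wrap_labels`, and `SEPARATORS` are hypothetical. Each non-empty label that contains an RTL character, or no L character, is surrounded by U+200E LEFT-TO-RIGHT MARK.

```python
import re
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

# Hypothetical simplification: treat '.' and '/' as the only label separators.
SEPARATORS = "./"


def needs_lrm(label: str) -> bool:
    """True if the label has an RTL character (R or AL) or no L character."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    has_rtl = bool(classes & {"R", "AL"})
    has_ltr = "L" in classes
    return has_rtl or not has_ltr


def wrap_labels(iri: str) -> str:
    """Surround each qualifying non-empty label with LRMs."""
    # The capturing group makes re.split keep the separators in the result.
    parts = re.split("([" + re.escape(SEPARATORS) + "])", iri)
    out = []
    for part in parts:
        if part and part not in SEPARATORS and needs_lrm(part):
            out.append(LRM + part + LRM)
        else:
            out.append(part)
    return "".join(out)


# Show the inserted marks explicitly.
print(wrap_labels("http://example.com/שלום").replace(LRM, "<LRM>"))
# prints http://example.com/<LRM>שלום<LRM>
```

In a left-to-right paragraph, the marks matter most between two adjacent RTL labels. In `example.com/אבג/דהו`, the middle `/` (class CS) would otherwise sit between two R characters and, under rule N1, resolve to R, so the two labels would display in right-to-left order. With the marks it sits between two LRMs (class L) and resolves to L, keeping the labels in logical order.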
Received on Friday, 28 May 2010 13:44:43 UTC