Re: [bidi] Re: Special ordering for BIDI URLs from Matitiahu Allouche on 2010-05-28 (public-iri@w3.org from May 2010)

From: Matitiahu Allouche <matial@il.ibm.com>
Date: Fri, 28 May 2010 16:44:07 +0300
To: Mark Davis ☕ <mark@macchiato.com>
Cc: Adil Allawi <adil@diwan.com>, "aharon@google.com" <aharon@google.com>, "bidi@unicode.org" <bidi@unicode.org>, bidi-bounce@unicode.org, Murray Sargent <murrays@exchange.microsoft.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>, "public-iri@w3.org" <public-iri@w3.org>, Shawn Steele <Shawn.Steele@microsoft.com>
Message-ID: <OF2B5B2205.FD0D103E-ONC2257731.004993F1-C2257731.004B731A@il.ibm.com>

Mark Davis wrote:
<quote>
3. New Characters (Adil's proposal).

While an interesting proposal, the problems would be:
* introducing security risks with the new characters.
* a significant change to the UBA - and even extremely minor changes have
  caused enough problems that the UTC has grown quite leery of rocking the
  boat.
</quote>

This is an addition to the UBA rather than a change to existing behavior. 
It would entail defining a new Bidi class for characters that would 
behave like L if the current explicit level is even and like R if the 
current explicit level is odd.  Let us call it EL == Explicit Level 
separator (a better name is called for, if the idea gets any traction).
By inserting EL characters between the contents of successive cells, the 
full line would be ready for display in either an LTR or an RTL paragraph 
direction.
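
As a rough illustration of the rule described above, here is a minimal Python sketch. It is only a sketch under stated assumptions: no EL class exists in the UBA, and the function name resolve_el is hypothetical. It simply maps an explicit embedding level to the strong class such a character would take.

```python
def resolve_el(embedding_level: int) -> str:
    """Return the strong Bidi class a hypothetical EL character would take.

    Assumption: EL is the class proposed in this message, not an existing
    UBA class. It resolves to L at even levels and to R at odd levels.
    """
    if embedding_level < 0:
        raise ValueError("embedding level must be non-negative")
    return "L" if embedding_level % 2 == 0 else "R"


if __name__ == "__main__":
    # Level 0 is an LTR paragraph, level 1 an RTL paragraph.
    for level in (0, 1, 2):
        print(level, resolve_el(level))
```

Because the separator follows the current level, the same stored text would lay its cells out left to right in an LTR paragraph and right to left in an RTL one.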

Independently of URLs, I think that such a Bidi class would be useful for 
separating a line into cells, like for displaying text in columns. 
Currently, we have the S class (segment separator), which includes mostly 
the Tab character, but it is not quite satisfactory IMHO since there is 
interaction between text in adjacent segments.

If we add this new class, the UBA would not have to change at all for 
existing classes and their characters; only the new behavior, which is not 
terribly complex, would have to be added for new characters defined for 
this class.


By the way, this does not mean that I support Adil's proposal.

Shalom (Regards),  Mati
           Bidi Architect
           Globalization Center Of Competency - Bidirectional Scripts
           IBM Israel
           Phone: +972 2 5888802    Fax: +972 2 5870333    Mobile: +972 52 2554160




From: Mark Davis ☕ <mark@macchiato.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>, Adil Allawi <adil@diwan.com>, "public-iri@w3.org" <public-iri@w3.org>, "bidi@unicode.org" <bidi@unicode.org>, Murray Sargent <murrays@exchange.microsoft.com>, "aharon@google.com" <aharon@google.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>
Date: 28/05/2010 01:25
Subject: [bidi] Re: Special ordering for BIDI URLs
Sent by: bidi-bounce@unicode.org



A few comments on various issues.

1. Market Forces. Make it possible for URLs (actually IRIs) to be 
completely RTL

A. Shawn raised the issue of .html. As I think about it, there are a 
couple of ways to deal with this. First, even currently servers don't need 
to use those suffixes: http://unicode.org/reports/ doesn't contain a 
.html. Secondly, we could establish equivalences for some Hebrew and 
Arabic-script suffixes to take the place of those.

2. Specialized BIDI. Force a consistent order on URLs, using a 
higher-level protocol on top of the UBA.

A. The proponents of specialized reordering really need to come up with a 
good story for how to deal with the security and interoperability issues 
presented by plaintext applications and non-new-URL-ordering applications.

B. There are actually two variants of this: 
a. have the consistent order be LTR.
b. have the consistent order be the paragraph direction.

(a) is a simpler approach technically, since the generated plaintext can 
have a single direction associated with the label separators. It can be 
implemented in display and cut/paste by having LRMs around each label that 
contains an RTL character or no LTR characters.
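
The LRM wrapping in (a) could be sketched roughly as follows in Python. This is an illustration under assumptions, not a specification: the separator set, the helper names needs_lrm and wrap_labels, and the choice to treat only Bidi classes R and AL as "RTL" and only L as "LTR" are all mine.

```python
import re
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK

# Assumed separator set for splitting a URL into labels; a real
# implementation would follow the URL/IRI syntax more carefully.
SEPARATORS = re.compile(r"([./:?#&=]+)")


def needs_lrm(label: str) -> bool:
    """True if the label contains an RTL character or no LTR characters."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    has_rtl = bool(classes & {"R", "AL"})
    has_ltr = "L" in classes
    return has_rtl or not has_ltr


def wrap_labels(url: str) -> str:
    """Put an LRM on each side of every label that needs one."""
    parts = SEPARATORS.split(url)
    out = []
    for index, part in enumerate(parts):
        is_label = index % 2 == 0  # separators land on odd indices
        if is_label and part and needs_lrm(part):
            out.append(LRM + part + LRM)
        else:
            out.append(part)
    return "".join(out)


if __name__ == "__main__":
    # Show the inserted marks as escapes, since LRM is invisible.
    print(wrap_labels("http://\u05d0\u05d1.com").encode("unicode_escape"))
```

A purely Latin URL passes through unchanged, while a Hebrew or all-digit label gets an LRM on each side, so the separators keep a single LTR direction.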

While for users this may not be quite as natural, the most important 
feature is having a predictable ordering (the ordering of labels in URLs 
is already somewhat screwy, since the domain name is Little-Endian, and 
the rest is Big-Endian).

3. New Characters (Adil's proposal).

While an interesting proposal, the problems would be:
* introducing security risks with the new characters.
* a significant change to the UBA - and even extremely minor changes have
  caused enough problems that the UTC has grown quite leery of rocking the
  boat.
* it takes at least a couple of years to get characters accepted by both
  Unicode and ISO.
* none of the old URL-aware software would handle the new URLs (a problem
  also for the LRM approach).

Mark

Received on Friday, 28 May 2010 13:44:43 UTC