- From: Matitiahu Allouche <matial@il.ibm.com>
- Date: Fri, 28 May 2010 16:44:07 +0300
- To: Mark Davis ☕ <mark@macchiato.com>
- Cc: Adil Allawi <adil@diwan.com>, "aharon@google.com" <aharon@google.com>, "bidi@unicode.org" <bidi@unicode.org>, bidi-bounce@unicode.org, Murray Sargent <murrays@exchange.microsoft.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>, "public-iri@w3.org" <public-iri@w3.org>, Shawn Steele <Shawn.Steele@microsoft.com>
- Message-ID: <OF2B5B2205.FD0D103E-ONC2257731.004993F1-C2257731.004B731A@il.ibm.com>
Mark Davis wrote:
<quote>
3. New Characters (Adil's proposal).
While an interesting proposal, the problems would be:
- introducing security risks with the new characters.
- a significant change to the UBA - and even extremely minor changes have
  caused enough problems that the UTC has grown quite leery of rocking the
  boat.
</quote>
This is an addition to the UBA rather than a change to existing behavior.
It would entail defining a new Bidi class for characters that would
behave like L if the current explicit level is even and like R if the
current explicit level is odd. Let us call it EL == Explicit Level
separator (a better name is called for, if the idea gets any traction).
By inserting EL characters between the contents of successive cells, the
full line would be ready for display in either an LTR or an RTL paragraph
direction.
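To make the idea concrete, here is a minimal sketch of how such a class
might be resolved. It assumes the explicit levels have already been
computed by the existing UBA rules and are simply handed in as a list;
the function name resolve_el and the class name "EL" are placeholders
for illustration only, not part of any specification.

```python
def resolve_el(classes, levels):
    """Resolve the hypothetical EL class to L or R by level parity.

    classes: list of Bidi class names, one per character.
    levels:  list of explicit embedding levels, one per character
             (assumed to be computed elsewhere).
    """
    resolved = []
    for bidi_class, level in zip(classes, levels):
        if bidi_class == "EL":
            # Even level -> behaves like L, odd level -> behaves like R.
            resolved.append("L" if level % 2 == 0 else "R")
        else:
            # Existing classes are left untouched.
            resolved.append(bidi_class)
    return resolved


# Three cells (R, L, R) separated by EL characters.
classes = ["R", "EL", "L", "EL", "R"]
print(resolve_el(classes, [0] * 5))  # ['R', 'L', 'L', 'L', 'R']
print(resolve_el(classes, [1] * 5))  # ['R', 'R', 'L', 'R', 'R']
```

The same input resolves differently under an LTR and an RTL paragraph
level, and no existing class is affected.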
Independently of URLs, I think that such a Bidi class would be useful for
separating a line into cells, like for displaying text in columns.
Currently, we have the S class (segment separator), which includes mostly
the Tab character, but it is not quite satisfactory IMHO since there is
interaction between text in adjacent segments.
If we add this new class, the UBA would not have to change at all for
existing classes and their characters; only the new behavior, which is not
terribly complex, would have to be added for the new characters defined for
this class.
By the way, this does not mean that I support Adil's proposal.
Shalom (Regards), Mati
Bidi Architect
Globalization Center Of Competency - Bidirectional Scripts
IBM Israel
Phone: +972 2 5888802 Fax: +972 2 5870333 Mobile: +972 52 2554160
From: Mark Davis ☕ <mark@macchiato.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>, Adil Allawi <adil@diwan.com>, "public-iri@w3.org" <public-iri@w3.org>, "bidi@unicode.org" <bidi@unicode.org>, Murray Sargent <murrays@exchange.microsoft.com>, "aharon@google.com" <aharon@google.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>
Date: 28/05/2010 01:25
Subject: [bidi] Re: Special ordering for BIDI URLs
Sent by: bidi-bounce@unicode.org
A few comments on various issues.
1. Market Forces. Make it possible for URLs (actually IRIs) to be
completely RTL
A. Shawn raised the issue of .html. As I think about it, there are a
couple of ways to deal with this. First, even currently servers don't need
to use those suffixes: http://unicode.org/reports/ doesn't contain a
.html. Secondly, we could establish equivalences for some Hebrew and
Arabic-script suffixes to take the place of those.
2. Specialized BIDI. Force a consistent order on URLs, using a
higher-level protocol on top of the UBA.
A. The proponents of specialized reordering really need to come up with a
good story for how to deal with the security and interoperability issues
presented by plaintext applications and non-new-URL-ordering applications.
B. There are actually two variants of this:
a. have the consistent order be LTR.
b. have the consistent order be the paragraph direction.
(a) is a simpler approach technically, since the generated plaintext can
have a single direction associated with the label separators. It can be
implemented in display and cut/paste by having LRMs around each label that
contains an RTL character or no LTR characters.
While for users this may not be quite as natural, the most important
feature is having a predictable ordering (the ordering of labels in URLs
is already somewhat screwy, since the domain name is Little-Endian, and
the rest is Big-Endian).
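As a rough illustration of variant (a), the LRM wrapping could be
sketched as below. Which characters count as label separators is an
assumption made for this sketch, as are the helper names needs_lrm and
wrap_labels; "RTL character" is taken to mean Bidi class R or AL, and
"LTR character" to mean class L.

```python
import re
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK

# Assumed set of label separators; a real implementation would follow
# the IRI syntax more carefully.
SEPARATORS = re.compile(r"([./:?#&=]+)")


def needs_lrm(label):
    """True if the label contains an RTL character or no LTR characters."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    has_rtl = bool(classes & {"R", "AL"})
    has_ltr = "L" in classes
    return has_rtl or not has_ltr


def wrap_labels(iri):
    """Put an LRM on each side of every label that needs one."""
    # With one capturing group, re.split alternates label, separator, label...
    parts = SEPARATORS.split(iri)
    out = []
    for index, part in enumerate(parts):
        is_label = index % 2 == 0
        if is_label and part and needs_lrm(part):
            out.append(LRM + part + LRM)
        else:
            out.append(part)
    return "".join(out)


# A Hebrew label and an all-digit label are wrapped; "http" and "com" are not.
wrapped = wrap_labels("http://\u05d0\u05d1.com/123")
print(wrapped.encode("unicode_escape").decode("ascii"))
```

Note that a purely numeric label is wrapped as well, because it contains
no LTR characters.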
3. New Characters (Adil's proposal).
While an interesting proposal, the problems would be:
- introducing security risks with the new characters.
- a significant change to the UBA - and even extremely minor changes have
  caused enough problems that the UTC has grown quite leery of rocking the
  boat.
- it takes at least a couple of years to get characters accepted by both
  Unicode and ISO.
- none of the old URL-aware software would handle the new URLs (a problem
  also for the LRM approach).
Mark
Received on Friday, 28 May 2010 13:44:43 UTC