Re: Proposal for isolation characters in Unicode and the unicode-bidi:isolate and unicode-bidi:plaintext definitions from Mohamed Mohie on 2012-05-15 (public-i18n-bidi@w3.org from April to June 2012)

From: Mohamed Mohie <MOHIEM@eg.ibm.com>
Date: Tue, 15 May 2012 13:21:15 +0200
To: "Aharon (Vladimir) Lanin" <aharon@google.com>
Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, public-i18n-bidi@w3.org
Message-ID: <OF5FD4BE68.F44D4214-ON422579FF.003E250A-422579FF.003E5EF3@eg.ibm.com>
Hello Aharon,
It's not clear to me what problems these additional characters solve
that we can't already solve in the current UBA by combining LRE/RLE and
inserting LRM/RLM.

Thanks And Best regards,
Mohamed Mohie , PMP®
_______________________________________________________
Manager of Arabic Competence and Globalization Center (ACGC)
GCoC BIDI , Advisory Software Engineer, Project Manager, M.Sc.
Cairo Technology Development Center (CTDC)
IBM Egypt-
email : mohiem@eg.ibm.com





From:	"Aharon (Vladimir) Lanin" <aharon@google.com>
To:	Martin J. Dürst <duerst@it.aoyama.ac.jp>
Cc:	public-i18n-bidi@w3.org
Date:	15/05/2012 11:09 AM
Subject:	Re: Proposal for isolation characters in Unicode and the
            unicode-bidi:isolate and unicode-bidi:plaintext definitions



[-www-style]

I guess public-i18n-bidi is an ok place to discuss the Unicode proposal.
But would it not be better to do so on some Unicode list, at least in
addition to here?


  It may be worth considering to create a new character to close these
  embeddings. Otherwise, older algorithms will close LRE/RLE/LRO/RLO
  embeddings/overrides prematurely.

Good point.


  Another question: What's the relationship between this proposal and the
  new bidi control character that was proposed (I think by Apple) around
  last November's UTC?

I guess you are referring to http://www.unicode.org/review/pri205/ ("LEVEL
DIRECTION MARK (LDM) behaves like a direction mark which dynamically takes
on the resolved direction associated with the current embedding level")

Using the current Unicode feature set, the way to deal with an
opposite-direction inline insert is to declare its direction with LRE|RLE +
PDF around it (to ensure the correct ordering inside the insert),
immediately followed by an LRM when the embedding level around the phrase
is even (LTR) or an RLM when it is odd (RTL), to prevent a number or an
unrelated opposite-direction phrase following the insert from "sticking" to
it. The principal difficulty in implementing this is that often the code
layer doing the insertion has no idea what the embedding level at the
insertion point is. The LDM would address this need; IMO it is the most
important use case for it.
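
To make the current approach concrete, here is a minimal Python sketch of
it, not a reference implementation. The function name wrap_insert and its
parameters are hypothetical; the code points are the existing LRE, RLE,
PDF, LRM and RLM characters. Note that the caller has to pass in the
embedding level at the insertion point, which is exactly the information
the inserting code layer often does not have.

```python
# Existing Unicode bidi formatting characters.
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK


def wrap_insert(insert: str, insert_is_rtl: bool, context_level: int) -> str:
    """Wrap an inline insert for the pre-isolate approach (hypothetical helper).

    insert_is_rtl: declared direction of the inserted phrase.
    context_level: embedding level around the insert; assumed to be known
                   by the caller, which is often not the case in practice.
    """
    # Declare the insert's own direction with an embedding.
    opener = RLE if insert_is_rtl else LRE
    # Follow it with a mark matching the surrounding level:
    # even level is LTR (LRM), odd level is RTL (RLM).
    trailing_mark = LRM if context_level % 2 == 0 else RLM
    return opener + insert + PDF + trailing_mark
```

For an RTL insert in an LTR (level 0) context, this yields RLE + insert +
PDF + LRM.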

Under the new proposal, the way to deal with the opposite-direction phrases
is to put them in an isolate. LRM and RLM - and thus LDM - are not
necessary. Furthermore, this way to deal with opposite-direction inline
inserts is more robust, because it works even when the insert is surrounded
by a phrase whose direction is opposite to the embedding level, but whose
direction is not explicitly declared. Of course, it would be better if the
direction of every opposite-direction phrase were declared, but often that
is not the way that bidi text is constructed. In such cases, an LDM (or
LRM|RLM) disrupts the phrase surrounding the insert.

I believe that the use cases cited for the LDM can also be achieved with
isolates. For example, "An Arabic numeric date of the form dd/MM/yyyy in
which the fields should flow left-to-right (e.g. 09/16/2011) in a
left-right context (i.e. the date and perhaps some other Arabic text are in
a mainly Latin-script paragraph), but should flow right-to-left (e.g.
2011/16/09) in a right-left context (e.g. a primarily Arabic-script
paragraph)" can be achieved by putting each of the numbers (day, month,
year) in a separate isolate, e.g. FSI09PDF/FSI16PDF/FSI2011PDF.
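
As a rough illustration of that construction, the following Python sketch
builds the string from the date fields. It uses the mnemonics FSI and PDF
as literal placeholder text, since no code points are given for the
proposed characters here; the function name isolate_fields is
hypothetical.

```python
# Placeholder mnemonics, not real characters: the proposed FSI has no
# code point in this discussion, so plain strings stand in for both.
FSI = "FSI"  # proposed first-strong isolate opener
PDF = "PDF"  # terminator, per the proposal as quoted in this thread


def isolate_fields(fields, separator="/"):
    """Wrap each field in its own first-strong isolate (hypothetical helper)."""
    return separator.join(f"{FSI}{field}{PDF}" for field in fields)


print(isolate_fields(["09", "16", "2011"]))
# prints FSI09PDF/FSI16PDF/FSI2011PDF
```

Because each field is isolated separately, the code building the date
needs no knowledge of the embedding level of the surrounding text.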

However, these are two independent proposals that do not actually conflict,
and you might want to get the opinion of the LDM's proposers :-)

Aharon

On Tue, May 15, 2012 at 10:24 AM, Aharon (Vladimir) Lanin <
aharon@google.com> wrote:
  I will reply substantively after taking www-style off the recipients. I
  don't think that the CSS list is the right place to discuss the details
  of the Unicode proposal.

  Aharon


  On Tue, May 15, 2012 at 10:10 AM, "Martin J. Dürst" <
  duerst@it.aoyama.ac.jp> wrote:
   On 2012/05/15 5:06, Aharon (Vladimir) Lanin wrote:
     Last week, I wrote up and Mark Davis submitted to the UTC a proposal (
     http://goo.gl/K6qtV) for adding bidi isolation to Unicode. Here is the
     basic proposal:

     --- start quote ---
     Define three new Unicode formatting code points:
     LRI: marks the beginning of a left-to-right isolate.
     RLI: marks the beginning of a right-to-left isolate.
     FSI: marks the beginning of a first-strong isolate.

     Each would be matched with a PDF.

   It may be worth considering to create a new character to close these
   embeddings. Otherwise, older algorithms will close LRE/RLE/LRO/RLO
   embeddings/overrides prematurely.

   Another question: What's the relationship between this proposal and the
   new bidi control character that was proposed (I think by Apple) around
   last November's UTC?

   Regards,   Martin.
Received on Tuesday, 15 May 2012 11:32:31 UTC