- From: Mohamed Mohie <MOHIEM@eg.ibm.com>
- Date: Tue, 15 May 2012 13:21:15 +0200
- To: "Aharon (Vladimir) Lanin" <aharon@google.com>
- Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, public-i18n-bidi@w3.org
Hello Aharon,

It's not clear to me what problems these additional characters can solve which we can't solve in the current UBA by combining LRE/RLE and inserting LRM/RLM?

Thanks And Best regards,
Mohamed Mohie, PMP®
_______________________________________________________
Manager of Arabic Competence and Globalization Center (ACGC)
GCoC BIDI, Advisory Software Engineer, Project Manager, M.Sc.
Cairo Technology Development Center (CTDC)
IBM Egypt - email: mohiem@eg.ibm.com

From: "Aharon (Vladimir) Lanin" <aharon@google.com>
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Cc: public-i18n-bidi@w3.org
Date: 15/05/2012 11:09
Subject: Re: Proposal for isolation characters in Unicode and the unicode-bidi:isolate and unicode-bidi:plaintext definitions [-www-style]

I guess public-i18n-bidi is an ok place to discuss the Unicode proposal. But would it not be better to do so on some Unicode list, at least in addition to here?

It may be worth considering to create a new character to close these embeddings. Otherwise, older algorithms will close LRE/RLE/LRO/RLO embeddings/overrides prematurely.

Good point.

Another question: What's the relationship between this proposal and the new bidi control character that was proposed (I think by Apple) around last November's UTC?

I guess you are referring to http://www.unicode.org/review/pri205/ ("LEVEL DIRECTION MARK (LDM) behaves like a direction mark which dynamically takes on the resolved direction associated with the current embedding level")

Using the current Unicode feature set, the way to deal with an opposite-direction inline insert is to declare its direction with LRE|RLE + PDF around it (to ensure the correct ordering inside the insert), immediately followed by an LRM when the embedding level around the phrase is even (LTR) or an RLM when it is odd (RTL), to prevent a number or an unrelated opposite-direction phrase following the insert from "sticking" to it.
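The embed-then-mark technique just described can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name `wrap_insert` and its parameters are hypothetical, and the caller is assumed to already know the embedding level at the insertion point.

```python
# Existing Unicode bidi formatting characters.
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK


def wrap_insert(text: str, insert_is_rtl: bool, embedding_level: int) -> str:
    """Hypothetical helper: declare the insert's direction, then add a mark.

    embedding_level is the level surrounding the insert; the caller
    must supply it (an assumption of this sketch).
    """
    # Declare the insert's own direction so its contents are ordered correctly.
    opener = RLE if insert_is_rtl else LRE
    # Even surrounding level means LTR, so follow with LRM; odd means RTL, so RLM.
    mark = LRM if embedding_level % 2 == 0 else RLM
    return f"{opener}{text}{PDF}{mark}"


# An RTL insert placed in an LTR (level 0) context.
wrapped = wrap_insert("\u05E9\u05DC\u05D5\u05DD", insert_is_rtl=True, embedding_level=0)
```

The sketch takes `embedding_level` as an argument, which is exactly the information the inserting code often does not have.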
The principal difficulty in implementing this is that often the code layer doing the insertion has no idea what the embedding level at the insertion point is. The LDM would address this need; IMO it is the most important use case for it.

Under the new proposal, the way to deal with the opposite-direction phrases is to put them in an isolate. LRM and RLM - and thus LDM - are not necessary. Furthermore, this way to deal with opposite-direction inline inserts is more robust, because it works even when the insert is surrounded by a phrase whose direction is opposite to the embedding level, but whose direction is not explicitly declared. Of course, it would be better if the direction of every opposite-direction phrase were declared, but often that is not the way that bidi text is constructed. In such cases, an LDM (or LRM|RLM) disrupts the phrase surrounding the insert.

I believe that the use cases cited for the LDM can also be achieved with isolates. For example, "An Arabic numeric date of the form dd/MM/yyyy in which the fields should flow left-to-right (e.g. 09/16/2011) in a left-right context (i.e. the date and perhaps some other Arabic text are in a mainly Latin-script paragraph), but should flow right-to-left (e.g. 2011/16/09) in a right-left context (e.g. a primarily Arabic-script paragraph)" can be achieved by putting each of the numbers (day, month, year) in a separate isolate, e.g. FSI09PDF/FSI16PDF/FSI2011PDF.

However, these are two independent proposals that do not actually conflict, and you might want to get the opinion of the LDM's proposers :-)

Aharon

On Tue, May 15, 2012 at 10:24 AM, Aharon (Vladimir) Lanin <aharon@google.com> wrote:

I will reply substantively after taking www-style off the recipients. I don't think that the CSS list is the right place to discuss the details of the Unicode proposal.

Aharon

On Tue, May 15, 2012 at 10:10 AM, "Martin J. Dürst" <duerst@it.aoyama.ac.jp> wrote:

On 2012/05/15 5:06, Aharon (Vladimir) Lanin wrote:

Last week, I wrote up and Mark Davis submitted to the UTC a proposal (http://goo.gl/K6qtV) for adding bidi isolation to Unicode. Here is the basic proposal:

--- start quote ---
Define three new Unicode formatting code points:
LRI: marks the beginning of a left-to-right isolate.
RLI: marks the beginning of a right-to-left isolate.
FSI: marks the beginning of a first-strong isolate.
Each would be matched with a PDF.

It may be worth considering to create a new character to close these embeddings. Otherwise, older algorithms will close LRE/RLE/LRO/RLO embeddings/overrides prematurely.

Another question: What's the relationship between this proposal and the new bidi control character that was proposed (I think by Apple) around last November's UTC?

Regards, Martin.
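The date example from the thread (FSI09PDF/FSI16PDF/FSI2011PDF) can be sketched as below. This is a rough illustration, not part of the proposal: the helper names `isolate` and `isolated_date` are hypothetical, and the proposal as quoted assigns no code point to FSI. The value used here, U+2068, is the one Unicode later assigned to FIRST STRONG ISOLATE, and is an assumption relative to this thread. The closer defaults to PDF, as the quoted proposal pairs each isolate with a PDF.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE (code point assigned later; assumed here)
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING, the terminator named in the proposal


def isolate(text: str, opener: str = FSI, closer: str = PDF) -> str:
    """Hypothetical helper: wrap text in an isolate opener and its terminator."""
    return f"{opener}{text}{closer}"


def isolated_date(day: str, month: str, year: str, sep: str = "/") -> str:
    """Hypothetical helper: put each date field in its own isolate.

    No LRM/RLM is appended, so the surrounding embedding level
    does not need to be known by the caller.
    """
    return sep.join(isolate(part) for part in (day, month, year))


date_text = isolated_date("09", "16", "2011")
```

Because `closer` is a parameter, the same sketch would also accommodate a dedicated closing character of the kind Martin suggests, should one be defined.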
Received on Tuesday, 15 May 2012 11:32:31 UTC