Re: Proposal for isolation characters in Unicode and the unicode-bidi:isolate and unicode-bidi:plaintext definitions

[-www-style]

I guess public-i18n-bidi is an ok place to discuss the Unicode proposal.
But would it not be better to do so on some Unicode list, at least in
addition to here?


> It may be worth considering to create a new character to close these
> embeddings. Otherwise, older algorithms will close LRE/RLE/LRO/RLO
> embeddings/overrides prematurely.


Good point.


> Another question: What's the relationship between this proposal and the
> new bidi control character that was proposed (I think by Apple) around last
> November's UTC?


I guess you are referring to http://www.unicode.org/review/pri205/ ("LEVEL
DIRECTION MARK (LDM) behaves like a direction mark which dynamically takes
on the resolved direction associated with the current embedding level").

Using the current Unicode feature set, the way to deal with an
opposite-direction inline insert has two parts. First, declare its
direction by wrapping it in LRE or RLE + PDF, which ensures the correct
ordering inside the insert. Second, follow it immediately with an LRM when
the embedding level around the insert is even (LTR) or an RLM when it is
odd (RTL), which prevents a number or an unrelated opposite-direction
phrase following the insert from "sticking" to it. The principal
difficulty in implementing this is that the code layer doing the insertion
often has no idea what the embedding level at the insertion point is. The
LDM would address this need; IMO it is the most important use case for it.
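
To make the current approach concrete, here is a rough Python sketch. The
function name wrap_insert and its parameters are made up for illustration;
only the code points are the existing Unicode ones. Note that the caller has
to supply embedding_level, which is exactly the piece of information that is
often not available.

```python
# Existing Unicode bidi control characters.
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK


def wrap_insert(text: str, insert_is_rtl: bool, embedding_level: int) -> str:
    """Wrap an inline insert using embeddings plus a trailing mark.

    Hypothetical helper, not taken from any library. embedding_level is the
    level at the insertion point and must be known to the caller.
    """
    # Declare the insert's own direction so its contents are ordered correctly.
    opener = RLE if insert_is_rtl else LRE
    # The trailing mark matches the surrounding level: even is LTR, odd is RTL.
    mark = RLM if embedding_level % 2 else LRM
    return opener + text + PDF + mark
```

For an LTR insert in an RTL context (level 1), this yields LRE + text + PDF
+ RLM.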

Under the new proposal, the way to deal with the opposite-direction phrases
is to put them in an isolate. LRM and RLM - and thus LDM - are not
necessary. Furthermore, this way to deal with opposite-direction inline
inserts is more robust, because it works even when the insert is surrounded
by a phrase whose direction is opposite to the embedding level, but whose
direction is not explicitly declared. Of course, it would be better if the
direction of every opposite-direction phrase were declared, but often that
is not the way that bidi text is constructed. In such cases, an LDM (or
LRM|RLM) disrupts the phrase surrounding the insert.

I believe that the use cases cited for the LDM can also be achieved with
isolates. For example, "An Arabic numeric date of the form dd/MM/yyyy in
which the fields should flow left-to-right (e.g. ٠٩/١٦/٢٠١١) in a
left-right context (i.e. the date and perhaps some other Arabic text are in
a mainly Latin-script paragraph), but should flow right-to-left (e.g
٢٠١١/١٦/٠٩) in a right-left context (e.g. a primarily Arabic-script
paragraph)" can be achieved by putting each of the numbers (day, month,
year) in a separate isolate, e.g. FSI٠٩PDF/FSI١٦PDF/FSI٢٠١١PDF.
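
A rough Python sketch of how the date example could be assembled. The
proposal does not assign a code point to FSI, so the "<FSI>" string below is
only a visible placeholder, and isolate_fields is a made-up name. The closing
character is PDF, as in the proposal text quoted below. Unlike the LRM|RLM
approach, nothing here depends on the embedding level at the insertion point.

```python
# Placeholder only: the proposal does not assign a code point to FSI.
FSI = "<FSI>"
# Existing POP DIRECTIONAL FORMATTING, which the proposal pairs with isolates.
PDF = "\u202C"


def isolate_fields(fields, opener=FSI, closer=PDF, separator="/"):
    """Put each field in its own isolate and join the results.

    Hypothetical helper for illustration; the field order is the logical
    order, and the display order is left to the bidi algorithm.
    """
    return separator.join(opener + field + closer for field in fields)


# Day, month, year as in the example above.
date = isolate_fields(["٠٩", "١٦", "٢٠١١"])
```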

However, these are two independent proposals that do not actually conflict,
and you might want to get the opinion of the LDM's proposers :-)

Aharon

On Tue, May 15, 2012 at 10:24 AM, Aharon (Vladimir) Lanin <aharon@google.com> wrote:

> I will reply substantively after taking www-style off the recipients. I
> don't think that the CSS list is the right place to discuss the details of
> the Unicode proposal.
>
> Aharon
>
>
> On Tue, May 15, 2012 at 10:10 AM, "Martin J. Dürst" <
> duerst@it.aoyama.ac.jp> wrote:
>
>> On 2012/05/15 5:06, Aharon (Vladimir) Lanin wrote:
>>
>>> Last week, I wrote up and Mark Davis submitted to the UTC a proposal (
>>> http://goo.gl/K6qtV) for adding bidi isolation to Unicode. Here is the
>>> basic proposal:
>>>
>>> --- start quote ---
>>> Define three new Unicode formatting code points:
>>> LRI: marks the beginning of a left-to-right isolate.
>>> RLI: marks the beginning of a right-to-left isolate.
>>> FSI: marks the beginning of a first-strong isolate.
>>>
>>> Each would be matched with a PDF.
>>>
>>
>> It may be worth considering to create a new character to close these
>> embeddings. Otherwise, older algorithms will close LRE/RLE/LRO/RLO
>> embeddings/overrides prematurely.
>>
>> Another question: What's the relationship between this proposal and the
>> new bidi control character that was proposed (I think by Apple) around last
>> November's UTC?
>>
>> Regards,   Martin.
>>
>
>

Received on Tuesday, 15 May 2012 09:09:19 UTC