comments on Unicode PRI #231 from Matitiahu Allouche on 2012-07-29 (public-i18n-bidi@w3.org from July to September 2012)

From: Matitiahu Allouche <matitiahu.allouche@gmail.com>
Date: Mon, 30 Jul 2012 00:54:58 +0300
To: <public-i18n-bidi@w3.org>
Message-ID: <315201cd6dd4$cd8b3880$68a1a980$@gmail.com>

The Unicode Technical Committee (UTC) has published a PRI (Public Review Issue) about a proposed Bidi Parenthesis Algorithm (see http://www.unicode.org/review/pri231/ ). Below are the comments I have sent to the UTC. Although the closing date for comments passed a few days ago, anybody with something to contribute should do so promptly; the comments will probably be accepted, since there has not been much traffic about this PRI.

Regards, Mati

Oops! I am past the closing date. I hope the comments below will be considered nevertheless.

1) In the table of section 3.5, the last line of the second column R(LR)R should be highlighted, since the UBA will resolve the open paren as LTR and the close paren as RTL.

2) In the same table, the sixth line of the fourth column L(LR)L should be highlighted, since the UBA will resolve the open paren as LTR and the close paren as RTL.

3) Same comment for the next line L(LR)R.

4) Line 115 mentions "the directionality of the enclosed content". It is not clear what this directionality is when the content includes mixed LTR and RTL text.

5) For completeness, rule N0 should specify what happens when the enclosed text is all N, even if only to say that the BPA does not affect this case.

6) In section 5, example 6, I don't understand the result of the BPA. From the UBA display, I understand that the LTR text (Microsoft Corp) is at the logical end of the string. In the BPA display it appears at the starting end of the string. I see nothing in the definition of the BPA which should give such a result.

7) Is a solution to the current problem of mismatched parentheses desirable?

I am not sure, for the following reasons:

a. The UBA is already quite complex. The BPA would add still more complexity. The proof is that, if my comments 1-3 and 6 above are well founded, even the author of the proposal has missed some fine points. And if my comments are not well founded, I am myself the one who got confused, despite the fact that I have more experience in bidi matters than the average person.

b. Consider a text editor implementing the UBA and BPA by transforming the logical text to visual display after each keystroke. When the open paren is entered, the BPA will not kick in, since the paren is not yet paired. When the closing paren is entered, the BPA will kick in, possibly modifying the display of text around the opening paren, which may be a few lines away from the typing location.

The UBA can also modify the appearance of text already entered, but always in the immediate neighborhood of the typing location.
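
The keystroke scenario in point b can be illustrated with a minimal sketch. It assumes a plain stack match on "(" and ")" only, which is a simplification of whatever pairing the BPA actually specifies; the function names and the sample string are hypothetical.

```python
def paired_parens(text):
    """Return (open_index, close_index) pairs using a simple stack match.

    Assumption: only "(" and ")" are considered; the real proposal may
    pair more bracket types and apply further conditions.
    """
    stack, pairs = [], []
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            pairs.append((stack.pop(), i))
    return pairs


def first_keystroke_with_pair(text):
    """Return how many characters must be typed before any pair exists."""
    for n in range(1, len(text) + 1):
        if paired_parens(text[:n]):
            return n
    return None


# Hypothetical logical text, typed one character at a time.
typed = "abc (DEF GHI jkl)"
n = first_keystroke_with_pair(typed)
# Distance between the opening paren and the keystroke that creates the pair.
distance = (n - 1) - typed.index("(")
print(n, distance)
```

Until the final keystroke no pair exists, so a pair-based rule has nothing to act on; once the pair forms, the resolution of the opening paren, `distance` characters back, may change.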

8) Does the proposed solution meet expectations in terms of the naturalness of segmentation and directional flow of enclosed units?

The proposal assumes that opposite-direction content within parentheses forms a single directional run with the opposite-direction text on either side. When one of the sides has the embedding direction, I don't see why the context should be taken to have the opposite direction rather than the embedding direction. When in doubt, the BPA should not assume the opposite direction for the parentheses.

Here is an example. The text in logical order (with upper case representing RTL letters) is

"I LIVE IN paris (france)."

Assuming an RTL paragraph direction, the UBA will display

".(paris (france NI EVIL I"

The BPA will display

".paris (france) NI EVIL I"

which is better. However, since the general direction of the text is RTL, I prefer to have it displayed as

".(france) paris NI EVIL I"
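
The three displays above can be reproduced with a small sketch. It is not a UBA implementation: it assumes the upper case/lower case convention of this example, treats every non-letter as a neutral, ignores weak types and explicit embeddings, and uses a hypothetical `paren_dir` argument to stand in for whatever direction a rule such as N0 assigns to the parentheses.

```python
MIRROR = {"(": ")", ")": "("}


def strong_type(ch):
    """Upper case letters stand for RTL, lower case for LTR, others are neutral."""
    if ch.isalpha():
        return "R" if ch.isupper() else "L"
    return None


def display(logical, base="R", paren_dir=None):
    """Simplified visual ordering of a single paragraph.

    paren_dir (hypothetical) forces a direction on all parentheses;
    None leaves them as ordinary neutrals.
    """
    types = []
    for ch in logical:
        t = strong_type(ch)
        if t is None and ch in MIRROR and paren_dir:
            t = paren_dir
        types.append(t)

    # Neutral runs take the surrounding direction when both sides agree,
    # else the paragraph direction; text boundaries count as the paragraph direction.
    n, i = len(types), 0
    while i < n:
        if types[i] is not None:
            i += 1
            continue
        j = i
        while j < n and types[j] is None:
            j += 1
        before = types[i - 1] if i > 0 else base
        after = types[j] if j < n else base
        types[i:j] = [before if before == after else base] * (j - i)
        i = j

    # Levels: in an RTL paragraph R is 1 and L is 2; in an LTR one L is 0 and R is 1.
    if base == "R":
        levels = [1 if t == "R" else 2 for t in types]
    else:
        levels = [0 if t == "L" else 1 for t in types]

    # Mirror parentheses that end up at an odd (RTL) level.
    chars = [MIRROR.get(c, c) if lv % 2 else c for c, lv in zip(logical, levels)]

    # Reverse runs from the highest level down to the lowest odd level.
    order = list(range(n))
    highest = max(levels, default=0)
    lowest_odd = min((lv for lv in levels if lv % 2), default=highest + 1)
    for lv in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < n:
            if levels[order[i]] < lv:
                i += 1
                continue
            j = i
            while j < n and levels[order[j]] >= lv:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
    return "".join(chars[k] for k in order)


text = "I LIVE IN paris (france)."
print(display(text))                 # parentheses left as neutrals
print(display(text, paren_dir="L"))  # parentheses forced to the opposite direction
print(display(text, paren_dir="R"))  # parentheses forced to the embedding direction
```

Under these assumptions the three calls print the UBA display, the BPA display, and the preferred display from this comment, in that order.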

To get this result, rule N0 can be reformulated as follows:

N0. Paired punctuation marks take the opposite direction if the enclosed text contains no strong type of the embedding direction and the external neighbors on both sides have the opposite direction. Else the paired punctuation marks take the embedding direction.
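
A minimal sketch of this reformulated rule follows. It assumes that "external neighbors" means the nearest strong character on each side, and that a missing neighbor (start or end of text) counts as the embedding direction; both points are interpretations, not part of the wording above. The function names are hypothetical, and upper case again stands for RTL letters.

```python
def strong_type(ch):
    """Upper case letters stand for RTL, lower case for LTR, others are neutral."""
    if ch.isalpha():
        return "R" if ch.isupper() else "L"
    return None


def nearest_strong(text, from_end, default):
    """Nearest strong type in text, scanning from its end or its start."""
    for ch in (reversed(text) if from_end else text):
        t = strong_type(ch)
        if t:
            return t
    # Assumption: with no strong neighbor, fall back to the embedding direction.
    return default


def revised_n0(before, enclosed, after, embedding):
    """Direction ("L" or "R") assigned to a pair of parentheses.

    before / enclosed / after are the logical text preceding the open paren,
    between the parens, and following the close paren.
    """
    opposite = "L" if embedding == "R" else "R"
    has_embedding_strong = any(strong_type(c) == embedding for c in enclosed)
    left = nearest_strong(before, True, embedding)
    right = nearest_strong(after, False, embedding)
    if not has_embedding_strong and left == opposite and right == opposite:
        return opposite
    return embedding


# "I LIVE IN paris (france)." in an RTL paragraph.
print(revised_n0("I LIVE IN paris ", "france", ".", "R"))
```

For the example in this comment the sketch returns the embedding direction, because nothing strong follows the closing paren, so the parentheses stay RTL as in the preferred display.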

9) Should the BPA be implemented as a new rule affecting the resolution of neutral types in the core UBA – proposed rule N0?

Should the BPA, rather, be a recommended implementation using higher level protocols?

As said above, I am not sure that the BPA has more benefits than problems, but if it is adopted, I think it should be in the core UBA. Leaving it for a higher level protocol introduces one more degree of uncertainty in the behavior of the presentation system, and this is not something we need.

10) Are stability concerns adequately addressed?

I think that in most cases, reasonable bidi text which looks good with the UBA will look the same with the BPA. However, here is an exception. The logical text is (where ] represents RLE and ^ represents PDF):

I LIVE IN ]paris^(france).

The UBA will display it as:

.(france)paris^] NI EVIL I

The BPA, according to the proposed N0, will assign the opposite direction to the parentheses, thus displaying:

.paris^(france)] NI EVIL I

You can see that the order of "paris" and "france" is reversed. That would not happen with the revised N0 that I suggested in comment 8 above.

11) Are interoperability concerns during the migration period adequately addressed?

The document states: "The main stability concern therefore is that text authored using the BPA may display differently when rendered on a system which has not implemented the BPA. In such a case, the reader of that text is no worse off than they would have been prior to the development of the BPA."

This is not quite correct. Without the development of the BPA, the document author would have taken measures (like adding control characters) to create a correct display under the UBA. Authors writing on BPA-supporting systems are likely to create text which will be rendered differently on BPA-ignorant systems.

However, this is a problem which will always surface when new features are introduced.

Received on Sunday, 29 July 2012 21:55:35 UTC