RE: comments on Unicode PRI #231 from CE Whitehead on 2012-08-02 (public-i18n-bidi@w3.org from July to September 2012)

From: CE Whitehead <cewcathar@hotmail.com>
Date: Thu, 2 Aug 2012 15:16:52 -0400
To: <matitiahu.allouche@gmail.com>, <public-i18n-bidi@w3.org>, <unicode@unicode.org>
Message-ID: <BLU168-W2117743A33AF6AD245066AB3CB0@phx.gbl>

Thanks, Mati, for cc-ing the list.

I too sent some comments on the bidi parentheses algorithm (which, alas, have yet to be posted at http://www.unicode.org/review/pri231/ though I sent mine in time; only my original idiotic comment has been posted; for my more recent comments, see below). I am concerned about what happens when the opening text is ltr or rtl but the final text is neutral and both directionalities are present within the parenthetical embedding. I do think the algorithm should address this case, and the all-neutral case, to be clear.
I've commented on a few of your comments (I'm not a developer, so I responded where I could as a user), and have pasted my own comments below yours for reference. However, I think your solution of taking the directionality of the embedding in unclear cases is better than mine, so my comments are appended to the end of this email just for reference.

From: matitiahu.allouche@gmail.com
To: public-i18n-bidi@w3.org
Date: Mon, 30 Jul 2012 00:54:58 +0300
Subject: comments on Unicode PRI #231

> The Unicode Technical Committee (UTC) has published a PRI (Public Review Issue) about a proposed Bidi Parenthesis Algorithm (see 
> http://www.unicode.org/review/pri231/ ). I submit below the comments I have sent to UTC. Although the closing date for comments 
> has gone by a few days ago, anybody with something to contribute should do so promptly, and the comments will probably be accepted 
> since there has not been a lot of traffic about this PRI.
> Regards,  Mati
> <start of my comments>
> Ooops! I am past the closing date. I hope the comments below will be considered nevertheless.
> 4) Line 115 mentions "the directionality of the enclosed content". It is not clear what this directionality is when the content includes
> mixed LTR and RTL text. I think this case should get a separate bullet here.
> 5) For completeness, rule N0 should specify what happens when the enclosed text is all N, even if to say that the BPA does not affect 
> this case.
Yes, agreed, I think so, too.
 > . . . 
> 7) Is a solution to the current problem of mismatched parenthesis desirable? I am not sure, because of the following reasons:
> a.      The UBA is already quite complex. The BPA would add still more complexity. Proof is that, if my comments 1-3 and 6 above are
> founded, even the author of the proposal has missed some fine points. And if my comments are not founded, I am myself the one who
> got confused, despite the fact that I have more experience in bidi matters than the average person.
Your comments above refer to the current algorithm, right, whereas the new algorithm would match these in all cases and thus reduce complexity, right?
> b.      Consider a text editor implementing the UBA and BPA by transforming the logical text to visual display after each keystroke. When
> entering an open paren, the BPA will not kick in, since it is not paired. When entering the closing paren, the BPA will kick in, possibly
> modifying the display of text around the opening paren, which may be a few lines far from the typing location.
> The UBA also has effects of modifying the appearance of text already entered, but it is always in the close neighborhood of the typing
> location.
Hmm, this already happens when I type at, say, Facebook (sorry to mention an example) in two directions; for example, if I type text in Arabic, then an English definition or a Romanization, and then another word in the definition separated by a punctuation mark such as a dash, text gets moved around before my eyes. Thus I think this is not a major argument against revising the algorithm: people are quite used to this, and having text display properly after a little typing is something bidirectional typists will appreciate. (I know I do; I am learning when I can't have punctuation at Facebook, and when I can use it and it will ultimately display o.k., so I am assuming other people feel as I do.)
> 8) Does the proposed solution meet expectations in terms of the naturalness of segmentation and directional flow of enclosed units?
> The proposal assumes that opposite direction content within parentheses forms a unique directional run with opposite direction text on
> either side. When one of the sides has the embedding direction, I don't see that the context has opposite direction rather than
> embedded direction. In doubt, the BPA should not assume opposite direction for the parentheses.
> Here is an example. The text in logical order (with upper case representing RTL letters) is
>   "I LIVE IN paris (france)."
> Assuming a RTL paragraph direction, the UBA will display
>   ".(paris (france NI EVIL I"
> The BPA will display
>   ".paris (france) NI EVIL I"
> which is better. However, since the general direction of the text is RTL, I prefer to have it displayed as
>   ".(france) paris NI EVIL I"
Yes, I am in agreement here, though either solution is better than what we have now.
> To get this result, rule N0 can be reformulated as follows:
> N0. Paired punctuation marks take the opposite direction if the enclosed text contains no strong type of the embedding direction and the
> external neighbors on both sides have the opposite direction. Else the paired punctuation marks take the embedding direction.
Yes, this might work; I had suggested that the text that logically preceded the opening parenthesis held sway, but this might work better. In any case, your suggestion will solve the problem of cases where neutral text is contained within and also following the parentheses.
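
The reformulated N0 quoted above can be sketched as a small function. This is only an illustration under stated assumptions: the function name and its inputs are hypothetical, directions are reduced to "L" and "R", and a neighbor that is neutral or missing is passed as None and is assumed not to count as "having the opposite direction" (the quoted wording does not say how such neighbors resolve).

```python
from typing import Iterable, Optional


def n0_reformulated(embedding: str, enclosed: Iterable[str],
                    before: Optional[str], after: Optional[str]) -> str:
    """Direction ("L" or "R") for a bracket pair under the reformulated N0.

    embedding: embedding direction, "L" or "R".
    enclosed:  strong types found between the brackets (may be empty).
    before/after: direction of the external neighbor on each side, or
                  None when there is no strong neighbor (an assumption).
    """
    opposite = "R" if embedding == "L" else "L"
    enclosed_has_embedding = embedding in set(enclosed)
    # Opposite direction only when nothing inside has the embedding
    # direction AND both external neighbors have the opposite direction.
    if not enclosed_has_embedding and before == opposite and after == opposite:
        return opposite
    # In every other case the pair takes the embedding direction.
    return embedding
```

For the "paris (france)." example in an RTL paragraph, the enclosed text is ltr and the preceding neighbor is ltr, but only a neutral full stop follows, so n0_reformulated("R", ["L"], "L", None) gives "R", which corresponds to the preferred display quoted above.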
> . . .

Best, 

--C. E. Whitehead
cewcathar@hotmail.com 

* * * Appendix: My Comments * * *

Below are the comments I sent:
* * *
Hi. I realize that the bidi parentheses algorithm is not currently being
discussed on the list, but wanted to cc the list with my feedback (I've
already sent it to unicode (using the form), but I wanted to make
"double sure" that my feedback gets to the right place; also I've made a
few edits to the feedback I submitted; thus the comments here may be a
little more clear.)

First, apologies for mentioning happy and sad
faces, including these four (: , :) , :( , ):  -- these will not be
ordered as paired parentheses normally and would be ignored by the
algorithm!!! This is fine I think.
On the same note, although I'm not completely sure about curly brackets { }
(please see discussion of single curly braces in legal documents:
http://www.oooforum.org/forum/viewtopic.phtml?t=53089 ),
in such cases the bidi parentheses rule will just not be implemented, so I
believe that such uses are no problem again! A similar thing happens
with closing parentheses which are used after numbers and letters --
1).
2.)
3.)
But again these should not normally be a problem.

Thus go ahead with the following four sets of braces:
 (), [], <>, and {}.
(I am not sure about other brackets and braces however -- in miscellaneous symbols and elsewhere).

Second,

** IMO, the algorithm should be part of Unicode's core, 
as Unicode core's previous way of handling braces should be improved/corrected in the core,
even though (IMO again) the current rules HL4 and HL5 are also fine,
and do enable applications to fix/tweak the basic bidi algorithm.
What I mean is that if an app ignores HL4 and HL5 and simply applies
the Unicode bidi algorithm, the result could be mismatched brackets (in
some cases); further, the rules in the core should be fixed in the core
(IMO again) so long as they are only fixed for marks that are universal,
and not language specific. (I don't see a problem with backwards
compatibility -- any workarounds should still work.)

** Also I agree it's best to locate the matching brackets before applying the rest of the bidi algorithm.

Third, 3.1. Some comments on the algorithm itself:
"If an open parenthesis is found, push it onto a stack and continue the
scan. If a close parenthesis is found, check if the stack is not
empty and the close parenthesis is the other member of the mirrored pair
for the character on the top of the stack. If so, pop the stack and
continue the scan; else return failure. If the end of the paragraph is
reached, return success if the stack is empty; else return failure.
Success implies that all open and close parentheses, if any, in a
paragraph are matched correctly. Failure implies that there are one or
more mismatched paired punctuation marks in a run and therefore the
handling under the parenthesis algorithm will not be attempted."
** The above is fine, IMO
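
The matching scan quoted above can be sketched in a few lines. This is a minimal sketch, not the PRI's reference code: the function name is hypothetical, and the pair table is an assumption limited to the four ASCII pairs (), [], <>, {} mentioned earlier in this message.

```python
# Assumed pair table: only the four ASCII pairs discussed in this message.
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}
CLOSERS = set(PAIRS.values())


def brackets_match(paragraph: str) -> bool:
    """Return True (success) if every open/close mark in the paragraph pairs up."""
    stack = []
    for ch in paragraph:
        if ch in PAIRS:
            # Open mark: push it and continue the scan.
            stack.append(ch)
        elif ch in CLOSERS:
            # Close mark: the stack must be non-empty and its top must be
            # the other member of the pair; otherwise return failure.
            if not stack or PAIRS[stack[-1]] != ch:
                return False
            stack.pop()
    # End of paragraph: success only if nothing is left unmatched.
    return not stack
```

Under this reading, a lone closer such as the ")" in "1)" or ":)" makes the scan return failure, which by the quoted wording means the parenthesis handling is not attempted for that paragraph.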
"The
rationale for following the embedding level in the normal case is that
the text segment enclosed by the paired punctuation marks will
conform to the progression of other text segments in the writing
direction. In the exception cases, the rationale to follow the opposite
direction is based on context being established between the enclosed and
adjacent segments with the same direction."
** Agreed, yes, embedding level should be followed in normal case, albeit for both brackets.
"Other
neutral types adjacent to paired punctuation marks are resolved
subsequent to resolving the paired punctuation marks themselves, and
will therefore be influenced by that resolution."
** Agreed again, yes, so far so good.
"The
directionality of the enclosed content is opposite the embedding
direction, and at least one neighbor has a bidi level opposite to
the embedding direction O(O)E, E(O)O, or O(O)O."
"*N0. Paired 
punctuation marks take the embedding direction if the enclosed text 
contains a strong type of the same direction. Else, if the enclosed text
 contains a strong type of the opposite direction and at least one 
external neighbor also has that direction the paired punctuation marks 
take the direction opposite the embedding direction."
** I disagree with the above statements. 

R(R)L -> R -- that is, with embedding ltr -- o.k.
L(R)R -> R -- that is, with embedding ltr -- No; these brackets should take
the ltr directionality, that is, should take the directionality of the
embedding level if the same directionality immediately precedes the opening
paren (IMO again but see examples below).

** Same problem in an rtl embedding environment:
L(L)R -> L with embedding rtl -- O.k. the text that precedes the parens is
ltr, as is the text in parens, so fine, let the directionality be
different from that of embedding directionality.
R(L)L -> L with embedding rtl -- No; same problem as above in the ltr embedding
environment! The directionality of the embedding is the same as the
directionality of the text immediately preceding the parentheses. I
think this sets the reader's expectation for the display of the parens!
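
For reference, the N0 wording quoted from the PRI can be sketched as follows, to show how the four outcomes listed above arise. The function name and its inputs are hypothetical, and returning None for anything the quoted sentences do not cover (for example all-neutral enclosed text) is an assumption of this sketch.

```python
from typing import Iterable, Optional


def n0_as_quoted(embedding: str, enclosed: Iterable[str],
                 before: Optional[str], after: Optional[str]) -> Optional[str]:
    """Direction for a bracket pair per the quoted N0, or None if unspecified."""
    opposite = "R" if embedding == "L" else "L"
    strong = set(enclosed)
    # First sentence: enclosed text has a strong type of the embedding direction.
    if embedding in strong:
        return embedding
    # Second sentence: enclosed text has an opposite strong type and at
    # least one external neighbor also has that direction.
    if opposite in strong and opposite in (before, after):
        return opposite
    # Not covered by the quoted wording (assumption: report it as None).
    return None


# The four cases discussed above, as (embedding, before, enclosed, after).
for emb, b, inner, a in [("L", "R", "R", "L"), ("L", "L", "R", "R"),
                         ("R", "L", "L", "R"), ("R", "R", "L", "L")]:
    print(f"{b}({inner}){a} with embedding {emb}:", n0_as_quoted(emb, [inner], b, a))
```

Both ltr-embedding cases come out as R and both rtl-embedding cases come out as L, including the two cases objected to above.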

So for example (note: as is your convention, upper case letters designate RTL characters/text and lower case designate ltr):
TEXT: AS-SAYYAD AL-ALIFBAYT (w3c lead, balad1), abc (w3c lead, country2).
O.k., if the embedding directionality of this text is ltr, these parentheses
can be displayed as ltr since there is enough ltr text both inside and
outside of them.
However, if the embedding is rtl, the rtl text
immediately preceding the parentheses makes me expect to see the
parentheses displayed as rtl;
thus in this case should not the
directionality of the embedding and preceding text determine the
directionality of the whole, even for the second set of parentheses?
(sorry for the commas in my example below of "proper" bidi layout; I
can't find an rtl comma on my keyboard):

=> (country2 ,w3clead abc ,(balad1 ,w3clead) TYABFILA-LA DAYYAS-SA :TXET                                      
(** Thus, I think that, together with the directionality of the embedding
level, the text that logically precedes the parens is critical to the
determination of the directionality of the parens.)

3.2. Also, sometimes there is no adjacent text on one side of a set of
brackets, or on the other: although fortunately parentheses rarely begin a text
block (except in programming, and in my writing), they often end a text
block, followed by a neutral punctuation mark:
L(R)N
{ ? Have you addressed such cases in your algorithm? I must have missed something. }
* In an ltr text/embedding the above L(R)N should clearly be ltr. {And R(L)N should be rtl in an rtl text.} 
R(R)N 
* In an ltr text the above should run rtl nevertheless! {And L(L)N should run ltr in an rtl embedding!}
These two cases above seem to me to be two obvious cases!!

However, for the next two cases the solution is not so obvious to me:
L(R)N in an rtl context/embedding 
* (should it remain rtl? I am unsure. Probably.)
R(L)N in an ltr context/embedding
* (same question; should it remain ltr? Probably.)

The following cases I am again more sure of:
 L(RL)N in an rtl context/embedding
* I would make the text just above ltr
R(RL)N in an ltr context/embedding
* I would make the text just above rtl
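
The eight cases in 3.2 can be collected into a table and checked against one candidate rule. The rule below is a guess that happens to fit the listed outcomes, not a rule stated in this message: the brackets take the opposite direction only when the text before the opening paren has the opposite direction and the enclosed text also contains it; otherwise they take the embedding direction. The function name is hypothetical, and the two cases marked "Probably" above are tentative.

```python
from typing import Iterable


def preceding_text_rule(embedding: str, before: str, enclosed: Iterable[str]) -> str:
    """Hypothetical rule: the text before the opening paren decides, together with the embedding."""
    opposite = "R" if embedding == "L" else "L"
    if before == opposite and opposite in set(enclosed):
        return opposite
    return embedding


# (embedding, before, enclosed strong types, outcome suggested in 3.2)
CASES = [
    ("L", "L", "R", "L"),   # L(R)N in ltr: ltr
    ("R", "R", "L", "R"),   # R(L)N in rtl: rtl
    ("L", "R", "R", "R"),   # R(R)N in ltr: rtl
    ("R", "L", "L", "L"),   # L(L)N in rtl: ltr
    ("R", "L", "R", "R"),   # L(R)N in rtl: rtl ("Probably")
    ("L", "R", "L", "L"),   # R(L)N in ltr: ltr ("Probably")
    ("R", "L", "RL", "L"),  # L(RL)N in rtl: ltr
    ("L", "R", "RL", "R"),  # R(RL)N in ltr: rtl
]

for emb, before, inner, expected in CASES:
    assert preceding_text_rule(emb, before, inner) == expected
```

The following neutral mark plays no part in this sketch, which is why the function takes no argument for it.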

> . . .

4. A note: Dictionary definitions, with phonetic transcriptions and
examples of usage, are one place where the bidi parentheses algorithm
might apply a lot. Also language texts.
 
Best,
--C.E. Whitehead
cewcathar@hotmail.com
Received on Thursday, 2 August 2012 19:17:21 UTC