RE: Proposal for isolation characters in Unicode and the unicode-bidi:isolate and unicode-bidi:plaintext definitions from Matitiahu Allouche on 2012-07-23 (www-style@w3.org from July 2012)

From: Matitiahu Allouche <matitiahu.allouche@gmail.com>
Date: Tue, 24 Jul 2012 00:13:27 +0300
To: "'Asmus Freytag'" <asmusf@ix.netcom.com>
Cc: "'Aharon (Vladimir) Lanin'" <aharon@google.com>, "'Glenn Adams'" <glenn@skynav.com>, "'Martin J. Dürst'" <duerst@it.aoyama.ac.jp>, "'W3C style mailing list'" <www-style@w3.org>, <public-i18n-bidi@w3.org>
Message-ID: <207101cd6918$03901c00$0ab05400$@gmail.com>
I agree with Asmus that my proposed algorithm would not help neutralize the external effects of the sequence [BDI]Having fun[PDF][PDI].

 

Consequently, I withdraw my proposal for a symmetric approach and vote for the option which gives maximal isolation of the content between LRI/RLI/FSI and PDI (which I think is option 2, isolates stronger than embeddings/overrides).



 

Shalom (Regards),  Mati

 

From: Asmus Freytag [mailto:asmusf@ix.netcom.com] 
Sent: Sunday, July 22, 2012 8:09 PM
To: Matitiahu Allouche
Cc: 'Aharon (Vladimir) Lanin'; 'Glenn Adams'; "'Martin J. Dürst'"; 'W3C style mailing list'; public-i18n-bidi@w3.org
Subject: Re: Proposal for isolation characters in Unicode and the unicode-bidi:isolate and unicode-bidi:plaintext definitions

 

If the desire for isolates is to allow the safe insertion of unrelated text, then the rules have to be such that no matter which characters the embedded text contains, it may not have any effect on the formatting of text around it.

The one exception would be unmatched BDI/PDI. Instead of sticking a single BDI/PDI pair around the text to be inserted, the inserting implementation would have to either add balancing BDI or PDI characters in front / back to balance the insertion, or, depending on context, perhaps disallow (filter) such characters from inserted text. With either strategy, the insertion could be self-contained.

Filtering BDI/PDI from user input would seem a natural option when constructing a message from a template with short, user-specified, insertions, and is cheaper to implement than "rebalancing".
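
A minimal sketch of the two strategies, filtering versus rebalancing, may make the cost difference concrete. It assumes the inserted text is already split into a list of tokens in which control characters appear as names such as "BDI" and "PDI"; the function names, the token model, and the use of "BDI" as the balancing initiator are illustrative assumptions, not part of any proposal.

```python
# Hypothetical token model: controls are spelled out as names, anything else is text.
INITIATORS = {"BDI", "LRI", "RLI", "FSI"}  # "BDI" is the generic initiator used in this thread
TERMINATOR = "PDI"

def filter_isolates(tokens):
    """Strategy 1: drop every isolate initiator and terminator from the insertion."""
    return [t for t in tokens if t not in INITIATORS and t != TERMINATOR]

def rebalance_isolates(tokens):
    """Strategy 2: add initiators in front and PDIs at the back until isolates pair up."""
    depth = 0            # isolates opened inside the insertion and not yet closed
    missing_openers = 0  # PDIs that had no initiator inside the insertion
    for t in tokens:
        if t in INITIATORS:
            depth += 1
        elif t == TERMINATOR:
            if depth == 0:
                missing_openers += 1
            else:
                depth -= 1
    return ["BDI"] * missing_openers + list(tokens) + ["PDI"] * depth

def wrap(tokens):
    """Put the single BDI/PDI pair around an insertion that is already self-contained."""
    return ["BDI"] + list(tokens) + ["PDI"]
```

Filtering is a single pass with no bookkeeping, whereas rebalancing has to track nesting depth; either result can then be passed to wrap().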

The symmetric approach would require the implementation to balance "classical" embeddings and overrides in insertions (filtering would be a much less desirable option, as we expect overrides, in particular, to be quite legitimate in those types of user input).

If not balanced, the sequence
    [BDI]Having fun[PDF][PDI]
would close any open overrides and thus affect the text following the insertion.

Mati's algorithm below does not seem to help with this case.

A./

On 7/22/2012 9:20 AM, Matitiahu Allouche wrote:

Aharon Lanin wrote:

"…some user codes their name as "Having fun[RLO]", it will cause the rest of the paragraph in which this user's name appears to come out backwards. However, putting each user name in an isolate will prevent that - but only if option 2 is used. The symmetrical approach does not have that property."

The sequence is:    [BDI]Having fun[RLO][PDI] 

In option 3 (symmetric approach), the handler for PDI will detect that there is an unbalanced RLO and will close it.

The algorithm is: 

-        When encountering PDI, go back to the last BDI and close every open scope since the last BDI.

-        When encountering PDF, go back to the last LRE/RLE/LRO/RLO and close every open scope since that last formatting character.

In the last 2 sentences, "scope" includes embeddings, overrides and isolates.
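
The two rules above can be sketched as a small stack simulation. This is only an illustration under stated assumptions: text is modelled as a list of tokens with controls spelled out as names, the function names are hypothetical, and a PDI or PDF with no matching opener is assumed to be ignored, which the rules above do not specify.

```python
ISOLATE_OPEN = {"BDI", "LRI", "RLI", "FSI"}
EMBED_OPEN = {"LRE", "RLE", "LRO", "RLO"}

def _close_back_to(stack, openers):
    """Pop through the most recent opener of the given kind; no-op if none is open."""
    for i in range(len(stack) - 1, -1, -1):
        if stack[i] in openers:
            del stack[i:]  # closes that opener and every scope opened after it
            return

def open_scopes_option3(tokens, already_open=()):
    """Return the scopes still open after the tokens (symmetric option 3 sketch)."""
    stack = list(already_open)
    for t in tokens:
        if t in ISOLATE_OPEN or t in EMBED_OPEN:
            stack.append(t)
        elif t == "PDI":
            _close_back_to(stack, ISOLATE_OPEN)
        elif t == "PDF":
            _close_back_to(stack, EMBED_OPEN)
        # any other token is ordinary text and leaves the stack unchanged
    return stack

# The sequence above: the PDI closes the unbalanced RLO together with the BDI.
print(open_scopes_option3(["BDI", "Having fun", "RLO", "PDI"]))
# With an override already open outside the insertion, a stray PDF inside the
# isolate reaches back past the BDI and closes that outer override as well.
print(open_scopes_option3(["BDI", "Having fun", "PDF", "PDI"], already_open=["RLO"]))
```

Both calls end with no open scopes. In the second call that means the outer RLO has been closed by the inserted text, so the text following the insertion is affected.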

 

I also very much favor keeping CSS in sync with Unicode formatting characters behavior, and vice versa. I think this can be achieved with option 3 no less than with option 2.

 

 

Shalom (Regards),  Mati

 

From: Aharon (Vladimir) Lanin [mailto:aharon@google.com] 
Sent: Saturday, July 21, 2012 10:10 PM
To: Matitiahu Allouche
Cc: Glenn Adams; Martin J. Dürst; W3C style mailing list; public-i18n-bidi@w3.org
Subject: Re: Proposal for isolation characters in Unicode and the unicode-bidi:isolate and unicode-bidi:plaintext definitions

 

The idea behind the choice of option 2 is that an isolate would protect its surroundings against extra or missing PDFs in its contents. For example, if I have a site that displays the name of a user, and some user codes their name as "Having fun[RLO]", it will cause the rest of the paragraph in which this user's name appears to come out backwards. However, putting each user name in an isolate will prevent that - but only if option 2 is used. The symmetrical approach does not have that property. Also note that the current specification of CSS isolates (separate bidi paragraphs) also has this property, so retaining it means that changing the CSS spec to use Unicode isolates will have fewer visible effects. In all honesty, this is the part that appeals to me most about option 2.

 

 

On Fri, Jul 20, 2012 at 7:21 PM, Matitiahu Allouche <matitiahu.allouche@gmail.com> wrote:

I am late joining this discussion, because I did not see really compelling arguments in favor of option 1 rather than option 2 or vice versa.

Just to add to the fun, I want to suggest a third option: in the case of improperly nested embeddings/overrides/isolates, both PDF *and* PDI will close all unmatched controls.

Going back to examples a and b:

a: RLI LRE PDI PDF

b: RLE LRI PDF PDI

 

In example a, the PDI will close the RLI and the LRE, PDF does nothing.

In example b, the PDF will close the LRI and the RLE, PDI does nothing.

If nothing else, this option has the merit of symmetry.

However, I am not in mad love with it, and I can live with either one of 1 or 2.

 

 

Shalom (Regards),  Mati

 

From: Glenn Adams [mailto:glenn@skynav.com] 
Sent: Monday, July 09, 2012 5:07 PM
To: Aharon (Vladimir) Lanin
Cc: Martin J. Dürst; W3C style mailing list; public-i18n-bidi@w3.org
Subject: Re: Proposal for isolation characters in Unicode and the unicode-bidi:isolate and unicode-bidi:plaintext definitions

 

 

On Mon, Jul 9, 2012 at 12:28 AM, Aharon (Vladimir) Lanin <aharon@google.com> wrote:

> I don't understand your logic. You say option 2 offers greater forward compatibility,

> but then say you are choosing 2 because forward compatibility is NOT important.

 

Not because it isn't important, but because in certain cases it is LESS important than another consideration. It's a trade-off.

 

In other words, I think that well-formed documents, i.e. ones where isolates and embeddings/overrides are properly nested, should display as well as possible on systems that do not support isolates. That is why the proposal has been modified to include PDI. On the other hand, when it comes to essentially broken documents, where embeddings/overrides and isolates are not properly nested, I think it is more important to let isolates do their job and isolate the missing and extra PDFs in the apps that do support isolates than to make the document display as similarly as possible on old and new apps, when apps that don't understand isolates can't possibly display the document 100% as intended anyway.

 > I think backward compatibility is more desirable, i.e., a system that knows nothing of

> isolates should work without modification,

 By definition, it can't display the document 100% as intended. We introduce PDI so that its disability is limited to displaying isolates incorrectly (but then limit this to when isolates and embeddings/overrides are properly nested).

 > and yet option 2 requires PDI to close an embedding/override,

 Only when the isolate began before the embedding/override. If we have LRE RLI PDI PDF, the PDI only closes the isolate, not the embedding.

 That still leaves the case where pre-PDI implementations would behave differently from PDI-aware implementations, since the former would not close the embedding/override at the same position. I believe that may be a problem, and should be avoided.
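
For comparison, the option 2 behaviour discussed in this exchange can be sketched the same way. This is a rough model under stated assumptions, not a definitive statement of the proposal: text is a list of tokens with controls spelled out as names, the function name is hypothetical, and a PDF is assumed to be ignored whenever the innermost open scope is an isolate, which is one reading of "isolates stronger than embeddings/overrides".

```python
ISOLATE_OPEN = {"BDI", "LRI", "RLI", "FSI"}
EMBED_OPEN = {"LRE", "RLE", "LRO", "RLO"}

def open_scopes_option2(tokens, already_open=()):
    """Return the scopes still open after the tokens (option 2 sketch)."""
    stack = list(already_open)
    for t in tokens:
        if t in ISOLATE_OPEN or t in EMBED_OPEN:
            stack.append(t)
        elif t == "PDI":
            # Close the most recent isolate and any embedding/override opened
            # inside it; an embedding opened before the isolate is untouched.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i] in ISOLATE_OPEN:
                    del stack[i:]
                    break
        elif t == "PDF":
            # Assumption: a PDF cannot reach across an isolate boundary, so it
            # only closes an embedding/override that is the innermost scope.
            if stack and stack[-1] in EMBED_OPEN:
                stack.pop()
    return stack

# LRE RLI PDI: the PDI closes only the isolate, the embedding stays open.
print(open_scopes_option2(["LRE", "RLI", "PDI"]))
# RLI LRE PDI: the isolate began first, so the PDI closes the embedding too.
print(open_scopes_option2(["RLI", "LRE", "PDI"]))
# A stray PDF inside an isolate leaves an override opened outside it alone.
print(open_scopes_option2(["BDI", "Having fun", "PDF", "PDI"], already_open=["RLO"]))
```

The pre-PDI implementation described above would differ from this model exactly in the second case, since it would wait for a PDF to close the LRE.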
Received on Monday, 23 July 2012 21:14:04 UTC