Re: Proposal for isolation characters in Unicode and the unicode-bidi:isolate and unicode-bidi:plaintext definitions from Aharon (Vladimir) Lanin on 2012-05-15 (public-i18n-bidi@w3.org from April to June 2012)

From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Tue, 15 May 2012 09:41:30 +0200
To: Asmus Freytag <asmusf@ix.netcom.com>
Cc: W3C style mailing list <www-style@w3.org>, public-i18n-bidi@w3.org
Message-ID: <CA+FsOYadwjzpMzHwiSG+CHbj+4C-9wJV5wSPGFD4GCPNv5oLsw@mail.gmail.com>
Hi Asmus,

I do not have a ready opinion of whether isolation should be added
to Unicode as a part of UBA-2 or as an incremental addition of a few more
characters. More importantly, I do not see how the UTC's choice between the
two affects (the reasons for) the proposed changes in the CSS spec.


> Implementing a new set of characters that creates layout features that can
> be mirrored in CSS but not the other way creates yet another area that
> will need workarounds. So this half-solution doesn't really seem to buy
> anything, but any implementation that supports it would be as incompatible
> with existing implementations as for the more radical solution.


I am sorry, but I just have not been able to figure out what you mean in
this paragraph. I am not sure what exactly you mean by "features that can
be mirrored in CSS but not the other way", or workarounds for what in what,
or "the more radical solution".

Are you arguing that the CSS feature as currently defined, where the isolate
constitutes a separate UBA paragraph or sequence of paragraphs, is the
better definition, and changing it such that paragraph breaks in the
isolate break the paragraph outside the isolate as well would be
undesirable? If so, why?

Aharon

On Tue, May 15, 2012 at 2:04 AM, Asmus Freytag <asmusf@ix.netcom.com> wrote:

>  I applaud the goal here.
>
> I think the isolates constitute a major departure for the Unicode Bidi
> algorithm anyway, so that it would be better for Unicode to own up to it
> and either explicitly version the algorithm (UBA-2) or otherwise mark the
> departure.
>
> If that were done, Unicode would have greater latitude to implement "less
> forward compatible" features.
>
> Why is that important? Implementing a new set of characters that creates
> layout features that can be mirrored in CSS but not the other way creates
> yet another area that will need workarounds. So this half-solution doesn't
> really seem to buy anything, but any implementation that supports it would
> be as incompatible with existing implementations as for the more radical
> solution.
>
> Unicode views conformance in ways that gracefully adjust to adding
> characters to the standard. For bidi, this includes the stipulation that a
> conformant implementation only needs to support the behavior of characters
> it recognizes (as it can obviously not handle characters added in later
> versions of the standard).
>
> For bidi, this allows implementations to be conformant that do not support
> embeddings etc. by claiming that they don't "interpret" said characters.
>
> What would be more useful would be a UBA-2 which *required* full support of
> ALL special bidi characters (overrides, embeddings and the new isolates),
> not giving an implementation any "out" there. While still allowing
> implementations to not support specific right-to-left scripts, because,
> unlike formatting characters, their number and repertoire will likely
> increase.
>
> A./
>
>
>
> On 5/14/2012 1:06 PM, Aharon (Vladimir) Lanin wrote:
>
> Last week, I wrote up and Mark Davis submitted to the UTC a proposal (
> http://goo.gl/K6qtV) for adding bidi isolation to Unicode. Here is the
> basic proposal:
>
>   --- start quote ---
> Define three new Unicode formatting code points:
> LRI: marks the beginning of a left-to-right isolate.
> RLI: marks the beginning of a right-to-left isolate.
> FSI: marks the beginning of a first-strong isolate.
>
>  Each would be matched with a PDF. Obviously, isolates would be allowed
> to nest, just like embeddings. The visual ordering within an isolate is the
> same as if it used LRE or RLE, as appropriate, instead of the characters
> above. However, the visual ordering of the content outside the isolate and
> the placement of the isolate within it is the same as if the isolate were
> an ON-class character (e.g. U+FFFC).
>
>  Rules P2 and P3 for determining base direction (the first-strong
> algorithm) have to be modified to skip over the content in an isolate (i.e.
> everything between one of the new characters above and its matching PDF or
> end of paragraph, whichever comes first). This is part of making an isolate
> behave as a neutral character for the purposes of the visual ordering of
> the content surrounding it. This change to P2 and P3 would apply both when
> determining the base direction of a paragraph and when determining the
> embedding direction of an FSI isolate.
> --- end quote ---
>
>  This proposal is obviously inspired
> by unicode-bidi:isolate and unicode-bidi:plaintext. The purpose of adding
> similar features to Unicode is two-fold:
>
>    - These features are just as necessary in plain text as in HTML.
>    - Simplify things for browsers implementing unicode-bidi:isolate and
>    unicode-bidi:plaintext, which currently need to use complicated and
>    error-prone work-arounds.
>
>  Nevertheless, the proposal differs from the definitions of unicode-bidi:
> isolate and unicode-bidi:plaintext in an important
> way: unicode-bidi:isolate and unicode-bidi:plaintext allow paragraph breaks
> inside the isolate, and define the content inside the isolate as forming
> a sequence of paragraphs in this case. Thus:
>
>  1. A paragraph break does not end the isolate.
> 2. A paragraph break within the isolate does not constitute a paragraph
> break in the surrounding content, which still behaves as if the whole
> isolate were just a neutral character.
> 3. An extra PDF within the isolate does not end it prematurely and the
> effects of a missing one do not extend beyond the isolate, or even beyond
> the paragraph missing the PDF.
>
>  This is all very different from the new characters as proposed, which
> are strictly inline (just like LRE and RLE). The differences mean that the
> CSS features as currently defined could not be implemented in terms of the
> proposed characters.
>
>  This is obviously a major problem. For one thing, it would preclude
> implementing the CSS features as currently defined in terms of the new
> Unicode features.
>
>  So, why did I propose it this way? Why not change the Unicode proposal
> to allow the effects of the new isolating characters to span a paragraph
> break, as was defined in CSS?
>
>  Well, one reason is that this would be a radical departure from existing
> Unicode bidi algorithm rules and would have no chance of passing the
> Unicode committees. Nor would it resolve the difference regarding missing
> and extra PDFs. But, most importantly, I am not sure that having the
> paragraph outside the isolate continue uninterrupted around an isolate that
> contains a paragraph break really makes much sense. The continuation is too
> non-local to make much sense to the reader.
>
>  I think that it would be better instead to try to change the definition
> of unicode-bidi:isolate and unicode-bidi:plaintext so that it could be
> implemented in terms of the new Unicode characters, when and if they
> become available - but could still be implemented until then using existing
> techniques.
>
>  The change would be that a paragraph break within an inline isolate would
> also end the paragraph outside the isolate. However, the existing rule
> about reopening inline directionality embeddings after a paragraph break
> inside an inline element would then come into play, and both the direction
> outside the isolate and the isolate itself would then be reopened after
> the paragraph break. Thus, the paragraph break would end one isolate -
> but then open another one that will be closed by the isolate's end tag.
> For example:
>
>  <div dir=ltr>...<span dir=ltr>...<bdi
> dir=rtl>...<br>...</bdi>...</span>...</div>
> would be equivalent to:
> <div dir=ltr>...<span dir=ltr>...<bdi dir=rtl>...</bdi></span><br><span
> dir=ltr><bdi dir=rtl>...</bdi>...</span>...</div>
>
>  Isolates would no longer deal with missing/extra PDFs in CSS, but I
> believe that those can and should be handled in a different manner
> unrelated to isolates: having CSS remove extra PDFs and add missing ones
> automatically. But that is another story.
>
>  The rest of this message is a first stab at the changes in the Writing
> Modes and Text specs that would be necessary to do this. The good part is
> that I believe that the change makes both of these specs simpler.
>
>  In "Inline Direction and Bidirectionality", the paragraph starting with
> the words "Except when the ‘plaintext’ value of ‘unicode-bidi’ is in
> effect" would now go like this:
>
>  ------------
> In most cases, the paragraph embedding level is set according to the value
> of the ‘direction’ property of the paragraph's containing block. However,
> when the containing block has ‘unicode-bidi: plaintext’, the paragraph
> embedding level is set according to the heuristic given in steps P2 and
> P3 of the Unicode algorithm.
>  ------------
>
>  Note that the last sentence of that paragraph (starting with 'The
> paragraph's element is usually the containing block') is no longer
> necessary.
>
>  The isolate definition would go like this:
>
>  ------------
> isolate
> If the element is inline, this value opens an additional level of
> embedding with respect to the bidirectional algorithm. The direction of
> this embedding level is given by the ‘direction’ property. However, for the
> purpose of bidi resolution of the content around it, this embedding must be
> treated as if it were an Object Replacement Character (U+FFFC). Inside the
> element, reordering is done implicitly.
>  ------------
>
>  The "Inside the element, reordering is done implicitly" sentence comes
> from the current definition of "embed". I don't actually understand what it
> means in either context, but copied it for your consideration.
>
>  When and if LRI and RLI make it into the Unicode spec, the following
> sentence should be added after "as if it were an Object Replacement
> Character (U+FFFC)":
>
>  ------------
> This corresponds to adding an LRI (U+????) for ‘direction: ltr’, or RLI
> (U+????) for ‘direction: rtl’, at the start of the element and a PDF
> (U+202C) at the end of the element.
>  ------------
>
>  The plaintext definition would go like this:
>
> ------------
> plaintext
> This value behaves as ‘isolate’ except that the direction of the
> embedding started when the element is inline, or the base direction of the
> paragraph started when the element isn't inline, is determined not by the
> element's computed ‘direction’ as usual, but by following the heuristic in
> rules P2 and P3 of the Unicode bidirectional algorithm on the content
> within the embedding or paragraph.
>  ------------
>
>  When and if FSI makes it into the Unicode spec, the following sentence
> should be added on:
>
>  ------------
> This corresponds to adding an FSI (U+????) at the start of an inline
> element and a PDF (U+202C) at its end.
>  ------------
>
>  The paragraph starting with "If an inline element is broken around a
> bidi paragraph boundary" should now go something like this:
>
>  ------------
> When an inline element contains a bidi paragraph boundary (e.g. a block or
> forced paragraph break), any embedding levels or overrides started by the
> element are closed at the paragraph boundary and reopened on the other
> side. When reopened, the embedding of a unicode-bidi:isolate or unicode-bidi:plaintext
> element is once again treated as if it were an Object Replacement
> Character (U+FFFC) for the purpose of bidi resolution of the content
> around the embedding. The base direction of the embedding reopened for a
> unicode-bidi:plaintext element is determined by following the heuristic
> in rules P2 and P3 of the Unicode bidirectional algorithm on the content
> within the reopened embedding (which may differ from that of the embedding
> closed before the paragraph boundary).
>  ------------
>
>  In Text Level 3, the "Bidirectionality and Line Boxes" section will need
> changing because the "immediately contained" definition is no longer
> necessary:
>
> ------------
> The start and end edges of a line box are determined by the inline base
> direction of the line box. In most cases, this is given by its containing
> block's computed ‘direction’. However, if its containing block has ‘unicode-bidi:
> plaintext’ [CSS3-WRITING-MODES], the inline base direction of the line box
> must be determined by the base direction of the bidi paragraph for which
> the line box holds content.  An empty line box [and so on, with no further
> changes]
> ------------
>
> Aharon
> (Busily donning an anti-flame suit)
>
>
>
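
For illustration only, the modified P2/P3 rule in the quoted proposal
(skip everything between an isolate initiator and its matching PDF, or the
end of the paragraph, when looking for the first strong character) might be
sketched in Python roughly as below. This is a sketch under assumptions, not
a reference implementation: the proposal leaves the new code points as
U+????, so the values used for LRI, RLI and FSI are stand-ins (they are the
values later assigned in Unicode 6.3), the function name
first_strong_direction is made up for this example, and the matching with
PDF follows the proposal as quoted.

```python
import unicodedata

# Assumption: stand-in code points; the proposal itself leaves them as U+????.
LRI, RLI, FSI = "\u2066", "\u2067", "\u2068"
PDF = "\u202c"
ISOLATE_OPENERS = {LRI, RLI, FSI}
EMBEDDING_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO


def first_strong_direction(text, default="ltr"):
    """Hypothetical helper: first-strong direction, ignoring isolate content.

    Returns "ltr" or "rtl" for the first strong character of the first
    paragraph that is not inside an isolate, or `default` if there is none.
    """
    # One entry per open embedding/isolate; True marks an isolate.
    # Embeddings are tracked only so that each PDF is matched correctly.
    open_stack = []
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "B":
            # A paragraph separator ends the scan (and any open isolate).
            break
        if ch in ISOLATE_OPENERS:
            open_stack.append(True)
        elif ch in EMBEDDING_OPENERS:
            open_stack.append(False)
        elif ch == PDF:
            if open_stack:
                open_stack.pop()  # an extra PDF is simply ignored here
        elif any(open_stack):
            continue  # inside an isolate: treated as neutral, so skipped
        elif bidi_class == "L":
            return "ltr"
        elif bidi_class in ("R", "AL"):
            return "rtl"
    return default
```

The same function could be applied to the content of an FSI isolate to pick
its embedding direction, since the proposal says the change applies in both
places. Content inside a plain embedding (LRE or RLE) that is not itself
inside an isolate still counts, as it does in the existing rules.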
Received on Tuesday, 15 May 2012 07:42:27 UTC