Re: First strong on strings surrounded by isolate controls from Asmus Freytag (c) on 2016-09-15 (public-i18n-bidi@w3.org from July to September 2016)

From: Asmus Freytag (c) <asmusf@ix.netcom.com>
Date: Wed, 14 Sep 2016 18:13:40 -0700
To: r12a <ishida@w3.org>, public-i18n-bidi@w3.org
Cc: Roozbeh Pournader <roozbeh@google.com>, "Aharon (Vladimir) Lanin" <aharon@google.com>, Shervin Afshar <shervinafshar@gmail.com>, Mostafa Hajizadeh <mostafa@daftar.cc>
Message-ID: <520b2b11-e464-6d20-e243-44a32cf288da@ix.netcom.com>

The way I read this, the rendering of the visible text (for a 
bare RLI/PDI without surrounding text) works as intended, whether 
the UBA resolves the outer (empty) paragraph as LTR or RTL.

I think that this is a limitation of the UBA. It is concerned with 
ordering the characters on a line, but not with laying out paragraphs 
(or pages).

So when it comes to CSS (or any other protocol) using the data to make 
decisions on paragraph or page layout, then that protocol may need to 
augment its rules to go beyond UBA.

Note that UAX#9 states in HL1: "Override P3 
<http://unicode.org/reports/tr9/#P3>, and set the paragraph embedding 
level.... A higher-level protocol may use an entirely different 
algorithm that heuristically auto-detects the paragraph embedding level 
based on the paragraph text and its context."

So, a conformant higher level protocol could be designed to detect the 
case discussed here and decide to base paragraph layout (alignment) 
and page layout on the type of isolate or even 
its contents.

The problem with simply adding an RLM is that now you suddenly have an 
issue when you want to concatenate two strings (perhaps a bare LTR and a 
bare RTL isolate).

Take that case: an (otherwise empty) paragraph containing one LTR 
and one RTL isolate. If the RLI were routinely "augmented" by 
prefixing it with an RLM, but the LRI were not (on the grounds that 
P3 would resolve the paragraph to LTR anyway), then combining the two 
would *always* result in an RTL paragraph, whichever came first. If 
both types were augmented (an LRM added before an LRI), the first one 
in sequence would rule.
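
To make the concatenation problem concrete, here is a minimal Python 
sketch of rules P2/P3 only (first strong character outside any isolate, 
defaulting to LTR). The function name paragraph_level and the sample 
strings are illustrative assumptions, not part of UAX#9, and the sketch 
does not attempt the rest of the UBA.

```python
import unicodedata

LRI, RLI, PDI = "\u2066", "\u2067", "\u2069"
LRM, RLM = "\u200e", "\u200f"

def paragraph_level(text):
    """Sketch of P2/P3: return 0 (LTR) or 1 (RTL) for a paragraph."""
    depth = 0  # nesting depth of isolates we are currently inside
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        if bc in ("LRI", "RLI", "FSI"):
            depth += 1
        elif bc == "PDI":
            if depth > 0:
                depth -= 1  # a stray PDI with no initiator is ignored
        elif depth == 0:
            if bc == "L":
                return 0
            if bc in ("R", "AL"):
                return 1
    return 0  # P3: no strong character visible, so default to LTR

ltr = LRI + "abc" + PDI            # bare LTR isolate
rtl = RLI + "\u0641\u0639" + PDI   # bare RTL isolate (Arabic letters)

print(paragraph_level(rtl))                    # 0: isolate is invisible to P2
print(paragraph_level(RLM + rtl))              # 1: RLM supplies a strong R
print(paragraph_level(ltr + RLM + rtl))        # 1: only the RLI was augmented
print(paragraph_level(RLM + rtl + ltr))        # 1: same result in either order
print(paragraph_level(LRM + ltr + RLM + rtl))  # 0: both augmented, first rules
print(paragraph_level(RLM + rtl + LRM + ltr))  # 1: both augmented, first rules
```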

So, you might as well go ahead and not augment these, but stipulate that 
the higher level protocol you care about use a heuristic for treating 
paragraphs consisting only of isolates.
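
One possible shape for such a heuristic, as a rough Python sketch: if 
nothing but whitespace appears outside the isolates, take the direction 
from the first top-level isolate initiator. The "first isolate wins" 
policy, the function name isolate_only_direction, and the choice to 
leave FSI undecided are all assumptions made for illustration; a real 
protocol could equally look at the isolate contents.

```python
import unicodedata

# Direction implied by each isolate initiator; FSI is left undecided here.
INITIATORS = {"LRI": "ltr", "RLI": "rtl", "FSI": None}
# Bidi classes tolerated outside isolates (whitespace, separators, BN).
IGNORABLE = {"WS", "B", "S", "BN"}

def isolate_only_direction(text):
    """Return 'ltr' or 'rtl' for an isolate-only paragraph, else None."""
    depth = 0
    seen_first = False
    first = None
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        if bc in INITIATORS:
            if depth == 0 and not seen_first:
                first, seen_first = INITIATORS[bc], True
            depth += 1
        elif bc == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0 and bc not in IGNORABLE:
            return None  # real content outside isolates: leave it to P2/P3
    return first  # None if there were no isolates or the first was an FSI

RLI, LRI, PDI = "\u2067", "\u2066", "\u2069"
print(isolate_only_direction(RLI + "\u0641\u0639" + PDI))  # rtl
print(isolate_only_direction("x " + RLI + "\u0641" + PDI))  # None
```

A caller would fall back to the ordinary P2/P3 result whenever this 
returns None.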

A./

On 9/14/2016 10:54 AM, r12a wrote:
> [moving this discussion to the list, with Roozbeh's agreement, and 
> reordering the previous posts so that all is chronologically oldest to 
> newest]
>
>
> > On Mon, Sep 12, 2016 at 11:17 AM, r12a <ishida@w3.org
> > <mailto:ishida@w3.org>> wrote:
> >
> >     hi Roozbeh,
> >
> >     i have a bidi question for you, if you don't mind.
> >
> >     the UBA says that the paragraph direction can be determined by
> >     looking for the first strong directional character, ignoring
> >     sequences of characters surrounded by isolating controls, and
> >     defaulting to LTR in the absence of any strong RTL character.
> >
> >     the alignment of a string when displayed tends to be derived from
> >     the paragraph direction, if i understand correctly.
> >
> >     so, what happens for a string such as
> >
> >     "RLI فعالیت بین‌المللی‌سازی، PDI"
> >
> >     which you'd expect to be displayed from the right side of the
> >     window, but for which no strong character would be detected by the
> >     algorithm? Is there something i'm missing that would look into the
> >     next level down if no strong character were detected in the highest
> >     level?
> >
> >     i expect that this would affect lots of strings passed around by
> >     scripts.
> >
> >     cheers,
> >     ri
>
>
>
> On 13/09/2016 18:53, Roozbeh Pournader wrote:
> > Hi Richard,
> >
> > Well, since there's no strong character visible to P2, that paragraph
> > will be resolved to LTR according to P3.
> >
> > This is intentional, as the isolates are supposed to be exactly that,
> > isolate the inside from the outside and the outside from the inside.
> >
> > If a script wants to make sure the string is displayed RTL, it should
> > add an RLM at the beginning.
>
>
> New contribution:
>
> Interesting.  This question arose while i was trying to clarify how 
> best to handle paragraph base direction for strings such as would be 
> encountered in JSON.  See 
> http://w3c.github.io/i18n-discuss/notes/json-bidi.html
>
> we're trying to find ways to keep the paragraph direction associated 
> with the string, so that the string is correctly displayed when used 
> in a web page or such.
>
> you'll see that we look at the possibility of wrapping strings in 
> RLI..PDI as one possible approach, but this means that actually the 
> consumer of the string would not know, in such a case, that the string 
> is RTL.
>
> If it keeps the control characters, it would display the contained 
> text correctly, but it would presumably be difficult to choose an 
> appropriate dir value if the preference was to use markup for 
> direction in the destination.  Also, if the direction of the string is 
> used to determine the alignment on the page (left or right), as i 
> think is the case for CSS, then the rendering application would not 
> get the right cue.
>
> ri
>
>
>

Received on Thursday, 15 September 2016 01:14:00 UTC