Re: bidi and the initial current text position

Hi Alex.

Thanks for the reply.  Text is hard. ;)

Cameron McCormack:
> > > <style>
> > >   text { direction: ltr }
> > >   tspan { direction: rtl; unicode-bidi: bidi-override }
> > > </style>
> > > <text x="100"><tspan>AB</tspan> cd</text>
> > > 
> > > The visual order of this is “BA cd”. The <text> has
> > > text-anchor:start. Where is the text positioned?
…
> >  Gecko:        |BA cd  (where “|” is the vertical line at x = 100)
> >  IE:           |BA cd
> >  WebKit:  BA cd|
> >  Opera:   AB cd|
> >  Batik:        |cd BA

Alex Danilo:
> OK, so our result (just to complicate things:-) is:
> 
> Abbra: cd|BA
> 
> The visual order of Batik is close to correct IMO.

That’s surprising to me, although I still don’t understand all the
intricacies of bidi layout so it could well be correct.  Why isn’t
“BAcd” the correct visual order?  Does the direction:ltr on the <text>
not make this an overall LTR chunk of text with an RTL run at the start
of it?  If I changed the example to

  <text x="100">xy <tspan>AB</tspan> cd</text>

I would expect the visual order to be “xy BA cd”, so I am confused as to
why removing the “xy” should result in the “BA” going to the right side
of the “cd”.

> Existence of the <tspan> doesn't create a new text chunk, it's just
> defining the directionality isn't it? If so, you are ordering the
> string:
> 
> "AB cd" where "AB" is considered to be RTL, i.e. a UAX#9 embedding
> level of 1, whilst the " cd" has an embed level of 0. Running UAX#9
> will swap the "AB" as "BA" across to the right _to be read_ as the
> first string in the RTL line. The visual order can't start with BA,
> that's just plain broken.

Ah, so why is it an RTL line and not an LTR line?  Is there a heuristic
there based on the first logical character being RTL meaning that the
line as a whole is considered RTL?  If so, does the direction:ltr not
override that?
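
To check my own understanding, here is a rough sketch of UAX #9 rule L2. It works from the highest resolved level down to the lowest odd level on the line, and at each level it reverses every maximal run at that level or higher. The function name `visual_order` is just for illustration. The level arrays are my own hand resolution, so treat them as assumptions. In an LTR paragraph (level 0), the RLO from bidi-override puts A and B at level 1 and " cd" stays at 0. In an RTL paragraph (level 1), the override would go to level 3, the space would stay at 1 and "cd" would rise to 2. The uppercase letters stand in for RTL characters as in the example above.

```python
def visual_order(text: str, levels: list[int]) -> str:
    """Apply UAX #9 rule L2 to a single line, given one resolved
    embedding level per character (levels are assumed, not computed)."""
    if len(text) != len(levels):
        raise ValueError("need exactly one resolved level per character")
    if not text:
        return text
    pairs = list(zip(text, levels))
    lowest_odd = min(levels) | 1  # lowest level on the line, rounded up to odd
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(pairs):
            if pairs[i][1] < level:
                i += 1
                continue
            # Find the maximal run of characters at this level or higher.
            j = i
            while j < len(pairs) and pairs[j][1] >= level:
                j += 1
            pairs[i:j] = reversed(pairs[i:j])
            i = j
    return "".join(ch for ch, _ in pairs)


print(visual_order("AB cd", [1, 1, 0, 0, 0]))              # BA cd  (LTR paragraph)
print(visual_order("AB cd", [3, 3, 1, 2, 2]))              # cd BA  (RTL paragraph)
print(visual_order("xy AB cd", [0, 0, 0, 1, 1, 0, 0, 0]))  # xy BA cd
```

If those hand-resolved levels are right, only an RTL paragraph level yields Batik's "cd BA". An LTR paragraph level gives the "BA cd" I'd expect, so the real question is which paragraph level the <text> should establish.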

> I don't see any prose in the spec. that says the existence of the
> <tspan> or the unicode-bidi:bidi-override etc. create a new text
> chunk. So I think the re-order should happen on the entire text chunk
> since the <tspan> does not introduce a new 'X' position or anything
> else that could be considered a chunk maker. They are 2 'runs' of
> text, but still one chunk I would have thought.

Yes I agree with that.  (A “chunk maker” sounds like a particularly
nasty combination of alcoholic beverages. ;))

> Now as for the space - it's in the LTR content " cd" and since the
> "AB" gets swapped across to the right side, the space leads the "cd"
> and so there should be no space after the "cd". Are you sure Batik
> stuck a space in there?

Yeah: http://mcc.id.au/temp/bps-batik.png

If I construct the equivalent HTML example (without the positioning, and
with a background colour on the RTL span):

  http://people.mozilla.org/~cmccormack/tests/bidi-simple.html

then I find that browsers uniformly render it as “BA cd”.

> As for "current text position", I think what we're doing is wrong
> here. From the text in the spec. I'd expect to see:
> 
> " cdBA|"
> 
> namely, that the first logical character (the start character) is
> placed to the left of the starting position.

OK.  I’ll wait to see your reasoning on the “cdBA” layout as opposed to
“BAcd”, but if you are right then that does make sense.  If “BAcd” is
the right layout (which is what I was assuming) then it’s trickier, and
my questions from my original mail about what that x="100" actually
means still stand.
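
For the positioning half, this is how I read the SVG 1.1 'text-anchor' prose. With 'start', the left side of the text goes at the initial current text position for direction:ltr, and the right side does for direction:rtl. 'end' is the mirror image, and 'middle' centres the text on x. The helper `chunk_left_edge` and the 40-unit chunk width are hypothetical. Which direction value should be fed in is exactly the open question: the <text> element's 'direction', or the resolved paragraph direction.

```python
def chunk_left_edge(x: float, width: float, anchor: str, direction: str) -> float:
    """Return the left side of a text chunk whose total advance is `width`,
    given the initial current text position `x` (my reading of SVG 1.1)."""
    if anchor not in ("start", "middle", "end"):
        raise ValueError(f"unknown text-anchor value: {anchor!r}")
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unknown direction value: {direction!r}")
    if anchor == "middle":
        # Centred on x regardless of direction.
        return x - width / 2
    # 'start' pins the left side for ltr and the right side for rtl;
    # 'end' swaps that.
    right_side_at_x = (anchor == "end") != (direction == "rtl")
    return x - width if right_side_at_x else x


# x="100", pretending the "AB cd" chunk is 40 units wide:
print(chunk_left_edge(100.0, 40.0, "start", "ltr"))  # 100.0
print(chunk_left_edge(100.0, 40.0, "start", "rtl"))  # 60.0
```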

> Anyway - more data for you, but the interoperability is a mess,
> and the BIDI handling of the implementations more so. If that was
> real Arabic, it would be totally unreadable in 4 out of the 6
> implementations...

Thanks for looking into it.  I agree this is a bit of a mess, but on the
bright side, it gives us the opportunity to make changes for the better.

-- 
Cameron McCormack ≝ http://mcc.id.au/

Received on Wednesday, 18 May 2011 02:35:36 UTC