Re: bidi and the initial current text position

Hi Cam,

I feel your pain :-) Since no one else seems to be hearing you,
I took a bit of a look, so here's my 2c.

--Original Message--:
>Cameron McCormack:
>> I’m drafting a message on bidi text, but first I have a question.
>> Consider this:
>> 
>> <style>
>>   text { direction: ltr }
>>   tspan { direction: rtl; unicode-bidi: bidi-override }
>> </style>
>> <text x="100"><tspan>AB</tspan> cd</text>
>> 
>> The visual order of this is “BA cd”.  The <text> has text-anchor:start.
>> Where is the text positioned?
>
>Some followup observations.
>
>I’ve always had the expectation that if you have a single value in a
><text> element’s x="" attribute that, at least with text-anchor:start or
>text-anchor:end, it will give either the left or right edge of the whole
>run of text.  The definition of the text-anchor property certainly gives
>that impression:
>
>  start
>    The rendered characters are aligned such that the start of the
>    resulting rendered text is at the initial current text position. For
>    an element with a ‘direction’ property value of "ltr" (typical for
>    most European languages), the left side of the text is rendered at
>    the initial text position. For an element with a ‘direction’
>    property value of "rtl" (typical for Arabic and Hebrew), the right
>    side of the text is rendered at the initial text position.
>
>I have another expectation: that that single valued x="" attribute gives
>the actual position of the glyph for the first character (in document/
>logical order) of the <text> element.  Whether that position is the
>left or right edge of the glyph, and how that’s affected by text-anchor,
>I am not exactly sure.  This second expectation can’t possibly coexist
>with the first, because the first character in document order is the
>“A”, and the “A” must be rendered visually in the middle of the text
>string (between the “B” and the space).  So 100 can’t be both the
>position of the “A” glyph and the left (or right?) edge of the whole run
>of text.
>
>Now, this second expectation can’t actually be true, because text-anchor
>somehow affects the position of the glyphs.  If you have
>
>  <text x="10" text-anchor="end" direction="ltr">abc</text>
>
>then the position of the “a” glyph isn’t 10 – it’s the position of the
>“c” glyph that will be (right aligned) at 10.
>
>So I think the definition of the x="" attribute in the spec,
>
>  If a single <coordinate> is provided, then the value represents the
>  new absolute X coordinate for the current text position for rendering
>  the glyphs that correspond to the first character within this element
>  or any of its descendants.
>
>isn’t right.  text-anchor must somehow influence the position of the
>glyph such that the x="" value isn’t the “real” current text position
>that is used in the end.
>
>A full version of the example quoted from my first mail is at
>http://people.mozilla.org/~cmccormack/tests/bidi-position-simple.svg
>and it doesn’t have interop:
>
>  Gecko:        |BA cd  (where “|” is the vertical line at x = 100)
>  IE:           |BA cd
>  WebKit:  BA cd|
>  Opera:   AB cd|
>  Batik:        |cd BA

OK, so our result (just to complicate things:-) is:

Abbra: cd|BA

The visual order of Batik is close to correct IMO.

The existence of the <tspan> doesn't create a new text chunk; it just
defines the directionality, doesn't it? If so, you are ordering the string

"AB cd", where "AB" is considered to be RTL, i.e. at a UAX#9 embedding
level of 1, whilst the " cd" has an embedding level of 0. Running UAX#9 will
swap the "AB" to "BA" and move it across to the right, _to be read_ as the
first string in the RTL line. The visual order can't start with BA; that's
just plain broken.
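
For anyone who wants to check orderings by hand, here is a rough sketch
in Python of just the reversal step of UAX#9 (rule L2). It takes the
resolved embedding levels as given and does not compute them, so the
result depends entirely on the levels you feed it. The function name
reorder_visual and its arguments are my own invention for illustration,
not taken from any implementation.

```python
def reorder_visual(text, levels):
    """Apply the reversal step of UAX#9 (rule L2) to a single line.

    `levels` must hold one resolved embedding level per character.
    Resolving those levels (the P, X, W, N and I rules) is NOT done
    here; the caller supplies them as an assumption.
    """
    if len(text) != len(levels):
        raise ValueError("need exactly one level per character")

    # Keep each character paired with its level so they move together.
    items = list(zip(text, levels))
    if not items:
        return ""

    highest = max(levels)
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    # L2 works down from the highest level to the lowest odd level.
    # With no odd level on the line there is nothing to reverse.
    lowest_odd = min(odd_levels) if odd_levels else highest + 1

    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(items):
            if items[i][1] >= level:
                # Find the end of this run of characters at >= level.
                j = i
                while j < len(items) and items[j][1] >= level:
                    j += 1
                items[i:j] = reversed(items[i:j])
                i = j
            else:
                i += 1

    return "".join(ch for ch, _ in items)
```

Call it as reorder_visual("AB cd", levels) with one level per character,
trying both the levels described above and whatever levels you think a
given implementation resolved.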

I don't see any prose in the spec that says the existence of the <tspan> or
of unicode-bidi:bidi-override etc. creates a new text chunk. So I think
the reordering should happen on the entire text chunk, since the <tspan>
does not introduce a new 'x' position or anything else that could be
considered a chunk maker. They are two 'runs' of text, but still one chunk,
I would have thought.

Now as for the space - it's in the LTR content " cd" and since the "AB" gets swapped
across to the right side, the space leads the "cd" and so there should be no space
after the "cd". Are you sure Batik stuck a space in there?

As for "current text position", I think what we're doing is wrong here. From the text
in the spec. I'd expect to see:

" cdBA|"

namely, that the first logical character (the start character) is placed to the left of
the starting position.
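
To make the anchoring arithmetic concrete, here is a small Python sketch
of how I read the text-anchor definition: it works out where the left
edge of a whole chunk lands for a given x, chunk width, text-anchor and
direction. The function name chunk_left_edge and the widths used with it
are made up for illustration, and which 'direction' value should apply
to a mixed chunk like this one is exactly the open question.

```python
def chunk_left_edge(x, width, anchor="start", direction="ltr"):
    """Return the x coordinate of the left edge of a text chunk.

    x         -- the initial current text position (the x="" value)
    width     -- total advance of the chunk (an assumed, measured value)
    anchor    -- "start", "middle" or "end" (the text-anchor property)
    direction -- "ltr" or "rtl" (the direction assumed for the chunk)
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    if anchor == "middle":
        # The chunk is centred on x whatever the direction.
        return x - width / 2
    if anchor not in ("start", "end"):
        raise ValueError("anchor must be 'start', 'middle' or 'end'")

    # "start" in ltr and "end" in rtl both pin the left edge to x;
    # the other two combinations pin the right edge to x.
    left_edge_at_x = (anchor == "start") == (direction == "ltr")
    return x if left_edge_at_x else x - width
```

For a chunk 40 units wide at x="100", chunk_left_edge(100, 40, "start",
"rtl") puts the whole chunk to the left of the line at 100, while the
same call with "ltr" puts it to the right.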

Anyway - more data for you, but the interoperability is a mess, and the BIDI handling
of the implementations more so. If that was real Arabic, it would be totally unreadable
in 4 out of the 6 implementations...

Alex

>-- 
>Cameron McCormack ≝ http://mcc.id.au/
>
>

Received on Wednesday, 18 May 2011 01:11:36 UTC