Bidi behavior of <br/> and LEVEL DIRECTION MARK

This is a thought experiment, and I am mostly trying to get the big 
picture. I may have some details wrong; please bear with me.


What has been bothering me from the start and motivated my first 
question is that <br/> is presented as a paragraph boundary for bidi, 
but then there is this reopening behavior that somewhat negates the 
boundary.

Let's try to look at it the other way, i.e. to understand <br/> as not 
being a paragraph boundary, but rather as something that has an effect 
on bidi, much like LRM/RLM. More precisely: suppose we can introduce a 
new bidi control character x, which is not a bidi paragraph boundary; 
what should be the characteristics of x such that <br/> can be defined, 
for bidi purposes, as having the same effect as x?

If we squint a little and ignore for a moment the exact details of the 
reopening, <br/> is really not a boundary at all for the application of 
steps X1-X10 (explicit levels) of bidi, precisely because of the 
reopening. In other words, for the input

<div> ..A.. <br/> ..B.. </div>

if you apply X1-X10 to "..A.." as one paragraph, and then to "..B.." as 
a separate paragraph *augmented* with all the reopening, you will get 
the same answer as applying X1-X10 to "..A.. ..B.." as a single 
paragraph. X1-X10 work only on the embedding and override characters, 
and the reopening really just reestablishes, at the beginning of 
processing ..B.., the state of that processing after ..A.. has been 
processed. This carries over to ..A.. x ..B.., since x would not be an 
embedding, an override, or a PDF.
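
The equivalence above can be checked with a minimal sketch. It is not a 
full implementation of X1-X10: the depth limit and overflow handling are 
omitted, and the function name explicit_levels and the sample strings 
are assumptions made only for illustration. It tracks the embedding 
level and override status of each ordinary character, and also returns 
the controls still open at the end, which is what the reopening would 
have to reproduce.

```python
# Simplified sketch of explicit embedding levels (in the spirit of X1-X10).
# Assumptions: no depth limit, no overflow handling, hypothetical names.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"


def explicit_levels(text, base_level=0):
    """Return (states, reopen).

    states: one (level, override) pair per non-control character.
    reopen: the embedding/override controls still open at the end of text.
    """
    stack = []                      # saved (level, override, control) entries
    level, override = base_level, None
    states = []
    for ch in text:
        if ch in (RLE, RLO):
            stack.append((level, override, ch))
            level = (level + 1) | 1             # next odd level
            override = "R" if ch == RLO else None
        elif ch in (LRE, LRO):
            stack.append((level, override, ch))
            level = (level + 2) & ~1            # next even level
            override = "L" if ch == LRO else None
        elif ch == PDF:
            if stack:                           # unmatched PDF is ignored
                level, override, _ = stack.pop()
        else:
            states.append((level, override))
    reopen = "".join(control for _, _, control in stack)
    return states, reopen


# ..A.. leaves an RLE and an LRO open; ..B.. closes them.
A = "ab" + RLE + "cd" + LRO + "ef"
B = "gh" + PDF + "ij" + PDF + "kl"

whole, _ = explicit_levels(A + B)           # one paragraph
first, reopen = explicit_levels(A)          # paragraph 1
second, _ = explicit_levels(reopen + B)     # paragraph 2, with reopening

assert whole == first + second
```

Inserting a character x that is not an embedding, override, or PDF 
between A and B only adds one more entry to the result; the states of 
all other characters are unchanged.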

On steps W1-W7 (weak types): as two paragraphs, we really have something 
like ..A.. eor followed by sor ..B.. At least intuitively, we would get 
the same answer if we processed something like ..A.. x ..B.., where x 
has bidi class L or R, whichever matches the directionality of the two 
paragraphs (which is the same for both by construction).
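
To make the intuition concrete, here is a minimal sketch restricted to 
rule W7 alone (a European number becomes L when the nearest preceding 
strong type, or sor, is L). It assumes the earlier weak rules have 
already run, that no embeddings are open so sor matches the paragraph 
direction, and that the paragraph is left-to-right. The name resolve_w7 
and the sample type lists are hypothetical. It does not demonstrate the 
claim for W1-W6.

```python
# Sketch of rule W7 only; input is a list of bidi classes after W1-W6.
def resolve_w7(types, sor):
    """Change EN to L when the nearest preceding strong type (or sor) is L."""
    out = []
    last_strong = sor
    for t in types:
        if t in ("L", "R"):
            last_strong = t
        elif t == "EN" and last_strong == "L":
            t = "L"
        out.append(t)
    return out


A = ["R", "EN"]          # ..A.. ends with a number after an R character
B = ["EN", "L", "EN"]    # ..B.. starts with a number

# Two paragraphs: each one starts from sor = L.
separate = resolve_w7(A, "L") + resolve_w7(B, "L")

# One paragraph with a mark x of class L between ..A.. and ..B..
joined = resolve_w7(A + ["L"] + B, "L")
joined_without_x = joined[:len(A)] + joined[len(A) + 1:]

# One paragraph with nothing between ..A.. and ..B..
no_mark = resolve_w7(A + B, "L")

assert separate == joined_without_x
assert separate != no_mark   # without x, the leading EN of B sees the R in A
```

In this case x plays exactly the role of sor for ..B..; since W7 only 
looks backward, it has no effect on ..A..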

The same seems to work just as well for N1-N2 (neutral types).

For steps I1-I2 (implicit levels), nothing happens differently to ..A.. or ..B..

Finally, we get to the reordering. But since reordering is done within a 
line, and <br/> also has the effect of creating a line break, ..A.. and 
..B.. will be treated separately anyway. x does not really have an impact.
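
As an illustration of why the line break makes the two parts 
independent, here is a minimal sketch of the per-line reversal (rule 
L2), assuming the levels are already resolved. Uppercase letters stand 
for right-to-left characters; reorder_line and the sample lines are 
hypothetical names and data.

```python
# Sketch of rule L2: reverse runs within ONE line, from the highest level
# down to the lowest odd level. Levels are assumed to be already resolved.
def reorder_line(chars, levels):
    chars = list(chars)
    odd_levels = [lvl for lvl in levels if lvl % 2]
    if not odd_levels:
        return "".join(chars)            # nothing right-to-left on this line
    for lvl in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= lvl:
                j = i
                while j < len(chars) and levels[j] >= lvl:
                    j += 1
                chars[i:j] = reversed(chars[i:j])   # reverse one run
                i = j
            else:
                i += 1
    return "".join(chars)


# <br/> forces a line break, so each part is reordered on its own line.
line_a = reorder_line("abcDEF", [0, 0, 0, 1, 1, 1])
line_b = reorder_line("GHIjkl", [1, 1, 1, 0, 0, 0])
display = line_a + "\n" + line_b
```

Whatever level x receives, it sits at the break between the two lines, 
so no run that is reversed on one line can extend into the other.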

If we put together everything, it seems that the bidi effect of <br/> is 
simply that of a mark of the directionality of the paragraph.

---

Does that ring a bell? It looks a lot like the proposed LEVEL DIRECTION 
MARK (LDM) of Unicode Public Review Issue 205 [1].

---

If the reasoning holds, I think there are interesting implications.

First, it would be much easier to describe the bidi effects of <br/> 
using something like LDM than with the current wording.

Second, there is a certain attraction to being able to define the bidi 
effects of <br/> in a way that is similar to the definition of 
unicode-bidi:embed and unicode-bidi:override (i.e. by the transformation 
to characters).

Third, it could be prudent to change the current definition to include 
the bidi override and embedding characters in the reopening. One 
possible scenario is that CSS does not want to wait for the inclusion of 
LDM in Unicode, but would be compatible with that introduction.

Fourth, there is the synergy between the use cases that motivated LDM 
and HTML/CSS. Maybe LDM would be good for HTML in general, not just for 
<br/>.

I think those implications warrant a closer look at the possibility of 
having something like LDM, and of defining <br/> accordingly.

Eric.

[1] http://www.unicode.org/review/pri205/

Received on Monday, 19 March 2012 23:28:16 UTC