Re: Bidi, HTML5 and CSS3, test bidi-html5-019

[+public-i18n-bidi]

On 06/03/2012 01:41, Eric Muller wrote:
>
>> I am looking at the W3C bidi test bidi-html5-019 [3], and it states:
>>
>>  Assertion: 'A br element should separate paragraphs for the purposes
>>> of the Unicode bidirectional algorithm.' 'If an inline element is
>>> broken around a bidi paragraph boundary (e.g. if split by a block or
>>> forced paragraph break), then the bidi control codes corresponding to
>>> the end of the element are added before the interruption and the codes
>>> corresponding to the start of the element are added after it. (In
>>> other words, any embedding levels or overrides started by the element
>>> are closed at the paragraph break and reopened on the other side of it.)'
>>>
>>
>> The first part comes from HTML5[1], the second from CSS3 Writing Modes[2].
>>
>> It seems to me that CSS imposes more than HTML does, specifically the
>> "reopened" part. It's even worse: If I read only the HTML5 text (which
>> says <br> is equivalent to a paragraph break), and the UAX#9 which more
>> or less says that each paragraph is treated separately, without memory,
>> I come to the conclusion that "reopening" should *not* happen. And I
>> don't think that the HTML5 mention of CSS in that context
>>
>>  This requirement may be implemented indirectly through the style
>>> layer. For example, an HTML+CSS user agent could implement these
>>> requirements by implementing the CSS 'unicode-bidi' property. [BIDI]
>>> [CSS]
>>>
>>
>> is meant to allow CSS to change the meaning of HTML documents.
>>
>> What am I missing?
>>
>
HTML5 defers most bidi semantics to CSS and Unicode, mentioning only a few
things here and there. CSS doesn't change the meaning of HTML documents -
it is, for the most part, what gives them their meaning as far as bidi is
concerned. For example, nowhere does the HTML5 spec say
that the start and end tags of a <div> (or other non-phrasing-content
element) are bidi paragraph separators. It is CSS Writing Modes and the
HTML5 default style sheet that define that, along with most other things
bidi. The HTML5 spec does say the following just before giving the default
style sheet (http://dev.w3.org/html5/spec/Overview.html#introduction-8):

User agents that do not honor author-level CSS style sheets are nonetheless
expected to act as if they applied the CSS rules given in these sections in
a manner consistent with this specification and the relevant CSS and
Unicode specifications. [CSS] [UNICODE] [BIDI]


Note: This is especially important for issues relating to the 'display',
'unicode-bidi', and 'direction' properties.


By the way, that style sheet says the following for <br>:

br { content: '\A'; white-space: pre; }

This means that <br> is supposed to be treated the same as a newline in a
<pre>, and that, according to the Unicode standard, means that it is a bidi
paragraph break.
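
As a quick sanity check of the Unicode side of this claim, here is a minimal
Python sketch that looks up Bidi_Class values with the standard unicodedata
module. The helper name and the table of candidates are mine, purely for
illustration; the point is only that U+000A, the character CSS writes as
'\A', carries class B (paragraph separator).

```python
import unicodedata

# A few characters with Bidi_Class B (paragraph separator).
# U+000A is the character that the CSS escape '\A' denotes.
CANDIDATES = {
    "LINE FEED (CSS '\\A')": "\u000A",
    "CARRIAGE RETURN": "\u000D",
    "NEXT LINE": "\u0085",
    "PARAGRAPH SEPARATOR": "\u2029",
}


def is_bidi_paragraph_separator(ch):
    """Return True if Unicode assigns this character Bidi_Class B."""
    return unicodedata.bidirectional(ch) == "B"


for name, ch in CANDIDATES.items():
    print(f"U+{ord(ch):04X} {name}: {unicodedata.bidirectional(ch)}")
```

An ordinary space, by contrast, is class WS and does not end a bidi
paragraph.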

The "reopening" that bothers you is not specific to <br>. It also happens
for every display:block element, with respect to the content surrounding
it. Take, for example, the following:

<div dir=ltr>
==><span dir=rtl>&lrm;--><b
style="display:block">-->&lrm;--></b>-->&lrm;</span>==>
</div>

This displays as:

==><--
*<--<--*
<--==>

(In anything but IE, you can try it as data:text/html,<div
dir=ltr>==><span dir=rtl>&lrm;--><b
style="display:block">-->&lrm;--></b>-->&lrm;</span>==></div>)

Please note that all the --> arrows are displayed in RTL, as <--, despite
the LRM characters that would surround them if one ignored the <b
style="display:block"> and its closing </b>. (Note that the arrow in <div
dir=rtl>&lrm;-->&lrm;</div> is displayed in LTR, as -->, because there the
LRMs really do surround the arrow.) This proves that the <b
style="display:block"> and its closing </b> serve as bidi paragraph breaks,
just like <br> does.

Now, let's look at the last paragraph (the stuff following the </b>). It is
displayed as "<--==>". That shows that it is LTR overall, following the
dir=ltr on the div. However, its leading arrow is displayed RTL, as <--,
following the dir=rtl on the span that surrounds it -- despite the start of
the span being in a separate paragraph. This can only happen by the span
being "reopened" at the start of the third paragraph.
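
The close-and-reopen rule quoted from CSS Writing Modes can be sketched in a
few lines of Python. This is only a rough model under stated assumptions: the
span is treated as a plain embedding (RLE ... PDF for rtl, LRE ... PDF for
ltr), paragraph breaks inside it are represented by a newline, and the
function name is made up for the illustration. It is not how any browser
actually implements it.

```python
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed_per_paragraph(inline_text, direction="rtl", sep="\n"):
    """Wrap every paragraph of an inline element in its own embedding.

    The embedding is closed (PDF) before each paragraph break and
    reopened (RLE or LRE) right after it, so no paragraph depends on
    controls that sit in a different paragraph.
    """
    opener = RLE if direction == "rtl" else LRE
    paragraphs = inline_text.split(sep)
    return sep.join(opener + p + PDF for p in paragraphs)


# Content of the <span dir=rtl> from the example, with the two block
# boundaries modelled as newlines (an assumption of this sketch).
LRM = "\u200E"
span_content = f"{LRM}-->\n-->{LRM}-->\n-->{LRM}"
for paragraph in embed_per_paragraph(span_content).split("\n"):
    print([f"U+{ord(c):04X}" for c in paragraph])
```

Each of the three resulting paragraphs starts with RLE and ends with PDF,
which is why the arrow in the last paragraph still gets the span's RTL
direction.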

>> Thinking a bit more about it, I have the intuition that HTML wants to say
>> "<br> behave like a LINE SEPARATOR for the purpose of bidi". At which point
>> the text in CSS does not apply.
>>
>
That was indeed what the HTML4 spec said. This was deliberately changed in
HTML5, after much discussion, because <br> is very widely used the same way
as a newline in plain text - not as a line separator. For example, people
intend something like:

1. I like &#x05D0;.
<br>
2. &#x05D1; is nice too.

to be displayed as

1. I like א‎.
2. ב is nice too.

not as

1. I like א.‏
‏2. ב is nice too.

It was possible to make this change (after a lot of discussion) because
this was something on which there was never interoperability: many browsers
did not implement the HTML4 spec in this respect.
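
To see why the line-separator reading gives the unwanted result in the
numbered-list example above, it helps to look at the bidi classes around the
break. The sketch below uses Python's standard unicodedata module; the helper
name is mine, and U+2028 LINE SEPARATOR stands in for a <br> treated the
HTML4 way.

```python
import unicodedata


def bidi_classes(text):
    """Return the Bidi_Class of every character in text, in order."""
    return [unicodedata.bidirectional(ch) for ch in text]


# End of line 1, a LINE SEPARATOR in place of <br>, start of line 2:
# alef, full stop, U+2028, digit two, full stop, space, bet.
boundary = "\u05D0.\u20282. \u05D1"
print(bidi_classes(boundary))
```

This gives R, CS, WS, EN, CS, WS, R: nothing between the two Hebrew letters
is a strong LTR character. As I read UAX#9, with no paragraph break in
between the digit and the surrounding punctuation then resolve together with
the Hebrew as one RTL stretch, whereas a class B break makes each line start
afresh.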


>> I also suspect that what CSS says actually belongs to HTML; I don't think
>> that the visual ordering should depend on using CSS, and should certainly
>> not be different depending on whether the HTML is interpreted by an
>> HTML-only or an HTML+CSS engine.
>
>
I mostly addressed this above. The HTML5 spec explicitly defers to CSS on
this, and also explicitly demands that an HTML-only engine behave bidi-wise
as if it implemented CSS and the default style sheet. It would be very
difficult for HTML to define bidi behavior because that behavior has to
depend on a number of CSS properties, e.g. display, unicode-bidi,
direction, position and float. It would be possible for the HTML spec to
define the bidi behavior under the assumption that the page does not use
CSS, but the CSS spec would then have to go back and define it all again.
Since the two specs would never be identical, the result would be built-in
contradictions. This was, in fact, the case with the HTML4 spec.

You may ask then why the HTML5 spec bothers to say that <br> is a bidi
paragraph separator, when the default stylesheet defining it to be the same
as a newline in <pre> would have been enough. The answer, I think, is that
because the HTML4 spec said the opposite, the HTML5 spec editor wanted to
stress the change.

Nevertheless, I do think that it would be useful for the HTML spec to
include a *non-binding* description of the bidi behavior under the
assumption that the page does not use CSS, just so that the reader could
have it all in one place. But my request that this be done (including even
a partial draft of such a description) was refused by the HTML5 spec editor.


>
>>
>> Thanks,
>> Eric.
>>
>>
>> [1] http://www.w3.org/TR/2011/WD-html5-20110525/text-level-semantics.html#the-br-element
>> [2] http://dev.w3.org/csswg/css3-writing-modes/#unicode-bidi
>> [3] http://www.w3.org/International/tests/html-css/generate?test=bidi-html5-019&format=h5
>>
>

Received on Wednesday, 7 March 2012 10:05:50 UTC