- From: Aharon (Vladimir) Lanin <aharon@google.com>
- Date: Wed, 7 Mar 2012 12:04:55 +0200
- To: Richard Ishida <ishida@w3.org>, public-i18n-bidi@w3.org
- Cc: Eric Muller <emuller@adobe.com>, Stephen Zilles <szilles@adobe.com>
- Message-ID: <CA+FsOYZXek8EucApmPUCup9xkbDAh1JOrS76W=cmULwcD_N0rA@mail.gmail.com>
[+public-i18n-bidi]

On 06/03/2012 01:41, Eric Muller wrote:

>> I am looking at the W3C bidi test bidi-html5-019 [3], and it states:
>>
>>> Assertion: 'A br element should separate paragraphs for the purposes
>>> of the Unicode bidirectional algorithm.' 'If an inline element is
>>> broken around a bidi paragraph boundary (e.g. if split by a block or
>>> forced paragraph break), then the bidi control codes corresponding to
>>> the end of the element are added before the interruption and the codes
>>> corresponding to the start of the element are added after it. (In
>>> other words, any embedding levels or overrides started by the element
>>> are closed at the paragraph break and reopened on the other side of it.)'
>>
>> The first part comes from HTML5[1], the second from CSS3 Writing Modes[2].
>>
>> It seems to me that CSS imposes more than HTML does, specifically the
>> "reopened" part. It's even worse: If I read only the HTML5 text (which
>> says <br> is equivalent to a paragraph break), and the UAX#9 which more
>> or less says that each paragraph is treated separately, without memory,
>> I come to the conclusion that "reopening" should *not* happen. And I
>> don't think that the HTML5 mention of CSS in that context
>>
>>> This requirement may be implemented indirectly through the style
>>> layer. For example, an HTML+CSS user agent could implement these
>>> requirements by implementing the CSS 'unicode-bidi' property. [BIDI]
>>> [CSS]
>>
>> is meant to allow CSS to change the meaning of HTML documents.
>>
>> What am I missing?

HTML5 defers most bidi semantics to CSS and Unicode, mentioning only a few things here and there. CSS doesn't change the meaning of HTML documents - as far as bidi is concerned, it is for the most part what gives them meaning. For example, nowhere does the HTML5 spec say that the start and end tags of a <div> (or other non-phrasing-content element) are bidi paragraph separators.
It is CSS Writing Modes and the HTML5 default style sheet that define that, along with most other things bidi. The HTML5 spec does say the following just before giving the default style sheet (http://dev.w3.org/html5/spec/Overview.html#introduction-8):

    User agents that do not honor author-level CSS style sheets are nonetheless expected to act as if they applied the CSS rules given in these sections in a manner consistent with this specification and the relevant CSS and Unicode specifications. [CSS] [UNICODE] [BIDI]

    Note: This is especially important for issues relating to the 'display', 'unicode-bidi', and 'direction' properties.

By the way, that style sheet says the following for <br>:

    br { content: '\A'; white-space: pre; }

This means that <br> is supposed to be treated the same as a newline in a <pre>, and that, according to the Unicode standard, means that it is a bidi paragraph break.

The "reopening" that bothers you is not specific to <br>. It also happens for every display:block element, with respect to the content surrounding it. Take, for example, the following:

    <div dir=ltr>
    ==><span dir=rtl>&lrm;--><b style="display:block">-->&lrm;--></b>-->&lrm;</span>==>
    </div>

This displays as:

    ==><--
    *<--<--*
    <--==>

(In anything but IE, you can try it as data:text/html,<div dir=ltr>==><span dir=rtl>&lrm;--><b style="display:block">-->&lrm;--></b>-->&lrm;</span>==></div>)

Please note that all the --> arrows are displayed in RTL, as <--, despite the LRM characters that would surround them if one ignored the <b style="display:block"> and its closing </b>. (Note that the arrow in <div dir=rtl>&lrm;-->&lrm;</div> is displayed in LTR, as -->, because there the LRMs really do surround the arrow.) This proves that the <b style="display:block"> and its closing </b> serve as bidi paragraph breaks, just like <br> does.

Now, let's look at the last paragraph (the stuff following the </b>). It is displayed as "<--==>". That shows that it is LTR overall, following the dir=ltr on the div.
However, its leading arrow is displayed RTL, as <--, following the dir=rtl on the span that surrounds it, despite the start of the span being in a separate paragraph. This can only happen by the span being "reopened" at the start of the third paragraph.

>> Thinking a bit more about it, I have the intuition that HTML wants to say
>> "<br> behave like a LINE SEPARATOR for the purpose of bidi". At which point
>> the text in CSS does not apply.

That was indeed what the HTML4 spec said. This was deliberately changed in HTML5, after much discussion, because <br> is very widely used the same way as a newline in plain text - not as a line separator. For example, people intend something like:

    1. I like א. <br> 2. ב is nice too.

to be displayed as

    1. I like א.
    2. ב is nice too.

not as

    1. I like א.
    2. ב is nice too.

It was possible to make this change because this was something on which there was never interoperability: many browsers did not implement the HTML4 spec in this respect.

>> I also suspect that what CSS says actually belongs to HTML; I don't think
>> that the visual ordering should depend on using CSS, and should certainly
>> not be different depending on whether the HTML is interpreted by an
>> HTML-only or an HTML+CSS engine.

I mostly addressed this above. The HTML5 spec explicitly defers to CSS on this, and also explicitly demands that an HTML-only engine behave bidi-wise as if it implemented CSS and the default style sheet.

It would be very difficult for HTML to define bidi behavior because that behavior has to depend on a number of CSS properties, e.g. display, unicode-bidi, direction, position and float. It would be possible for the HTML spec to define the bidi behavior under the assumption that the page does not use CSS, but the CSS spec would then have to go back and define it all again. Since the two specs would never be identical, the result would be built-in contradictions.
This was, in fact, the case with the HTML4 spec.

You may ask then why the HTML5 spec bothers to say that <br> is a bidi paragraph separator, when the default style sheet defining it to be the same as a newline in <pre> would have been enough. The answer, I think, is that because the HTML4 spec said the opposite, the HTML5 spec editor wanted to stress the change.

Nevertheless, I do think that it would be useful for the HTML spec to include a *non-binding* description of the bidi behavior under the assumption that the page does not use CSS, just so that the reader could have it all in one place. But my request that this be done (including even a partial draft of such a description) was refused by the HTML5 spec editor.

>> Thanks,
>> Eric.
>>
>> [1] http://www.w3.org/TR/2011/WD-html5-20110525/text-level-semantics.html#the-br-element
>> [2] http://dev.w3.org/csswg/css3-writing-modes/#unicode-bidi
>> [3] http://www.w3.org/International/tests/html-css/generate?test=bidi-html5-019&format=h5
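
For anyone who wants to see the "closed at the paragraph break and reopened on the other side" rule from the CSS Writing Modes text quoted at the top in concrete form, here is a minimal Python sketch. It is an illustration only, not a bidi implementation and not how any browser does it. It assumes that dir on an inline element behaves like an embedding (RLE or LRE ... PDF) and that the break is a forced one such as <br>; the function name to_bidi_paragraphs and the token format are made up for this example.

```python
# Unicode bidi embedding controls (UAX #9).
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
OPEN = {"ltr": LRE, "rtl": RLE}


def to_bidi_paragraphs(tokens):
    """Split a flat token stream into bidi paragraphs.

    tokens is a list of (kind, value) pairs (hypothetical format):
      ("open", "rtl" | "ltr")  start of an inline element with dir=...
      ("close", None)          end of the innermost open inline element
      ("text", "...")          character data
      ("break", None)          forced paragraph break, e.g. <br>
    Embeddings still open at a break are closed with PDF before it
    and reopened with the same control character after it.
    """
    paragraphs = []
    current = []
    stack = []  # directions of the inline elements currently open
    for kind, value in tokens:
        if kind == "open":
            stack.append(value)
            current.append(OPEN[value])
        elif kind == "close":
            stack.pop()
            current.append(PDF)
        elif kind == "text":
            current.append(value)
        elif kind == "break":
            current.extend(PDF for _ in stack)   # close before the break
            paragraphs.append("".join(current))
            current = [OPEN[d] for d in stack]   # reopen after the break
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    paragraphs.append("".join(current))
    return paragraphs


# Roughly: ==><span dir=rtl>--><br>--></span>==>
paragraphs = to_bidi_paragraphs([
    ("text", "==>"),
    ("open", "rtl"),
    ("text", "-->"),
    ("break", None),
    ("text", "-->"),
    ("close", None),
    ("text", "==>"),
])
for p in paragraphs:
    print(ascii(p))
```

Each resulting paragraph carries its own balanced RLE ... PDF pair, so it can be handed to the Unicode bidi algorithm on its own, "without memory" of the previous paragraph, and the arrow after the break still sits inside an RTL embedding.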
Received on Wednesday, 7 March 2012 10:05:50 UTC