[Bug 11829] parts of default stylesheet setting unicode-bidi and direction should be normative

http://www.w3.org/Bugs/Public/show_bug.cgi?id=11829

--- Comment #8 from Aharon Lanin <aharon.lists.lanin@gmail.com> 2011-05-05 08:10:02 UTC ---
(In reply to comment #7)

Thinking through all this again, I now see that the problem is even bigger than
I originally realized.

The excerpts quoted in comment 7 are taken from descriptions of the <bdo>
element and the <br> element. The comment also refers to specs on the <bdi>
element. *None of these* apply to the elements that used to be called block
elements in HTML4. *At no point does the current HTML spec define the role of
these elements in breaking the text of the document into paragraphs for the
purposes of the Unicode Bidirectional Algorithm.* The role of <br> (and
newlines in <textarea>) in this matter is now defined (in the sections quoted
above), but the role of a <div>, <p>, etc. is not.

Paragraph breaks have a crucial effect on bidi text, so the absence of a
complete spec for them in HTML is highly problematic.

Nor does the spec say how these elements affect the bidirectional algorithm when
their "block" nature is modified by something outside HTML, e.g. a stylesheet.

In HTML4 there is an attempt at such a spec in
<http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2>. Here are the most
relevant sections:

"The Unicode bidirectional algorithm requires a base text direction for text
blocks. To specify the base direction of a block-level element, set the
element's dir attribute."

"When an inline element that does not have a dir attribute is transformed to
the style of a block-level element by a style sheet, it inherits the dir
attribute from its closest parent block element to define the base direction of
the block."

"When a block element that does not have a dir attribute is transformed to the
style of an inline element by a style sheet, the resulting presentation should
be equivalent, in terms of bidirectional formatting, to the formatting obtained
by explicitly adding a dir attribute (assigned the inherited value) to the
transformed element."

This part of the HTML4 spec was far from perfect, but at least it was
something. As far as I know, it is completely missing in the current spec, with
the single exception of the unicode-bidi:isolate on the
what-used-to-be-called-block elements in the default stylesheet.

The bidi paragraph role of the blocks is currently only completely spelled out
in the CSS spec. Thus, in the CSS3 Writing Modes module, we have:

- User agents that support bidirectional text must apply the Unicode
bidirectional algorithm to every sequence of inline boxes uninterrupted by a
forced (bidi class B) paragraph break or block boundary. This sequence forms
the paragraph unit in the bidirectional algorithm.

- If [an inline element has unicode-bidi:embed, it] opens an additional level
of embedding with respect to the bidirectional algorithm. The direction of this
embedding level is given by the ‘direction’ property. Inside the element,
reordering is done implicitly. This corresponds to adding a LRE (U+202A), for
‘direction: ltr’, or RLE (U+202B), for ‘direction: rtl’, at the start of the
element and a PDF (U+202C) at the end of the element.

- If an inline element is broken around a bidi paragraph boundary (e.g. if
split by a block or forced paragraph break), then the bidi control codes
corresponding to the end of the element are added before the interruption and
the codes corresponding to the start of the element are added after it. (In
other words, any embedding levels or overrides started by the element are
closed at the paragraph break and reopened on the other side of it.)
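
To make the quoted CSS rules concrete, here is a minimal sketch in Python of the control-code correspondence they describe. The function names embed and embed_across_breaks are made up for illustration and come from no spec. The sketch assumes the inline element's text has already been split at paragraph breaks, and it only inserts the codes; it does not run the bidirectional algorithm itself.

```python
# Explicit embedding controls named in the quoted CSS text.
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed(text, direction):
    """Wrap text the way unicode-bidi:embed is described: opener first, PDF last."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    opener = LRE if direction == "ltr" else RLE
    return opener + text + PDF


def embed_across_breaks(segments, direction):
    """Hypothetical helper: 'segments' are the pieces of one inline element
    on either side of each bidi paragraph boundary. The embedding is closed
    at the end of every piece and reopened at the start of the next."""
    return [embed(segment, direction) for segment in segments]
```

For example, embed_across_breaks(["abc", "def"], "rtl") gives two strings, each starting with RLE and ending with PDF, so no embedding level leaks across the paragraph boundary.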

I believe that the role of what-used-to-be-called-block elements in determining
the bidi paragraphs needs to be spelled out in the HTML5 spec, for the sake of
user agents that do not implement CSS, and for the sake of clarity.

By this role, I mean:

- Normally, a "block" element's boundaries form paragraph boundaries for the
purposes of the bidirectional algorithm. Thus, in <div>A<div>B</div>C</div>,
each of A, B, and C is, by default, a separate UBA paragraph. The base
direction of these paragraphs is specified by their respective containing
elements' directionality.

- When, by some extra-HTML mechanism (e.g. "display:inline" or
"display:inline-block" in a stylesheet), a "block" element ceases to act as a
block boundary between the content preceding and following it, user agents
should treat it for the purposes of the bidirectional algorithm exactly as
specified for the <bdi> element, except that its directionality must default to
that of its parent.

- When the dir attribute is present on an element that does not act as a block
boundary, it opens an additional level of embedding with respect to the
bidirectional algorithm. That is, for the purposes of the bidirectional
algorithm, the user agent must act as if there was a U+202A LEFT-TO-RIGHT
EMBEDDING character at the start of such an element with ltr directionality, a
U+202B RIGHT-TO-LEFT EMBEDDING character at the start of such an element with
rtl directionality, and a U+202C POP DIRECTIONAL FORMATTING at the end of such
an element.

- When an element that does not act as a block boundary is interrupted by a
bidi paragraph boundary (e.g. contains a "block" element or <br>), then the
bidi control codes, if any, corresponding to the end of the element are added
before the interruption and the codes, if any, corresponding to the start of
the element are added after the interruption. (In other words, any
embedding levels or overrides started by the element are closed at the
paragraph break and reopened on the other side of it.)
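
The role described in the list above can be sketched roughly as follows, in Python. This is only an illustration under stated assumptions, not proposed spec text: the Element class and the bidi_paragraphs function are hypothetical, the "block" flag stands in for whatever mechanism decides that an element acts as a block boundary, and the <bdi>-like isolation of a "block" element displayed inline is not modelled. The output is a list of (base direction, text) pairs, one per UBA paragraph, with the embedding codes inserted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

LRE, RLE, PDF = "\u202A", "\u202B", "\u202C"


@dataclass
class Element:
    """Hypothetical minimal element model (not a real DOM API)."""
    tag: str
    block: bool = False          # True if it acts as a block boundary
    dir: Optional[str] = None    # "ltr", "rtl", or None (inherit)
    children: List[Union[str, "Element"]] = field(default_factory=list)


def bidi_paragraphs(root: Element, default_dir: str = "ltr") -> List[Tuple[str, str]]:
    paragraphs: List[Tuple[str, str]] = []
    buffer: List[str] = []
    has_text = False

    def begin_paragraph(open_codes: List[str]) -> None:
        # Reopen, at the start of the new paragraph, any embeddings still open.
        nonlocal has_text
        buffer.clear()
        buffer.extend(open_codes)
        has_text = False

    def end_paragraph(base_dir: str, open_codes: List[str]) -> None:
        # Close every open embedding before the paragraph boundary.
        # Paragraphs with no text are dropped.
        if has_text:
            paragraphs.append((base_dir, "".join(buffer) + PDF * len(open_codes)))

    def walk(node, para_dir: str, inherited_dir: str, open_codes: List[str]) -> None:
        nonlocal has_text
        if isinstance(node, str):
            if node:
                buffer.append(node)
                has_text = True
            return
        if node.tag == "br":
            # Forced paragraph break: same base direction on both sides.
            end_paragraph(para_dir, open_codes)
            begin_paragraph(open_codes)
            return
        if node.block:
            # Block boundaries are paragraph boundaries; the block's own
            # directionality is the base direction of the paragraphs inside it.
            inner_dir = node.dir or inherited_dir
            end_paragraph(para_dir, open_codes)
            begin_paragraph([])
            for child in node.children:
                walk(child, inner_dir, inner_dir, [])
            end_paragraph(inner_dir, [])
            begin_paragraph(open_codes)
            return
        # Non-block element: dir, if present, opens an embedding level.
        inner_codes = open_codes
        if node.dir:
            opener = LRE if node.dir == "ltr" else RLE
            buffer.append(opener)
            inner_codes = open_codes + [opener]
        for child in node.children:
            walk(child, para_dir, node.dir or inherited_dir, inner_codes)
        if node.dir:
            buffer.append(PDF)

    walk(root, default_dir, default_dir, [])
    end_paragraph(default_dir, [])
    return paragraphs
```

With this model, the <div>A<div>B</div>C</div> example yields three separate ltr paragraphs, A, B and C. A <span dir="rtl"> containing x, a <br>, and y yields two paragraphs, each wrapped in its own RLE ... PDF pair.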

Received on Thursday, 5 May 2011 08:10:05 UTC