My comments on Additional Requirements for Bidi in HTML (20100304 WD)

Dear Bidi specialists,

Here are my comments on Additional Requirements for Bidi in HTML 
(20100304 Working Draft).

My comments are based on my experience since being involved in designing 
the currently existing bidi provisions for both HTML (originally 
http://tools.ietf.org/html/rfc2070) and CSS, and on research on 
displaying structured documents and data in a bidi context (see 
http://www.sw.it.aoyama.ac.jp/2008/pub/IUC32-bidi/).

Before I start the actual comments, I want to say that the current 
effort is really worthwhile. When the current bidi solution for HTML was 
designed (~1995?), we indeed did not think about the details of form 
fill-in or composing documents from text snippets, e.g. from a database.

Overall comments:

I haven't commented on every editorial detail. I think most editorial 
details can be fixed later. However, I think we should make a strong 
effort to make sure that bidi-related terminology is correct and clean, 
because the HTML WG may just take the text from us without changes and 
it's awkward to later go back and tell them to fix it.

In the introduction, there should be a note saying that familiarity with 
bidi text display and with the Unicode Bidi Algorithm in particular is 
assumed.

Notation:
- "uppercase English" -> "uppercase Latin" (this is a script issue, not 
a language issue)
- "are stored in memory": This is usually called "logical order". Please 
use that term. If felt necessary, let's add explanations in parentheses.

End of document: Please provide a References section, as usual for W3C 
documents.

1.2 Base direction: "is displayed in RTL as" should read "is displayed 
with a base direction of RTL as"

1.3 Terminology:
- Some of the terms here, such as "computed direction", are clearly 
specific to this document, and should be kept. Others (e.g. base 
direction, LRE,...) are taken from other places and should either be 
removed or marked as such (explicitly pointing to their origin). That 
will help a reader skip stuff that's already known, and will help a 
potential HTML editor to know how to incorporate these in the relevant 
document.
- LRO: "arranges characters from left to right" -> "arranges characters 
strictly from left to right"
- UBA: "In HTML" -> "In current HTML" or "In HTML 4" or some such. 
Ideally, the document is written in such a way that it can be read 
without confusion in a few years.

Use of "RTL" in example text: Because this reads "LTR" when displayed, 
it may be very confusing to people not totally familiar with bidi 
issues. I suggest changing the example text to remove this pitfall.

The explanation of how the various proposals can be implemented based on 
the UBA should be worked out in more detail, to be complete, because 
that's what probably will be used for the definition of the features in 
the new HTML spec. So e.g. not "missing PDFs will be assumed at the 
close of an element" but "add missing PDFs at the end of an element".
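
To illustrate the difference between "assuming" and "adding" PDFs, here 
is a minimal Python sketch of the operational wording. The function name 
close_open_embeddings is made up for this example, and treating an 
element's content as one flat string is a simplifying assumption, not 
something taken from the draft.

```python
# Unicode bidi control characters (code points are from the Unicode Standard).
LRE, RLE = "\u202a", "\u202b"   # embeddings
LRO, RLO = "\u202d", "\u202e"   # overrides
PDF = "\u202c"                  # pop directional formatting
OPENERS = {LRE, RLE, LRO, RLO}


def close_open_embeddings(text: str) -> str:
    """Append one PDF for each embedding/override left open in `text`.

    Hypothetical helper: `text` stands for the content of one element.
    A PDF with no matching opener is left alone and not counted.
    """
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF and depth > 0:
            depth -= 1
    return text + PDF * depth
```

With wording like this, the result at the end of an element is an 
explicit character sequence that can be fed to the UBA unchanged, rather 
than a behavior the reader has to infer.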


2.1 bidi isolation of inlines

The proposal is basically a very good idea. Some details:
- The name of the new attribute, "bdi", needs more thought. It's too 
close to "bidi" and "bdo", and cryptic. "bidi-isolate" or something 
similar seems way more appropriate.
- In the work on displaying e.g. XML documents with bidi content, we 
bumped into the 'reverse' of this issue, namely how to make an inline 
element with explicit directionality behave as a single entity of that 
directionality. For that purpose, it's necessary to enclose the element 
in bidi marks (LRM or RLM) of the same directionality as the element 
itself (not of the directionality of the context as in the proposal). It 
may be possible that there are other uses for this in the wild, and that 
this could be added as an additional option for this attribute.
- There should be a clear specification about what happens when there's 
a 'bdi' attribute without a 'dir' attribute. (nothing? something?)
- I think there should be some text about deployment. This feature only 
makes sense if all major browsers implement it and it is deployed in a 
large percentage of the user base. There's no "backwards compatibility 
story", unfortunately.
- "except in special cases indicated in the sections below": Please put 
in pointers to the actual sections.
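
As an illustration of the 'reverse' case mentioned above (marks of the 
element's own directionality rather than the context's), here is a 
minimal Python sketch. The function name wrap_as_unit is hypothetical, 
and representing the element's explicit directionality as an LRE/RLE 
embedding is an assumption made for the example.

```python
LRM, RLM = "\u200e", "\u200f"   # left-to-right / right-to-left mark
LRE, RLE = "\u202a", "\u202b"   # left-to-right / right-to-left embedding
PDF = "\u202c"                  # pop directional formatting


def wrap_as_unit(text: str, direction: str) -> str:
    """Enclose an embedded run in marks of the run's OWN direction.

    `direction` is "ltr" or "rtl". The embedding models the element's
    explicit directionality; the surrounding marks are what make the
    element behave as a single entity of that directionality.
    """
    if direction == "ltr":
        mark, embed = LRM, LRE
    elif direction == "rtl":
        mark, embed = RLM, RLE
    else:
        raise ValueError("direction must be 'ltr' or 'rtl'")
    return mark + embed + text + PDF + mark
```

For example, wrap_as_unit("abc", "rtl") yields RLM, RLE, "abc", PDF, RLM 
in that order.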


2.2 auto-direction
I personally think having two options, named 'word-count' and 
'first-strong', is best (apparently, both are currently in use, so they 
both must have some utility?). I don't think making this the default on 
some elements is a good idea; while that would be appropriate for a 
totally new design, it's not appropriate here because it would increase 
backwards-compatibility problems.
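
To make the two options concrete, here is a rough Python sketch. The 
'first-strong' part follows the usual idea of taking the first character 
with a strong bidi class. The 'word-count' part is only a guess at a 
plausible definition (majority of words, each classified by its first 
strong character, ties going to a default); the function names and the 
tie-breaking rule are assumptions, not taken from the draft.

```python
import unicodedata


def _strong_direction(ch):
    """Return "ltr"/"rtl" for a strong character, None otherwise."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "ltr"
    if bidi_class in ("R", "AL"):
        return "rtl"
    return None


def first_strong(text, default="ltr"):
    """Direction of the first strong character, or `default` if none."""
    for ch in text:
        direction = _strong_direction(ch)
        if direction is not None:
            return direction
    return default


def word_count(text, default="ltr"):
    """Majority direction over whitespace-separated words (assumed rule)."""
    counts = {"ltr": 0, "rtl": 0}
    for word in text.split():
        direction = first_strong(word, default=None)
        if direction is not None:
            counts[direction] += 1
    if counts["rtl"] > counts["ltr"]:
        return "rtl"
    if counts["ltr"] > counts["rtl"]:
        return "ltr"
    return default
```

The two can disagree: a string that starts with one Hebrew word followed 
by several Latin words is "rtl" under first_strong but "ltr" under 
word_count, which is one reason for offering both.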


2.3 Reporting user direction choice for text input fields
- Looks like a good idea in general.
- I don't understand what's meant by "scripts are not available ... in 
e-mail forms".


2.4 Image flips
- I agree with another commenter (sorry, forgot the name) that this 
shouldn't be a general image mirroring feature. In its simplest form, it 
could just be a binary property: bidi-mirror: yes/no (assuming the image 
source is LTR), but I understand the desire to also allow RTL source 
images. However, I'd personally take the attribute values the other way 
round, indicating the directionality of the original image, not the 
directionality context in which the image has to be mirrored.
- This may belong in CSS, not HTML. There should be some warnings 
about deployment (e.g. "for the time being, don't use this unless the 
image is still understandable even when displayed the wrong way round").
- It may be worthwhile to give some thought to vertical directionality, too.
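
To show what taking the attribute values "the other way round" would 
mean, here is a tiny Python sketch. The function and parameter names are 
hypothetical; the only rule it encodes is that the attribute states the 
directionality of the original image, and the image is mirrored when the 
context differs from that.

```python
def should_mirror(source_direction: str, context_direction: str) -> bool:
    """Mirror only when the context differs from the image's own direction.

    `source_direction` is what the (hypothetical) attribute would carry;
    `context_direction` comes from the surrounding content.
    """
    valid = ("ltr", "rtl")
    if source_direction not in valid or context_direction not in valid:
        raise ValueError("directions must be 'ltr' or 'rtl'")
    return source_direction != context_direction
```

With this reading, an image drawn for LTR is flipped in an RTL context 
and left alone in an LTR one, and the same holds symmetrically for an 
image drawn for RTL.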


3.1 <br> as a bidi separator
- Fixing all those pages that think <br> is a paragraph separator would 
be best!
- Browsers should definitely converge. As far as this is the job of the 
HTML WG, maybe we can just present two solutions (one like the current 
proposal, another closer to HTML 4, i.e. making bdi='no' the default for 
<br>, too) and have the HTML WG figure out which way browser makers are 
going to converge.
- With the current proposal, the fact that this ties in with the 'bdi' 
attribute is quite nice.
- The necessary changes to UTR#20 and UTR#13 are well noted. However, 
just replacing <xhtml:br/> by <xhtml:br bdi='no'/> is not the correct 
solution. These reports point to <br/> because they assume that this is 
a well-known, simple reference. With the potential changes, this is no 
longer the case, and the UTRs have to find other ways to describe line 
separators in a way that is easily and immediately understandable.


3.2 Newlines in <pre>,...
I clearly agree with this.


3.3 "embedded" block elements as bidi separators
- This seems reasonable. I think the only reason it's not spelled out in 
HTML 4 is that there was an assumption that it didn't allow for 
'free-floating' text besides block elements, either explicitly or by 
some SGML omittag trickery (but this assumption is clearly wrong for 
<div> within <div>).
- "Since inline elements are not allowed to contain block elements...": 
At least when including CSS, this is not true. There are inline blocks. 
Such blocks should not be treated as separators, but as single 
characters. This means that this feature has to depend on whether the 
outside element is a block element or not.
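
The distinction between separators and single characters can be sketched 
as follows in Python. Everything here is an assumption made for 
illustration: the Node type, the three display values, the use of 
U+2029 (paragraph separator) for embedded blocks, and the use of U+FFFC 
(object replacement character) to stand in for an inline block.

```python
from dataclasses import dataclass, field
from typing import List, Union

PARAGRAPH_SEPARATOR = "\u2029"
OBJECT_REPLACEMENT = "\ufffc"


@dataclass
class Node:
    """Hypothetical element: display is "block", "inline" or "inline-block"."""
    display: str
    children: List[Union["Node", str]] = field(default_factory=list)


def bidi_stream(node: Node) -> str:
    """Flatten `node` into the text its surrounding bidi context would see."""
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        elif child.display == "inline-block":
            # Seen from outside as one neutral character; its content is
            # laid out separately and does not take part in this stream.
            parts.append(OBJECT_REPLACEMENT)
        elif child.display == "block":
            # An embedded block separates the text before and after it.
            parts.append(PARAGRAPH_SEPARATOR + bidi_stream(child)
                         + PARAGRAPH_SEPARATOR)
        else:
            parts.append(bidi_stream(child))
    return "".join(parts)
```

For instance, text "a", an inline block, then text "b" gives a stream of 
three characters, whereas replacing the inline block by a real block 
splits "a" and "b" into separate paragraphs.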


3.4 Script dialog text
I agree with the general direction this is going. However, it should be 
up to the HTML WG how much of this behavior they want to prescribe, 
and at what level (MUST/SHOULD,...)


3.5 Title with dir=: agreed, but see 3.4


3.6 Direction of title and alt attributes: agreed, but see 3.4


3.7 <option>: agreed, but see 3.4


3.8 Set direction on <textarea>,...
- This is valuable advice
- In my understanding, what shortcuts to use for a particular 
functionality is not part of standardization, but part of browser 
differentiation. So we should leave it to the HTML WG whether they want 
to include this or not.


3.9 remember text directions: agreed, but see 3.4/3.8


3.10 bullets for lists: very strongly agree


3.11/3.12: scroll bars: agree, but see 3.4/3.8


Appendix: "less better" -> "worse"


Regards,    Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Monday, 15 March 2010 10:58:10 UTC