Fwd: bidi proposal

Richard Ishida asked me to resend my original comments on the draft to the
new public mailing list. Forwarded messages below.

Aharon, if you repost your reply, I will repost my follow-up as well.

Thanks,

~fantasai

-------- First Message --------
Subject: Re: A Proposal for HTML Improvements for Bidi: Please review
Date: Wed, 17 Feb 2010 13:46:46 -0800
From: fantasai

On 02/17/2010 12:47 PM, Richard Ishida wrote:
 > Dear i18n WG members,
 >
 > Aharon Lanin of Google has been preparing a proposal for additions to HTML that will
 > address practical issues when dealing with bidi text - particularly when handling
 > text that is inserted into a page from a database, etc.  He has had some feedback on
 > the text from bidi experts, and has prepared a new version of the proposal (however,
 > section 1.2 still needs some work - hopefully ready by the end of this week).
 >
 > This initial draft is available at http://www.w3.org/International/wiki/BidiProposal

 From a quick scan, it looks like a very thorough and intelligent discussion of
problems with the existing BiDi infrastructure in HTML and CSS. My only comment
at the moment is that a number of these problems should be solved at the CSS
level, in addition to (or in some cases, instead of) the HTML level.

The CSSWG has at least one open BiDi issue on CSS2.1. I'm happy to take back any
other changes we need to make to the CSS2.1 specs as a result of these discussions.
(At least two of the proposals require new features, such as additional values to
'direction' and 'unicode-bidi'. These might need to be filed for CSS3 Text Layout
instead.)

...

~fantasai



-------- Second Message --------
Subject: Re: Bidi proposal draft
Date: Wed, 24 Feb 2010 02:59:23 -0800
From: fantasai

On 02/22/2010 02:22 AM, Richard Ishida wrote:
 > Hi Aharon,
 >
 > I have migrated the bidi proposal wiki text to the format needed to
 > publish as a Working Draft.  There may be a couple of additional things
 > to do or mistakes, but hopefully most of the work is now done.
 >
 > See http://www.w3.org/International/docs/html-bidi-requirements/

====== Substantive comments ======


bdi=yes
# The element, even when empty, is to be displayed as if it were surrounded
# with strong-directional characters of the last explicit embedding level
# within which it appears.

Why not treat the element as U+FFFC instead? If the intent is isolation so
that it doesn't affect surrounding text, wouldn't treating it as a neutral
make more sense than treating it as a strong character?

dir=auto
# Make simple direction estimation functionality available in the browser by
# allowing the dir attribute to take on new values indicating that the user
# agent is responsible for estimating the direction of the element's contents.
# One such dir attribute value would specify using the word-count algorithm,
# defined and discussed in Appendix A. Another would specify the first-strong
# algorithm, as defined by the UBA.

The Mozilla devs I talked to are skeptical that authors would know which
algorithm to choose. Also, scanning the entire text has performance
implications, especially for large elements and when DOM mutations are
involved. We suggest considering the following:
   - Of the first 64 characters after and including the first strong
     character, if any (or some low percentage) are strong RTL,
     consider the element's computed base direction to be RTL, else
     LTR.
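
As a rough illustration only, the heuristic above could be sketched as
follows. This is not taken from any browser implementation. It assumes
that "characters" means Unicode code points, that "strong" means bidi
class L, R, or AL, that the "any" variant is used (not the "low
percentage" one), and that text with no strong character falls back to
LTR. The function name and the window parameter are hypothetical.

```python
import unicodedata

# Bidi classes treated as strong here (an assumption of this sketch).
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}
STRONG = STRONG_LTR | STRONG_RTL


def estimate_base_direction(text, window=64):
    """Return "rtl" or "ltr" for text using the bounded-scan heuristic.

    Looks at up to `window` code points, starting at and including the
    first strong character. If any of them is strong RTL, the result is
    "rtl"; otherwise it is "ltr".
    """
    # Find the first strong character, skipping neutrals, digits, etc.
    start = None
    for i, ch in enumerate(text):
        if unicodedata.bidirectional(ch) in STRONG:
            start = i
            break

    # No strong character at all: fall back to LTR (assumed default).
    if start is None:
        return "ltr"

    # Scan only the bounded window, so cost does not grow with element size.
    for ch in text[start:start + window]:
        if unicodedata.bidirectional(ch) in STRONG_RTL:
            return "rtl"
    return "ltr"
```

For example, estimate_base_direction("123 \u05e9\u05dc\u05d5\u05dd") skips
the digits and the space, finds a Hebrew letter as the first strong
character, and yields "rtl". Because the scan is bounded, an RTL character
that first appears beyond the window does not change the result.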

# One possibility for such a specification would be with a new HTML attribute:
# hflip="no|yes|ltr|rtl".

Is there a use case for the 'yes' value?

# Just as with <br>, in Firefox and Opera, an embedded block element provides
# no bidi separation between the text preceding and following it, while IE and
# WebKit treat it as a UBA paragraph break. ... The text before and after a
# block element is said to form "anonymous blocks", and it is well accepted
# that blocks should constitute UBA paragraphs.

Yes, I would consider this a bug in Firefox and Opera: their behavior
clearly violates the CSS spec.

A related problem is block elements that are rendered as display: inline.
The CSSWG has an issue filed on having these default to unicode-bidi: embed
in our sample HTML4 style sheet. (Looking at your document, they might also
need bdi=yes.)

====== Editorial comments ======

# HTML, the UBA is

s/HTML/In HTML/ ?

# This is because text displayed in the wrong direction is often garbled.

s/wrong direction/wrong base direction/

# and "MAKE html WORK FOR YOU" is displayed in LTR as
#
# EKAM html UOY ROF KROW
#
# instead of the intended
#
# UOY ROF KROW html EKAM

I suggest having the intended output be MAKE html WORK FOR YOU,
and rearranging the previous two strings as required. I think
that might get the point across a little better. :) But I suppose
the disadvantage is that the source order is no longer in
logical order.

In 2.1:

# The UBA's rendering of a piece of text depends not only on the
# explicitly declared direction in which it appears (e.g. the dir
# attribute value on the parent element)

s/explicitly declared direction/base directional context/ or somesuch
s/the dir/as set by the dir/

# The bidi formatting characters LRO, RLO, LRE, RLE, and PDF have
# particularly strong influence on what surrounds them.

This is somewhat overstated and vague. I would suggest something like

| The bidi formatting characters LRO, RLO, LRE, RLE, and PDF can
| fine-tune the bidi algorithm by either overriding the implicit
| directionality of characters (LRO, RLO) or creating an embedded
| base directional context (LRE, RLE)

# Most documents

s/Most/Many/ ?

# Arbitrary-direction entities also don't cause a problem when they
# are displayed as a separate block element (which is treated as a
# separate "paragraph" in UBA terms).

Append
   | and the base direction is correctly marked up with the HTML 'dir'
   | attribute.

# <span dir="rtl"> only explicitly states the direction

s/direction/base direction/

~fantasai

Received on Friday, 5 March 2010 21:22:08 UTC