[Bug 10808] i18n comment 2 : new dir attribute value: auto, and a new attribute: autodirmethod

http://www.w3.org/Bugs/Public/show_bug.cgi?id=10808

--- Comment #19 from Aharon Lanin <aharon.lists.lanin@gmail.com> 2010-10-21 12:14:19 UTC ---
(In reply to comment #17)
> The use-cases are entirely different.

In my opinion, they are not entirely different, but they are different. I will
send a list as a separate comment next.

> Bug 10807 is about wanting isolation [...]  E.g.,
> 
> Logical:        my favorite hebrew letters are A, B, and C
> Correct visual: my favorite hebrew letters are A, B, and C
> Actual visual:  my favorite hebrew letters are B, A, and C

Yes. Actually, it's even worse: my favorite hebrew letters are B ,A, and C

> This bug has nothing to do with isolation.  We're talking only about blocks
> here, and blocks are always isolated from one another.

No! It is very important to have dir=auto available for both block and inline
elements (or what used to be called block and inline elements). In fact, inline
cases are likely to be more common. As I said, I will send use cases.

> (I encourage you to try this out.  Go to data:text/html,<textarea
> dir=rtl></textarea> and type a few sentences in English. [...]

Excellent idea.

> Here's my sketch of a proposal for fixing this.  Add a new value for dir,
> dir=auto.  This is logically equivalent to saying that the element doesn't have
> a known direction, and the direction should be determined automatically.  In
> terms of CSS, it should translate to [dir=auto] { direction: auto;
> unicode-bidi: embed; }.

1. It is essential that the default unicode-bidi value for dir=auto be isolate,
for the sake of the inline elements.

2. The CSS experts have ruled out direction:auto, I believe for good reason. I
very much hope that one of them chimes in soon.

> The CSS "direction: auto" would be defined something like this.  For each UBA
> paragraph, namely each "sequence of inline boxes uninterrupted by a forced line
> break or block boundary" (quote from CSS 2.1), if the containing block's
> computed value of direction is "auto", that paragraph has its direction
> determined heuristically.  The heuristic might be as follows:
> 
> 1) If the content is modifiable by the user, like <input> or <textarea>, decide
> direction based on the first strong-directionality character entered.
> 
> 2) Otherwise, look at the first X Unicode code points, and if at least Y% are
> strong RTL, it's RTL; else, LTR.  In practice, X might be infinity if that's
> okay with implementers, and Y probably something like 30.  (X = infinity might
> cause jumping if the content is loaded incrementally, but in practice that's
> unlikely, as Aharon notes.)

- It is a bad idea to always use one algorithm for input or textarea, and
another everywhere else, since the text that the user types into an input or
textarea then usually has to be displayed in some other type of element on
another page. If the difference of algorithm causes a different direction to be
estimated, the text will be displayed differently than what looked good to the
user when he or she typed it, which is bad. Thus, the choice of algorithm has
to be left up to the page. The proposed autodirmethod attribute is the way to
do that.

- Your second algorithm is not unlike the character-count algorithm considered
in the full proposal
(http://www.w3.org/International/docs/html-bidi-requirements/#auto-direction,
search for "character count"). We did not propose supporting it at this time
because it needs more fine-tuning and evaluation than the time frame allows.
(For example, the Y value should actually depend on the scripts involved: a CJK
character carries more "weight" than a Hebrew or Arabic character, which
carries more "weight" than a Latin character.)
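
For illustration only, the quoted heuristic ("first X code points, at least Y%
strong RTL") can be sketched in Python as below. The function name and its
defaults are hypothetical, chosen for this sketch; it gives every character the
same weight, so it does not attempt the per-script weighting mentioned above.

```python
import unicodedata

# Bidi classes that count as strong right-to-left (Hebrew "R", Arabic "AL").
STRONG_RTL = {"R", "AL"}


def estimate_by_char_count(text, x=None, y_percent=30.0):
    """Hypothetical helper: return "rtl" if at least y_percent of the first
    x code points are strong RTL, else "ltr". x=None means "infinity"."""
    sample = text if x is None else text[:x]
    if not sample:
        return "ltr"  # assumption: empty content falls back to LTR
    rtl_count = sum(
        1 for ch in sample if unicodedata.bidirectional(ch) in STRONG_RTL
    )
    return "rtl" if 100.0 * rtl_count / len(sample) >= y_percent else "ltr"


# "html " followed by three Hebrew letters: 3 of 8 code points are strong RTL.
sample_text = "html \u05d4\u05d5\u05d0"
print(estimate_by_char_count(sample_text))                # threshold 30
print(estimate_by_char_count(sample_text, y_percent=50))  # threshold 50
```

The same string lands on different sides of the threshold depending on Y,
which is the kind of tuning question that would need evaluation.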

> (In reply to comment #16)
> > Let's take a specific example:
> > 
> > <div dir=auto>
> >   some ltr text.
> >   <div>
> >     SOME RTL TEXT.
> >   </div>
> >   SOME MORE RTL TEXT.
> > </div>
> > 
> > 1. Estimating the direction of each UBA paragraph separately has a price.
> 
> Namely?

The impact on direction-dependent CSS, as described before, i.e.:


> > First, note that if the first and third UBA paragraphs contained mark-up that
> > used the new CSS capabilities to depend on direction (e.g. text-align:start,
> > margin-end, :rtl in the selector, etc.), you would want it to depend on the UBA
> > paragraph's direction. However, the first and third UBA paragraphs are not
> > separate elements. They therefore must have the same CSS direction value. Thus,
> > having per-UBA-paragraph direction faces the unenviable choice of either
> > divorcing the direction-dependent CSS from the CSS direction to the
> > inaccessible UBA paragraph direction or having that CSS work inappropriately.
> > This choice is the price that I do not want to pay.

Let me explain in more detail: in the first paragraph, you want margin-start to
mean margin-left, while in the third paragraph, you want it to mean
margin-right. But what determines which it means is the CSS direction value: if
it's ltr, start is left, and if it's rtl, start is right. And since the first
and third paragraphs are in the same element, their CSS direction value has to
be the same. Thus, to get margin-start to mean different things in the two
paragraphs, you have to redefine margin-start to work not off the element's CSS
direction, but off the current UBA paragraph's direction, which cannot even be
exposed as a property of anything (the UBA paragraph does not correspond to an
element). This would be a huge and unwelcome change.
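
A toy model of that dependency, in Python, purely as an illustration: the
function and variable names are made up for this sketch and do not correspond
to any CSS or browser API.

```python
def physical_side(logical_side, direction):
    """Map a logical side ("start" or "end") to "left" or "right"
    for a given CSS direction value ("ltr" or "rtl")."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    if logical_side not in ("start", "end"):
        raise ValueError("logical_side must be 'start' or 'end'")
    start = "left" if direction == "ltr" else "right"
    end = "right" if direction == "ltr" else "left"
    return start if logical_side == "start" else end


# The first and third UBA paragraphs live in the same element, so there is
# only one direction value to resolve margin-start against.
outer_div_direction = "ltr"  # assumption: whatever was estimated for the div
uba_paragraphs = ["some ltr text.", "SOME MORE RTL TEXT."]
resolved = [physical_side("start", outer_div_direction) for _ in uba_paragraphs]
print(resolved)  # both entries are the same side
```

Getting different sides for the two paragraphs would require a second input
that is not the element's direction, which is the change argued against here.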

> > On the other hand, I do not see a use case for the dir=auto in the example
> > above to automatically apply independently to the internal div. If the author
> > wants auto-estimation on the internal div, let him put dir=auto on the internal
> > div. For example, if you are embedding a piece of complicated HTML that you did
> > not author in your page, and you do not know the direction in which this piece
> > of HTML is supposed to be displayed, put a <div dir=auto> around that piece of
> > HTML. If inside it there are smaller pieces that have a different direction, it
> > was the job of the HTML's original author to indicate this within the HTML,
> > e.g.  with dir=auto elements around those smaller pieces.
> 
> So are you saying that if I want all of my direction to be automatically
> determined, then I have to repeat dir=auto on every single block element
> instead of just specifying it once on html or body?

I would never recommend specifying dir=auto on html or body. I would only
recommend it on those elements containing a single-origin piece of content
whose overall direction one does not know. Such pieces of content would tend to
be quite small: a name, a description, a snippet, a comment, an address. 

> That doesn't make sense at
> all to me.  What I'd like to see is people putting dir=auto on the root
> elements of all their pages, so that everything magically works as expected in
> almost all cases (and you can explicitly override directionality in
> exceptions).

Magic indeed. You can try to spec such a feature, but I am 100% convinced that
its results would fall far short of expectations. One of the reasons for that
is that opposite-direction content runs into the problem of alignment: although
text is generally more readable start-aligned, start-aligning
opposite-direction blocks can break the visual layout of the page, making it
unsightly and hard to follow. Thus, one usually needs to make a judgement call
about each potentially opposite-direction box: does it work better
start-aligned to its own direction, or made to line up with the stuff around
it? The browser is not going to make that judgement call for you - and once you
are futzing around with the specific elements that can have opposite-dir
content, it's easy enough to put the dir=auto where it belongs.

Another reason: the direction switch often belongs not on the immediate parent
of the opposite-dir text, but on some ancestor which has no opposite-dir
content of its own. Which ancestor? Only the page designer knows.

The dir=auto we have proposed is intended for simple bits of potentially
opposite-direction content, not huge areas of complex, mixed-direction HTML. It
should be clearly documented as such.

> Inserting HTML from an unknown source where the whole chunk must have the same
> directionality but the overall directionality is unknown is not at all an
> important use-case, IMO.  When would this come up in practice?

I didn't say it's an unknown source, only that you did not author it. I am
talking about various mash-ups. I only brought it up because I thought that
that's what you are interested in. 

> 
> > The reason they exist is not to make it easier for the platform, but because
> > different approaches give better results for different kinds of content.
> 
> Are authors better situated to figure out which is appropriate when, or browser
> implementers?  I suspect the latter.

As I said above, a particular piece of content should always be estimated
consistently, e.g. both when being entered in an input and later being
displayed in a div or span. But different kinds of content - e.g. ads vs user
comments - may work better with different estimation algorithms. The browser
can't tell the difference - only the author can.

> Authors should not have to understand
> Unicode bidi to use dir=auto -- they should be able to slap it on their pages
> and have things work right across the board.  Ideally this should be the
> platform default, in fact -- the only reason to do otherwise is legacy
> compatibility, if that.

We have different visions of what is practicable.

> > First-strong has a serious flaw: RTL text very often contains LTR words and
> > phrases (e.g. acronyms and brand names) and even fairly often starts with them,
> > e.g. "html IS A WONDERFUL PLATFORM". I therefore tend to prefer any-rtl for
> > most cases. However, in an input box, first-strong does have the advantage of
> > being easier for the user to surmise and control. Thus, I would say, if you
> > have content you are obtaining via an input box, use first-strong (both on the
> > input box and the elements that are then used to display those values). But if
> > you are  displaying text of unknown origin, any-rtl is a better bet.
> 
> Why is first-strong better even on the element used to display the value?  Why
> not use first-strong when the user inputs the text, but any-rtl (or some
> variant, maybe X% RTL in the first Y characters) when the text is subsequently
> displayed?

Because if the author typed in "hello SUSAN, how are things?" and had it come
out as intended, in LTR, i.e. as "hello NASUS, how are things?", we do not want
it later being displayed in RTL, i.e. "?how are things ,NASUS hello". It just
isn't readable that way.
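
A small Python sketch of the two estimation methods applied to that message,
with real Hebrew letters standing in for the upper-case word. The function
names mirror the method names used in this thread, but the code itself is only
an illustration, and falling back to LTR when no strong character is found is
an assumption of the sketch.

```python
import unicodedata

STRONG_RTL = {"R", "AL"}  # Hebrew and Arabic strong bidi classes


def first_strong(text):
    """Direction of the first strong-directionality character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    return "ltr"  # assumption: no strong character at all means LTR


def any_rtl(text):
    """RTL if the text contains any strong RTL character at all."""
    has_rtl = any(unicodedata.bidirectional(ch) in STRONG_RTL for ch in text)
    return "rtl" if has_rtl else "ltr"


# "hello SUSAN, how are things?" with SUSAN written in Hebrew letters.
message = "hello \u05e1\u05d5\u05d6\u05df, how are things?"
print(first_strong(message))  # what the user saw while typing
print(any_rtl(message))       # what a different display-side method gives
```

The two methods disagree on this input, so using one in the input box and the
other on the displaying element would flip the text after the user approved it.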

> Surely first-strong is very unlikely to produce more correct
> results than an any-rtl variant in practice, if the whole beginning of the
> contents is available.

It is less likely, but not very unlikely. But the actual chances are
immaterial. WYSIWYG is what's important.

> I don't think we need to worry about future-proofing much.  We can always add
> new dir values at a future date, for example, or new attributes, or whatever,

LOL. We are having such an easy time adding dir=auto now. 

> in the unlikely event that someone comes up with a brilliant new algorithm.

It is not at all unlikely.

> However, I don't think authors should be asked to deal with the complexity of
> choosing different autodirmethods for different types of content, if we can do
> a good enough job heuristically.  Does the heuristic I describe above sound
> like it would fail a significant amount of time in real-world content?

Yes, e.g. "GREAT! credence clearwater revival SINGS it's been a hard day's
night!"

But actually, real-world content is shaped by the platform. If what the user
wants to type in isn't coming out the way it should, the user changes it - or
sets the direction explicitly, if possible. The platform shapes real-world
content.


Received on Thursday, 21 October 2010 12:14:27 UTC