- From: <bugzilla@jessica.w3.org>
- Date: Tue, 19 Oct 2010 18:33:34 +0000
- To: public-i18n-bidi@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10808

--- Comment #17 from Aryeh Gregor <Simetrical+w3cbug@gmail.com> 2010-10-19 18:33:34 UTC ---
(In reply to comment #14)
> The only use case given in this bug so far is the one in comment 3, which as
> far as I can tell is the same as the use cases given in bug 10807. If there are
> other use cases to consider here, such as the ones in comment 12, then please
> describe them, ideally with URLs pointing to real Web pages showing those use
> cases, so that I can study them. It's impossible to evaluate proposals without
> concrete use cases.

The use-cases are entirely different. Bug 10807 is about wanting isolation: when multiple logically distinct strings that might differ in direction are part of the same UBA paragraph, the UBA needs to be told that they're logically isolated so that part of one and part of another don't get mixed together into one run. E.g.,

Logical: my favorite hebrew letters are A, B, and C
Correct visual: my favorite hebrew letters are A, B, and C
Actual visual: my favorite hebrew letters are B, A, and C

This bug has nothing to do with isolation. We're talking only about blocks here, and blocks are always isolated from one another. What we want here is some way to auto-detect the direction of a block. E.g., if there's a textarea where users might type in either English or Hebrew, then if the user starts typing in Hebrew, it should automatically switch to RTL so that the cursor doesn't jump around crazily as you type. But nor should it do that in English.

(I encourage you to try this out. Go to data:text/html,<textarea dir=rtl></textarea> and type a few sentences in English. That's what you get when you try to type in Hebrew on any LTR site, i.e., practically any site. But this isn't just textareas, it also applies to any block content of unknown direction.)

Here's my sketch of a proposal for fixing this. Add a new value for dir, dir=auto.
This is logically equivalent to saying that the element doesn't have a known direction, and the direction should be determined automatically. In terms of CSS, it should translate to [dir=auto] { direction: auto; unicode-bidi: embed; }.

The CSS "direction: auto" would be defined something like this. For each UBA paragraph, namely each "sequence of inline boxes uninterrupted by a forced line break or block boundary" (quote from CSS 2.1), if the containing block's computed value of direction is "auto", that paragraph has its direction determined heuristically. The heuristic might be as follows:

1) If the content is modifiable by the user, like <input> or <textarea>, decide direction based on the first strong-directionality character entered.

2) Otherwise, look at the first X Unicode code points, and if at least Y% are strong RTL, it's RTL; else, LTR. In practice, X might be infinity if that's okay with implementers, and Y probably something like 30. (X = infinity might cause jumping if the content is loaded incrementally, but in practice that's unlikely, as Aharon notes.)

Note that if multiple UBA paragraphs are contained in a single dir=auto element, like with textarea or pre, they might have different direction. This is the same as if they started with an appropriate control character, so should be no big problem.

As to whether this should be part of CSS or HTML -- if direction: rtl/ltr remains conforming, then so should this. If controlling directionality from CSS is really always a bad thing, then have CSS make the property non-conforming, and move the processing model to HTML. In the latter case, HTML might still define the property in terms of CSS, but specify that certain properties or values are to be ignored outside of UA stylesheets, or something like that.

(In reply to comment #16)
> 1. Estimating the direction of each UBA paragraph separately has a price.

Namely?

> 2. The use cases are limited to <textarea> and <pre>.
True, if those are the only HTML elements that can contain multiple UBA paragraphs, but there's no reason not to specify that behavior across the board for simplicity.

> Let's take a specific example:
>
> <div dir=auto>
>   some ltr text.
>   <div>
>     SOME RTL TEXT.
>   </div>
>   SOME MORE RTL TEXT.
> </div>
>
> There are three UBA paragraphs here: the text before the internal div, the text
> inside it, and the text after it. What you want is to have the first displayed
> in LTR, and the others in RTL, and are puzzled why dir=auto is defined to give
> them all the same direction (for autodirmethod values other than plaintext).

In my proposal, both divs have a computed direction value of "auto", so all three UBA paragraphs are in a containing block whose computed direction value is "auto". Therefore the first will be LTR, the second RTL, the third RTL (leaving aside the question of what heuristic to use). IMO, this is the expected and correct behavior.

> Now, the use cases. It is indeed possible to have multi-paragraph plain text
> that can only be rendered well by assigning each of its UBA paragraphs its own
> direction (as explicitly suggested by the UBA). However, such plain text is
> limited to <textarea> and <pre> elements. <textarea> does not allow mark-up at
> all, so the problem described above does not apply to it; <pre> is allowed to
> contain some mark-up, but being pre-formatted, it is not expected to contain
> the layout-modifying mark-up of the sort that bothers us. This is the use case
> for autodirmethod=plaintext, which does per-paragraph estimation like you want,
> but is not expected to handle well direction-dependent CSS within it.

Why shouldn't it handle direction-dependent CSS within it well?

> On the other hand, I do not see a use case for the dir=auto in the example
> above to automatically apply independently to the internal div. If the author
> wants auto-estimation on the internal div, let him put dir=auto on the internal
> div.
> For example, if you are embedding a piece of complicated HTML that you did
> not author in your page, and you do not know the direction in which this piece
> of HTML is supposed to be displayed, put a <div dir=auto> around that piece of
> HTML. If inside it there are smaller pieces that have a different direction, it
> was the job of the HTML's original author to indicate this within the HTML,
> e.g. with dir=auto elements around those smaller pieces.

So are you saying that if I want all of my direction to be automatically determined, then I have to repeat dir=auto on every single block element instead of just specifying it once on html or body? That doesn't make sense at all to me. What I'd like to see is people putting dir=auto on the root elements of all their pages, so that everything magically works as expected in almost all cases (and you can explicitly override directionality in exceptions).

Inserting HTML from an unknown source where the whole chunk must have the same directionality but the overall directionality is unknown is not at all an important use-case, IMO. When would this come up in practice?

> The reason they exist is not to make it easier for the platform, but because
> different approaches give better results for different kinds of content.

Are authors better situated to figure out which is appropriate when, or browser implementers? I suspect the latter. Authors should not have to understand Unicode bidi to use dir=auto -- they should be able to slap it on their pages and have things work right across the board. Ideally this should be the platform default, in fact -- the only reason to do otherwise is legacy compatibility, if that.

> First-strong has a serious flaw: RTL text very often contains LTR words and
> phrases (e.g. acronyms and brand names) and even fairly often starts with them,
> e.g. "html IS A WONDERFUL PLATFORM". I therefore tend to prefer any-rtl for
> most cases.
> However, in an input box, first-strong does have the advantage of
> being easier for the user to surmise and control. Thus, I would say, if you
> have content you are obtaining via an input box, use first-strong (both on the
> input box and the elements that are then used to display those values). But if
> you are displaying text of unknown origin, any-rtl is a better bet.

Why is first-strong better even on the element used to display the value? Why not use first-strong when the user inputs the text, but any-rtl (or some variant, maybe X% RTL in the first Y characters) when the text is subsequently displayed? Surely first-strong is very unlikely to produce more correct results than an any-rtl variant in practice, if the whole beginning of the contents is available.

> BTW, flips are
> also still possible but unlikely for first-strong, since the element could
> start with an arbitrary amount of neutral content.

True.

> Better estimation algorithms can and will be invented. The reason we are
> currently only dealing with first-strong, any-rtl, and plaintext is that they
> are well-known, tried, and easily defined and implemented. If and when a much
> better algorithm is invented and proven, we want to be able to support it. That
> does not mean that existing content that was created with and works for an
> older estimation method should be potentially broken by applying the new
> estimation algorithm to it without being asked to do so. This is exactly why we
> have autodirmethod. We can extend the repertory of its values without making
> them the default for existing content.

I don't think we need to worry about future-proofing much. We can always add new dir values at a future date, for example, or new attributes, or whatever, in the unlikely event that someone comes up with a brilliant new algorithm.
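For concreteness, here is a rough Python sketch of the two-part heuristic proposed earlier in this comment: first-strong for user-editable content, otherwise "at least Y% strong RTL among the first X code points". Everything in it is illustrative, not a spec: the function names, the `editable` flag, and the choice to count Y% against all examined code points (rather than only the strong ones) are assumptions, and splitting on "\n" is only a stand-in for real UBA paragraph boundaries.

```python
import unicodedata

# Unicode bidi classes treated as "strong" (assumption: embedding and
# override controls are ignored for simplicity).
STRONG_RTL = {"R", "AL"}
STRONG_LTR = {"L"}


def first_strong(text, default="ltr"):
    """Return the direction of the first strong-directionality character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_RTL:
            return "rtl"
        if bidi_class in STRONG_LTR:
            return "ltr"
    return default  # only neutrals/digits seen, or empty text


def percent_rtl(text, x=None, y=30.0):
    """RTL if at least y% of the first x code points are strong RTL.

    x=None means "X = infinity", i.e. examine the whole text.
    """
    sample = text if x is None else text[:x]
    if not sample:
        return "ltr"
    rtl_count = sum(
        1 for ch in sample if unicodedata.bidirectional(ch) in STRONG_RTL
    )
    return "rtl" if 100.0 * rtl_count / len(sample) >= y else "ltr"


def auto_direction(paragraph, editable=False, x=None, y=30.0):
    """Pick a direction for one paragraph under the sketched heuristic."""
    if editable:
        return first_strong(paragraph)
    return percent_rtl(paragraph, x, y)


def paragraph_directions(text, editable=False):
    """Estimate each newline-separated paragraph independently."""
    return [auto_direction(p, editable) for p in text.split("\n")]
```

With these definitions, a paragraph that opens with an LTR acronym but is mostly Hebrew comes out LTR under first-strong and RTL under the percentage rule, which is the difference being argued about above.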
However, I don't think authors should be asked to deal with the complexity of choosing different autodirmethods for different types of content, if we can do a good enough job heuristically. Does the heuristic I describe above sound like it would fail a significant amount of time in real-world content?

> I tend to agree, but not everyone does. A discussion worth having, although it
> would have been better if it had already taken place in public-i18n-bidi before
> the bugs were filed on HTML5.

I'd say the contrary, that it's better to have these things widely discussed as early as possible. i18n experts should come up with use-cases, and then they should work with web experts (browser implementers, spec editors, etc.) from day one on the solutions. i18n experts coming up with entire proposed solutions and only then presenting them to web experts will result in a lot of them getting shot down and rewritten from scratch, as has in fact happened on a number of these bugs.

--
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You reported the bug.
Received on Tuesday, 19 October 2010 18:33:43 UTC