
[Bug 10809] i18n comment 3 : new attribute: submitdir

From: <bugzilla@jessica.w3.org>
Date: Thu, 04 Nov 2010 21:19:50 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1PE7Dq-00039N-F4@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10809

--- Comment #39 from Aharon Lanin <aharon.lists.lanin@gmail.com> 2010-11-04 21:19:40 UTC ---
(In reply to comment #38)
> (In reply to comment #31)
> > 1. It is very easy for LRE, RLE, LRO, RLO, and PDF [...]
> > to get out of balance [...].
> 
> We should definitely make them affect the validity if it's a concern that
> people will use them incorrectly and could benefit from validator tools
> flagging these problems. Please file a bug suggesting this if you think it
> would help.

Will do.
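
For illustration, the kind of balance check a validator could run might look
roughly like the sketch below. It only counts openers against PDFs within one
string; the function name is hypothetical, and splitting the input into
paragraphs (the scope within which the UBA pairs these characters) is left to
the caller.

```python
# Bidi embedding/override controls and their terminator (Unicode code points).
LRE, RLE, PDF, LRO, RLO = "\u202A", "\u202B", "\u202C", "\u202D", "\u202E"
OPENERS = {LRE, RLE, LRO, RLO}


def unbalanced_bidi_controls(text):
    """Return (unclosed_openers, stray_pdfs) for one paragraph of text.

    Hypothetical helper: (0, 0) means the controls are balanced.
    """
    depth = 0   # embeddings/overrides currently open
    stray = 0   # PDFs seen with nothing left to close
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth:
                depth -= 1
            else:
                stray += 1
    return depth, stray
```

A result other than (0, 0) is what a validator would presumably flag.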

> Anyway, I can see the appeal (in terms of simplicity) of out-of-band direction
> indication. I'll look into the feasibility of just having a boolean attribute
> on <input> and <textarea> that results in a separate field in the submission.

Thank you.

If that is the bottom line, you can ignore my answers below to the stuff that
preceded this.
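
To show why the out-of-band approach is simple on the receiving end, here is a
minimal sketch of a server reading the text and its direction from a
urlencoded form body. The field names "comment" and "comment_dir" and the
values "ltr"/"rtl" are assumptions made for the example; nothing in this
thread fixes them.

```python
from urllib.parse import parse_qs


def read_text_and_direction(body, field="comment", dir_field="comment_dir"):
    """Return (text, direction) from a urlencoded form body.

    Both field names are hypothetical. The direction falls back to None
    when the separate field is absent or holds an unexpected value.
    """
    params = parse_qs(body, keep_blank_values=True)
    text = params.get(field, [""])[0]
    direction = params.get(dir_field, [None])[0]
    if direction not in ("ltr", "rtl"):
        direction = None
    return text, direction
```

The text itself arrives untouched, so the app never has to scan it for
formatting characters to learn what the user indicated.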

> > 2. Similarly, these characters, while being perfectly balanced on their own,
> > can very, very easily become "entangled" between the scopes of the document's
> > tags. For example, what exactly is the browser to make of <span dir=rtl> ...
> > [LRE] ... </span> ... [PDF]?
> 
> What should happen is defined by CSS, which defines all of the bidi formatting
> rules in terms of bidi formatting characters.

If so, the text between the </span> and the PDF will come out RTL: the </span>
is equivalent to a PDF, which the UBA would match to the LRE, closing it and
reverting to the RTL direction set by the <span dir=rtl>. How much sense does
that make? The <span dir=rtl> was supposed to end with the </span>, and the
bidi formatting character was LRE, not RLE! With equivalently entangled end
tags of elements, e.g. <i>A<b>B</i>C</b>, most browsers attempt to display the
content the way the user intended it, with the C bold, not italic. I am not
saying your interpretation of what should happen is bad, only that there is no
good interpretation of this mess.

> > 3. Speaking of CSS, how exactly should the formatting characters - if
> > encouraged - interact with the direction-dependent CSS, e.g. text-align:start?
> > For example, consider:
> > 
> > [RLE]<div style="text-align:start">blah blah</div>[PDF]
> > 
> > Should the direction CSS property be rtl for the div? Should it be aligned to
> > the right? What if the [RLE] and [PDF] were inside the div?
> 
> The meaning of 'start' is entirely based on the 'direction' property and
> nothing else. This is all defined in the CSS spec.

I know. The point is that the formatting characters will not have any effect on
the CSS - and that effect is vital if you want things to work well. I am just
trying to demonstrate why in HTML you need to use mark-up (dir=), not the bidi
formatting characters.
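
As a rough sketch of the mark-up approach, a server that already knows the
direction could emit it on the element itself, so that the CSS direction (and
with it text-align:start) follows. The helper name is hypothetical, and the
inline style is only there to mirror the example quoted above.

```python
from html import escape


def wrap_with_dir(text, direction):
    """Wrap user text in a div whose dir attribute carries the direction."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    # escape() keeps user-supplied text from being parsed as mark-up.
    return '<div dir="{}" style="text-align:start">{}</div>'.format(
        direction, escape(text)
    )
```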

> > However, the fact remains that in many cases,
> > opposite-direction text gathered from the user is best displayed aligned to its
> > start edge. So, to get that, I will still need to make the *div* say dir=rtl,
> > and not leave it up to the text inside the div.
> 
> Why not just use dir=auto? If the first character is a bidi formatting
> character, that'll work as intended, no?

1. Deciding it is RTL simply because the first character is RLE is definitely
wrong: consider "[RLE]JOE[PDF] likes to eat." It is an English sentence, LTR,
not RTL. In RTL, it would be displayed as ".likes to eat EOJ" instead of the
correct "EOJ likes to eat."

2. Unfortunately, the UBA's standard first-strong heuristic ignores formatting
characters. We would have to tweak it a little to make it support them (e.g.
ignore the text inside them too, except when the whole string is wrapped in
them, in which case return the direction they indicate).
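
A minimal sketch of that tweak, under my own assumptions about the details:
if a single embedding or override spans the whole string, return its
direction; otherwise return the direction of the first strong character that
lies outside any embedding. The function name is hypothetical, and this is
not how any browser implements dir=auto.

```python
import unicodedata

LRE, RLE, PDF, LRO, RLO = "\u202A", "\u202B", "\u202C", "\u202D", "\u202E"
OPENER_DIRECTION = {LRE: "ltr", RLE: "rtl", LRO: "ltr", RLO: "rtl"}


def wrapping_direction(text):
    """Direction of a single embedding that spans the whole string, else None."""
    if len(text) < 2 or text[0] not in OPENER_DIRECTION or text[-1] != PDF:
        return None
    depth = 0
    for i, ch in enumerate(text):
        if ch in OPENER_DIRECTION:
            depth += 1
        elif ch == PDF:
            depth -= 1
            if depth == 0:
                # The outermost embedding closes here; it wraps the whole
                # string only if this is the final character.
                if i == len(text) - 1:
                    return OPENER_DIRECTION[text[0]]
                return None
    return None


def estimate_direction(text):
    """Return 'ltr', 'rtl', or None when no direction can be determined."""
    wrapped = wrapping_direction(text)
    if wrapped is not None:
        return wrapped
    depth = 0
    for ch in text:
        if ch in OPENER_DIRECTION:
            depth += 1
        elif ch == PDF:
            depth = max(depth - 1, 0)
        elif depth == 0:
            # Only characters outside every embedding are considered.
            bidi_class = unicodedata.bidirectional(ch)
            if bidi_class == "L":
                return "ltr"
            if bidi_class in ("R", "AL"):
                return "rtl"
    return None
```

With real Hebrew letters standing in for the uppercase names, a string like
"[RLE]JOE[PDF] likes to eat." comes out as ltr under this sketch, while a
string wholly wrapped in [RLE]...[PDF] comes out as rtl.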

> > And in order to do that after
> > the browser has stuck the formatting characters into text (because the user
> > entering it indicated its direction), the server side of my app will need to
> > parse the text in order to figure out that indeed it is wrapped in formatting
> > characters. And when I say "parse", I really mean parse: while the formatting
> > characters in "[RLE]BLAH blah BLAH[PDF]" might (!) have been inserted by the
> > mechanism you are proposing, the formatting characters in "[RLE]BLAH[PDF] blah
> > [RLE]BLAH[PDF]" definitely were not, and to understand that, the app will need
> > to scan right through the whole string.
> 
> How would a user ever end up submitting text in this latter state?

By pasting from some HTML page that uses bidi formatting characters :-)

> Why would you not use dir=rtl in this case anyway?

Let me make the example clearer with real text instead of blahs:

[RLE]JOE[PDF] intends to call [RLE]SUSAN[PDF]

This is an English sentence that happens to use some names in an RTL script. It
is thus LTR. It needs to be displayed as

EOJ intends to call NASUS

which will only happen if it is displayed LTR. In RTL, it will be displayed as

NASUS intends to call EOJ

which actually reverses the meaning.

> > As I said, I have no intention of using submitdir to support the use case of
> > the user indicating the direction of individual paragraphs inside a textarea
> > (as opposed to indicating the direction of all the paragraphs in a textarea at
> > once).
> 
> That seems a bit limited, but if it's really not something people want to do,
> fair enough.

I didn't say that people don't want it. I am saying that no one has figured out
a way to give it to them, even in a full-featured plain text editor.

Received on Thursday, 4 November 2010 21:19:53 GMT
