[Bug 10809] i18n comment 3 : new attribute: submitdir from bugzilla@jessica.w3.org on 2010-11-03 (public-i18n-bidi@w3.org from October to December 2010)

From: <bugzilla@jessica.w3.org>
Date: Wed, 03 Nov 2010 14:30:46 +0000
To: public-i18n-bidi@w3.org
Message-Id: <E1PDeMQ-0004Qv-UM@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10809

--- Comment #31 from Aharon Lanin <aharon.lists.lanin@gmail.com> 2010-11-03 14:30:46 UTC ---
(In reply to comment #30)
> (In reply to comment #25)
> > LRE, RLE, LRO,
> > RLO and PDF are evil in HTML for many reasons, but one of them is that it is
> > impossible to give a reasonable definition of how they should interact with
> > direction specified by mark-up
> 
> That's not true, since we in fact define everything in terms of these
> characters in CSS. It's not only possible, it's literally the only way it is
> done.

Just because C++ is implemented in terms of machine language commands does not
mean that programmers should be encouraged to insert snippets of machine code
into their C++ programs (even if the compiler does support that), or that an
IDE, when asked to "create getter/setter", should code up ones in assembler.

Here are just some reasons why these formatting characters are like machine
code, and should be highly discouraged:

1. It is very easy for these characters (where the PDF is the "closing
parenthesis" to the others) to get out of balance and become completely
nonsensical, e.g. [PDF][LRE]. Of course, the same can be said for HTML's
opening and closing tags, but if the tags are out of balance, the document is
invalid, and your authoring tools will help you prevent that from happening.
The formatting characters, being just text, do not affect the validity of the
document, and you are completely on your own.

2. Similarly, these characters, while being perfectly balanced on their own,
can very, very easily become "entangled" between the scopes of the document's
tags. For example, what exactly is the browser to make of <span dir=rtl> ...
[LRE] ... </span> ... [PDF]? Once again, the same can be said of HTML opening
and closing tags, but if you get those wrong, the document is invalid, whereas
the snippet above is perfectly valid HTML.

To see just how easily that can happen, consider the case where text containing
formatting characters is displayed by an app that adds mark-up of its own, e.g.
to indicate the search hits in it.

3. Speaking of CSS, how exactly should the formatting characters - if
encouraged - interact with the direction-dependent CSS, e.g. text-align:start?
For example, consider:

[RLE]<div style="text-align:start">blah blah</div>[PDF]

Should the direction CSS property be rtl for the div? Should it be aligned to
the right? What if the [RLE] and [PDF] were inside the div?

It's pretty clear to me that (just as is the case today) the answer should be
"no" in all cases. However, the fact remains that in many cases,
opposite-direction text gathered from the user is best displayed aligned to its
start edge. So, to get that, I will still need to make the *div* say dir=rtl,
and not leave it up to the text inside the div. And in order to do that after
the browser has stuck the formatting characters into text (because the user
entering it indicated its direction), the server side of my app will need to
parse the text in order to figure out that indeed it is wrapped in formatting
characters. And when I say "parse", I really mean parse: while the formatting
characters in "[RLE]BLAH blah BLAH[PDF]" might (!) have been inserted by the
mechanism you are proposing, the formatting characters in "[RLE]BLAH[PDF] blah
[RLE]BLAH[PDF]" definitely were not, and to understand that, the app will need
to scan right through the whole string.

I think this use case also makes it clear that one really needs the direction
data out-of-band.
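
As a rough illustration of the parsing burden described above, here is a
minimal sketch in Python of what the server side would have to do. The function
names (is_balanced, outer_direction) are hypothetical, and treating the
overrides LRO/RLO as indicating a direction in the same way as LRE/RLE is an
assumption made only for this example.

```python
# Unicode bidi embedding/override controls discussed in this thread.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"

# Assumption for this sketch: overrides count as a direction, like embeddings.
OPENERS = {LRE: "ltr", RLE: "rtl", LRO: "ltr", RLO: "rtl"}


def is_balanced(text):
    """Return True if every opener is closed by a PDF and no PDF is stray."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # a PDF with nothing to close, e.g. [PDF][LRE]
            depth -= 1
    return depth == 0


def outer_direction(text):
    """Return "ltr" or "rtl" if one opener/PDF pair wraps the whole string.

    Returns None otherwise. Checking only the first and last characters is
    not enough, so the whole string is scanned.
    """
    if len(text) < 2 or text[0] not in OPENERS or text[-1] != PDF:
        return None
    depth = 0
    last = len(text) - 1
    for i, ch in enumerate(text):
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            depth -= 1
            # The leading opener was closed before the end of the string,
            # so the string is not wrapped as a whole.
            if depth == 0 and i != last:
                return None
    return OPENERS[text[0]] if depth == 0 else None


if __name__ == "__main__":
    print(outer_direction(RLE + "BLAH blah BLAH" + PDF))
    print(outer_direction(RLE + "BLAH" + PDF + " blah " + RLE + "BLAH" + PDF))
    print(is_balanced(PDF + LRE))
```

Even when outer_direction reports a direction, that only says the string is
wrapped; it cannot say whether the browser or the user put the characters
there.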

> It seems like the simplest solution here, especially considering <textarea>s
> and multiple paragraphs with different directionality, is to have an attribute
> that, if present, causes the user agent to include the relevant bidi formatting
> characters in the submission of the control's value. I don't really see how
> else we could do it... I mean, we could submit a second value that just had a
> list of character ranges labeled as ltr or rtl, but that would be even harder
> to manage, as far as I can tell (and easier to implement incorrectly - e.g. you
> could trick a site by sending overlapping ranges).

As I said, I have no intention of using submitdir to support the use case of
the user indicating the direction of individual paragraphs inside a textarea
(as opposed to indicating the direction of all the paragraphs in a textarea at
once). While there are some plain-text editors, e.g. gedit, that support
per-paragraph *direction auto-estimation* (and that is why we want
autodirmethod=plaintext), as far as I know none support per-paragraph *user
control* over directionality. That functionality has only been done in rich
text editors (including browser-based rich-text editors like TinyMCE). So, I
don't see why we have to try to figure out how we can make the puny textarea,
which is not even a full-featured plain text editor, do it.

Received on Wednesday, 3 November 2010 14:30:48 UTC