[Bug 17814] New: it makes no sense to limit the placeholder attribute to values of the same direction as the <input>

https://www.w3.org/Bugs/Public/show_bug.cgi?id=17814

           Summary: it makes no sense to limit the placeholder attribute
                    to values of the same direction as the <input>
           Product: HTML WG
           Version: unspecified
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P3
         Component: other Hixie drafts (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: contributor@whatwg.org
         QAContact: contributor@whatwg.org
                CC: ian@hixie.ch, mike@w3.org, mounir.lamouri@gmail.com,
                    public-i18n-bidi@w3.org, aharon.lists.lanin@gmail.com,
                    lrosenth@adobe.com


This was cloned from bug 15488 as part of operation convergence.
Originally filed: 2012-01-10 05:10:00 +0000
Original reporter: Aharon Lanin <aharon.lists.lanin@gmail.com>

================================================================================
 #0   Aharon Lanin                                    2012-01-10 05:10:18 +0000 
--------------------------------------------------------------------------------
Currently, the HTML spec
(http://dev.w3.org/html5/spec/Overview.html#text-rendered-in-native-user-interfaces)
states that "text from elements (either attribute values or the contents of
elements) is expected to be rendered in a manner that honors the directionality
of the element from which the text was obtained." While this is usually what
one wants, there are cases in which it does not suit the placeholder attribute.

For example, say that one has a Hebrew or Arabic page containing an <input
type="text"> intended for the user to enter a snippet of JavaScript. One would
then make it <input type="text" dir="ltr">, since JavaScript code is always
LTR. One might want a placeholder on the input, however, that would be in the
language of the rest of the page, and thus RTL. According to the spec, however,
it would be displayed in LTR, and thus garbled. Another example would be an
<input type="tel">, since telephone numbers are always LTR, but one might
easily want an RTL placeholder on it.

We had a similar problem with the title attribute in
https://www.w3.org/Bugs/Public/show_bug.cgi?id=10818: the title's value
sometimes needs to have one direction while the element needs another. With the
title attribute, however, we at least had a workaround: wrap the element in a
span, move the title to the span, and set the dir attribute on both elements as
each needs. This does not work for placeholder because placeholder does not
work on a <span>.

Two possible solutions to this problem are:

1. define a placeholderdir attribute for <input>.

2. always display the placeholder as if it had dir=auto.

The second possibility is not perfect, but at least setting the placeholder
direction explicitly is more easily done by prefixing it with an &lrm; or
&rlm; than by wrapping it in LRE|RLE and PDF characters.
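
A minimal sketch of why a single &lrm; or &rlm; prefix would be enough under
option 2, assuming dir=auto is approximated by a plain first-strong heuristic
(the real dir=auto rules have more cases). The function name
first_strong_direction and the sample placeholder are hypothetical.

```python
import html
import unicodedata


def first_strong_direction(text: str) -> str:
    """Simplified first-strong heuristic: return the direction of the first
    strongly directional character, or "ltr" when there is none."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return "ltr"


# Hypothetical placeholder: an LTR brand name followed by a Hebrew word.
placeholder = "foo \u05e9\u05dc\u05d5\u05dd"

# Without a mark, the leading Latin word decides the direction.
print(first_strong_direction(placeholder))  # ltr

# One invisible RLM (bidi class R) in front flips the detected direction.
print(first_strong_direction(html.unescape("&rlm;") + placeholder))  # rtl
```

The prefix is one character with no closing counterpart, whereas the LRE|RLE
approach needs an opening character and a matching PDF.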
================================================================================
 #1   Ian 'Hixie' Hickson                             2012-02-03 06:38:27 +0000 
--------------------------------------------------------------------------------
Why would it be garbled? dir=rtl isn't an override, just an embedding. The most
it would do is change the positioning of punctuation (easily fixed with
explicit embedding information) or the alignment.

I guess I don't mind if we make it dir=auto, but I really find this allergy to
Unicode bidi formatting characters to be getting out of control. This is an
extreme edge case, we shouldn't add an attribute for it.
================================================================================
 #2   Aharon Lanin                                    2012-02-05 11:11:58 +0000 
--------------------------------------------------------------------------------
> Why would it be garbled? [...] The most
> it would do is change the positioning of punctuation

Well, first of all, misplaced punctuation is already sufficiently annoying.
But it can certainly get worse. For example, let's say that I have a site named
"foo", with my own set of user accounts. The account names are limited to Latin
letters, numbers, periods, underscores, and dashes. I support user interfaces
in several languages, including RTL ones. Since "foo" is my brand, it remains
"foo" in all locales.

Where the English UI has an <input type="text" placeholder="your foo
username">, an RTL one has it as <input type="text" dir="ltr" placeholder="YOUR
foo USERNAME">. (I am using the convention of uppercase Latin for RTL
characters here to make this example intelligible to all readers.)

Why did I make it dir="ltr"? So that a username like john.doe will not go
through the stage of looking like ".john" while being typed, with the caret
jumping around and/or being displayed in strange places.

To be intelligible, my placeholder has to be displayed RTL, as:

EMANRESU foo RUOY

Instead, because of the dir=ltr, it is displayed in LTR, as

RUOY foo EMANRESU

This is as intelligible as "username OOF your" would be in English.
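
The two orderings above can be reproduced with a toy model that uses the same
convention (uppercase Latin stands for RTL letters). This is only a sketch: it
is not the Unicode Bidirectional Algorithm, it ignores digits and the finer
rules for neutral characters, and toy_display is a hypothetical helper.

```python
import re


def toy_display(logical: str, base: str) -> str:
    """Toy visual ordering for text written in the uppercase-means-RTL
    convention. `base` is the paragraph direction, "ltr" or "rtl"."""
    # Split into runs of RTL letters, LTR letters, and everything in between.
    runs = re.findall(r"[A-Z]+|[a-z]+|[^A-Za-z]+", logical)
    if base == "ltr":
        # LTR paragraph: runs keep their order, RTL runs are mirrored.
        return "".join(r[::-1] if r.isupper() else r for r in runs)
    # RTL paragraph: run order is flipped, LTR runs keep their inner order.
    return "".join(r if r.islower() else r[::-1] for r in reversed(runs))


logical = "YOUR foo USERNAME"
print(toy_display(logical, "rtl"))  # EMANRESU foo RUOY
print(toy_display(logical, "ltr"))  # RUOY foo EMANRESU
```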

So, I try to fix it by making my input dir="auto". It does not help, since the
spec says that the value of any attribute has to be displayed in the element's
*directionality*. This is either "ltr" or "rtl", never "auto". And for an empty
<input> (which it has to be for the placeholder to be displayed), the dir=auto
evaluates to "ltr" directionality.
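
A sketch of the behavior described in this comment, assuming the rule is
exactly as stated here: the placeholder takes the element's directionality,
and dir=auto on an empty value resolves to "ltr". Both function names are
hypothetical, and the first-strong scan is a simplification.

```python
import unicodedata


def input_directionality(dir_attr: str, value: str) -> str:
    """Resolve an <input>'s directionality to "ltr" or "rtl"."""
    if dir_attr in ("ltr", "rtl"):
        return dir_attr
    # dir="auto": decided by the first strong character of the current value.
    for ch in value:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # Empty (or neutral-only) value: "ltr", as described in the comment.
    return "ltr"


def placeholder_direction(dir_attr: str, value: str, placeholder: str) -> str:
    """The placeholder text itself is never consulted."""
    return input_directionality(dir_attr, value)


hebrew_placeholder = "\u05e9\u05dc\u05d5\u05dd"
# The placeholder is only shown while the value is empty, so dir="auto"
# cannot help it.
print(placeholder_direction("auto", "", hebrew_placeholder))  # ltr
```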

> This is an extreme edge case

Not at all. My guess is that a very significant percentage of inputs are for
types of data that have to be LTR, such as numeric data (e.g. phone number, age,
item count) and always-ltr text data like the username above. In a
well-designed RTL page, these should all be marked with dir=ltr. And once
dir=auto becomes available in more browsers, most of the rest should be marked
with dir=auto. In either case, the placeholder will be displayed LTR, and thus
will be garbled in an RTL page if it (besides containing some RTL words):
- starts with a number, or
- ends with punctuation, or
- contains an LTR word (e.g. a brand name)
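
The three conditions above can be turned into a rough check. This is a
heuristic sketch only: ltr_display_risks is a hypothetical helper, and it
approximates "LTR word" by the presence of any character of bidi class L.

```python
import unicodedata


def ltr_display_risks(placeholder: str) -> list[str]:
    """List which of the three conditions apply to a placeholder that
    contains RTL characters but will be displayed LTR."""
    classes = [unicodedata.bidirectional(ch) for ch in placeholder]
    if not any(c in ("R", "AL") for c in classes):
        return []  # no RTL words, so the conditions do not apply
    stripped = placeholder.strip()
    risks = []
    if stripped[0].isdigit():
        risks.append("starts with a number")
    if unicodedata.category(stripped[-1]).startswith("P"):
        risks.append("ends with punctuation")
    if "L" in classes:
        risks.append("contains an LTR word")
    return risks


hebrew = "\u05e9\u05dc\u05d5\u05dd"
print(ltr_display_risks("foo " + hebrew + "!"))
# ['ends with punctuation', 'contains an LTR word']
```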

> easily fixed with explicit embedding information

There is nothing easy about using LRE/RLE + PDF for the average human being. By
and large, users do not even know that they exist. They cannot generate them
on their keyboards, and if they could, their invisibility makes it a challenge
to edit the placeholder later. And if they type them as entities, they wind up
becoming discombobulated. For example, here is what "&#8234;hello&#8236;" looks
like once I substitute actual RTL characters for the "hello":

placeholder="&#8234;שלום&#8236;"
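
For readers unsure which characters those two numeric entities stand for, a
small sketch that decodes them with Python's standard library. The helper name
invisible_controls is hypothetical.

```python
import html
import unicodedata


def invisible_controls(attribute_value: str) -> list[str]:
    """Decode HTML entities and name every format (Cf) character found."""
    decoded = html.unescape(attribute_value)
    return [
        unicodedata.name(ch)
        for ch in decoded
        if unicodedata.category(ch) == "Cf"
    ]


print(invisible_controls("&#8234;hello&#8236;"))
# ['LEFT-TO-RIGHT EMBEDDING', 'POP DIRECTIONAL FORMATTING']
```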

Having said all this and hopefully shown that the problem is real, I must admit
that I do not know of a solution that really makes me happy.
================================================================================

Received on Wednesday, 18 July 2012 06:53:58 UTC